| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
OK bluhm@, claudio@, deraadt@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides with rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.
|
|
|
|
|
|
| |
It also translated a documented send(2) EACCES case erroneously.
This was too much magic and always prone to errors.
from Jan Klemkow; man page jmc@; OK claudio@
|
|
|
|
|
|
| |
m_leadingspace() and m_trailingspace(). Convert all callers to call
directly the functions and remove the defines.
OK krw@, mpi@
|
|
|
|
|
|
| |
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
|
|
|
|
|
|
|
|
| |
was NULL and nothing was traced. So save the old tcpcb and use
that to retrieve some information. Note that otb may be freed and
must not be dereferenced. Use a heuristic for cases where the
address family is in the IP header but not provided in the PCB.
OK visa@
|
|
|
|
|
|
| |
the delack timer had a different implementation. Use the same
mechanism for all TCP timer.
OK mpi@ visa@
|
|
|
|
|
|
|
|
|
| |
TCP_FACK was disabled by provos@ in June 1999.
TCP_FACK is an algorithm that decides that when something is lost, all
not SACKed packets until the most forward SACK are lost. It may be a
correct estimate, if network does not reorder packets.
OK visa@ mpi@ mikeb@
|
|
|
|
| |
OK deraadt, mpi, visa, job
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
buffers.
This is one step towards unlocking TCP input path. Note that all the
functions asserting for the socket lock are not necessarilly MP-safe.
All the fields of 'struct socket' aren't protected.
Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to
tell when a filter needs to lock the underlying data structures. Logic
and name taken from NetBSD.
Tested by Hrvoje Popovski.
ok claudio@, bluhm@, mikeb@
|
|
|
|
|
|
|
|
|
| |
<netinet/tcp_debug.h>.
The IPv6 variant was always included and the IPv4 version is not
present on all systems.
Most of the offending ports are already fixed, thanks to sthen@!
|
|
|
|
| |
ok mpi@ bluhm@
|
|
|
|
| |
OK claudio@ henning@
|
|
|
|
|
|
|
|
| |
After writing data into this loop, it was spinning forever causing
a kernel hang. Detect the loop by counting how often the same mbuf
is spliced. If that happens 128 times, assume that there is a loop
and abort the splicing with ELOOP.
Bug found by tedu@; OK tedu@ millert@ benno@
|
|
|
|
| |
ok henning
|
|
|
|
|
|
|
|
| |
Appart from the usual inet6 axe murdering exercise to keep you fit, this
allows us to get rid of a lot of layer violation due to the use of per-
ifp variables to store the current hop limit.
Imputs from bluhm@, ok phessler@, florian@, bluhm@
|
|
|
|
|
|
|
| |
ifpp - XXX: just for statistics
ifpp is always NULL in all callers so that statistic confirms ifpp is
dying
OK mpi@
|
|
|
|
|
|
|
| |
a zero window condition. If you send a 0-length packet, but there
is data is the socket buffer, and neither the rexmt or persist timer
is already set, then activate the persist timer.
From FreeBSD revision 284941; OK deraadt@ markus@ mikeb@ claudio@
|
|
|
|
|
|
|
|
| |
compatibility with 4.3BSD in September 1989.
*Pick your own definition for "temporary".
ok bluhm@, claudio@, dlg@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
receiving interface in the packet header of every mbuf.
The interface pointer should now be retrieved when necessary with
if_get(). If a NULL pointer is returned by if_get(), the interface
has probably been destroy/removed and the mbuf should be freed.
Such mechanism will simplify garbage collection of mbufs and limit
problems with dangling ifp pointers.
Tested by jmatthew@ and krw@, discussed with many.
ok mikeb@, bluhm@, dlg@
|
|
|
|
|
|
|
| |
annoying trailing, leading and embedded whitespace. No change to
.o files.
ok deraadt@
|
|
|
|
|
|
|
| |
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.
ok tedu@ deraadt@
|
|
|
|
|
| |
long live the one true internet.
ok henning mikeb
|
| |
|
|
|
|
|
|
| |
ever used to pass on uint32 (for ipsec). stop that madness and just pass
the uint32, 0 in all cases but the two that pass the ipsec flowinfo.
ok deraadt reyk guenther
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid the confusion by using an appropriate name for the variable.
Note that since routing domain IDs are a subset of the set of routing
table IDs, the following idiom is correct:
rtableid = rdomain
But to get the routing domain ID corresponding to a given routing table
ID, you must call rtable_l2(9).
claudio@ likes it, ok mikeb@
|
|
|
|
|
|
|
|
|
|
|
| |
localhost connections.
The plan is to always use the routing table for addresses and routes
resolutions, so there is no future for an option that wants to bypass
it. This option has never been implemented for IPv6 anyway, so let's
just remove the IPv4 bits that you weren't aware of.
Tested a least by lteo@, guenther@ and chrisz@, ok mikeb@, benno@
|
|
|
|
| |
for localhost connections. discussed with deraadt@
|
|
|
|
|
|
|
|
| |
use the routing table there's no future for an option that wants to
bypass it. This option has never been implemented for IPv6 anyway,
so let's just remove the IPv4 bits that you weren't aware of.
Tested by florian@, man pages inputs from jmc@, ok benno@
|
|
|
|
|
|
| |
global variables to in6.h.
ok deraadt@
|
|
|
|
|
|
|
|
|
|
| |
already there, just compute it - it's dirt cheap. since that happens
very late in ip_output, the rest of the stack doesn't have to care about
checksums at all any more, if something needs to be checksummed, just
set the flag on the pkthdr mbuf to indicate so.
stop pre-computing the pseudo header checksum and incrementally updating it
in the tcp and udp stacks.
ok lteo florian
|
|
|
|
|
|
| |
This is useful to aggregate data in the kernel from multiple sources
like writes and socket splicing. It avoids sending small packets.
From FreeBSD via David Hill; OK mikeb@ henning@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
was only done when a packet traveled up the stack from pf to
tcp_input(). Now also link the state and inpcb when the packet is
going down from tcp_output() to pf. As a consequence, divert-reply
states where the initial SYN does not get an answer, can be handled
more correctly.
This change is part of a larger diff that has been backed out in
2011. Bring the feature back in small steps to see when bad things
start to happen.
OK henning deraadt
|
|
|
|
|
|
|
|
| |
with the latter
no change in md5 checksum of generated files
ok claudio@ henning@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
at least krw@, pirofti@ and todd@ have been seeing panics (todd and krw
with xxxterm not sure about pirofti) involving pool corruption while
using this commit.
krw and todd confirm that this backout fixes the problem.
ok blambert@ krw@, todd@ henning@ and kettenis@
Double link between pf states and sockets. Henning has
already implemented half of it. The additional part is: -
The pf state lookup for outgoing packets is optimized by
using mbuf->inp->state.
- For incomming tcp, udp, raw, raw6 packets the socket
lookup always is optimized by using mbuf->state->inp.
- All protocols establish the link for incomming packets.
- All protocols set the inp in the mbuf for outgoing packets.
This allows the linkage beginning with the first packet
for outgoing connections.
- In case of divert states, delete the state when the socket
closes. Otherwise new connections could match on old
states instead of being diverted to the listen socket.
ok henning@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
implemented half of it. The additional part is:
- The pf state lookup for outgoing packets is optimized by using
mbuf->inp->state.
- For incomming tcp, udp, raw, raw6 packets the socket lookup always
is optimized by using mbuf->state->inp.
- All protocols establish the link for incomming packets.
- All protocols set the inp in the mbuf for outgoing packets.
This allows the linkage beginning with the first packet for
outgoing connections.
- In case of divert states, delete the state when the socket closes.
Otherwise new connections could match on old states instead of
being diverted to the listen socket.
ok henning@
|
|
|
|
| |
ok claudio krw
|
|
|
|
|
|
|
| |
The data received on the source socket will automatically be sent
on the drain socket. This allows to write relay daemons with zero
data copy.
ok markus@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Send buffer is scaled by not accounting unacknowledged on the wire
data against the buffer limit. Receive buffer scaling is done similar
to FreeBSD -- measure the delay * bandwith product and base the
buffer on that. The problem is that our RTT measurment is coarse
so it overshoots on low delay links. This does not matter that much
since the recvbuffer is almost always empty.
Add a back pressure mechanism to control the amount of memory
assigned to socketbuffers that kicks in when 80% of the cluster
pool is used.
Increases the download speed from 300kB/s to 4.4MB/s on ftp.eu.openbsd.org.
Based on work by markus@ and djm@.
OK dlg@, henning@, put it in deraadt@
|
|
|
|
|
|
|
|
| |
ip_forward() to know the difference between blocked packets and those that
can't be forwarded (EHOSTUNREACH). Only in the latter case an ICMP should
be sent. In the other callers of ip_output() change the error back to
EHOSTUNREACH since userland may not expect EACCES on a sendto().
OK henning@, markus@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows to run isakmpd/iked/ipsecctl in multiple rdomains
independently (with "route exec"); the kernel will pickup the rdomain
from the process context of the pfkey socket and load the flows and
SAs into the matching rdomain encap routing table. The network stack
also needs to pass the rdomain to the ipsec stack to lookup the
correct rdomain that belongs to an interface/mbuf/... You can now run
individual IPsec configs per rdomain or create IPsec VPNs between
multiple rdomains on the same machine ;). Note that a primary enc(4)
in addition to enc0 interface is required per rdomain, eg. enc1 rdomain 1.
Test by some people, mostly on existing "rdomain 0" setups. Was in
snaps for some days and people didn't complain.
ok claudio@ naddy@
|
|
|
|
|
|
|
|
|
|
|
|
| |
and make it possible to bind sockets (including listening sockets!)
to rtables and not just rdomains. This changes the name of the
system calls, socket option, and ioctl. After building with this
you should remove the files /usr/share/man/cat2/[gs]etrdomain.0.
Since this removes the existing [gs]etrdomain() system calls, the
libc major is bumped.
Written by claudio@, criticized^Wcritiqued by me
|
|
|
|
|
|
|
| |
aligned, otherwise we lose on strict alignment architecture. Should fix
problems with gcc4 compiled bsd.rd's that people see on sparc64.
ok millert@, beck@, jsing@
|
|
|
|
|
|
|
|
|
| |
alternate routing table and separate them from other interfaces in distinct
routing tables. The same network can now be used in any doamin at the same
time without causing conflicts.
This diff is mostly mechanical and adds the necessary rdomain checks accross
net and netinet. L2 and IPv4 are mostly covered still missing pf and IPv6.
input and tested by jsg@, phessler@ and reyk@. "put it in" deraadt@
|
|
|
|
|
|
|
|
|
|
| |
pcb. the state key ptr in the pcb is the one that had to be used by pf
outbound. but by convention the state key pointer in the pkthdr is the one
used INbound, so pf follows its reverse pointer to find the sk to use,
and since a reverse doesn't exist for locally terminated connections the
reverse pointer is null and thus the whole game a noop.
note that this only affects packets FROM local udp/tcp sockets, for the
other direction everything works as expected.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when we first do a pcb lookup and we have a pointer to a pf state key
in the mbuf header, store the state key pointer in the pcb and a pointer
to the pcb we just found in the state key. when either the state key
or the pcb is removed, clear the pointers.
on subsequent packets inbound we can skip the pcb lookup and just use the
pointer from the state key.
on subsequent packets outbound we can skip the state key lookup and use
the pointer from the pcb.
about 8% speedup with 100 concurrent tcp sessions, should help much more
with more tcp sessions.
ok markus ryan
|
| |
|
|
|
|
| |
ok markus@ henning@
|
|
|
|
| |
ok markus@ mcbride@ henning@ deraadt@
|
| |
|