| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Unlike aggr(4) and trunk(4) for link aggregation, tpmr(4) bridges links
similar to bridge(4) and switch(4), yet its ioctl(2) interface is that of an
aggregating interface.
Change SIOCSTRUNKPORT and SIOCSTRUNKDELPORT to SIOCBRDGADD and SIOCBRDGDEL
respectively and speak about members rather than ports in the manual to make
ifconfig(8) accept "add" and "del" commands as expected.
Status ioctls will follow such that "ifconfig tpmr" gets fixed accordingly.
Discussed with dlg after mentioning the lack of aggr(4) and tpmr(4)
documentation in ifconfig(8) which will follow as well after code cleanup.
Feedback OK dlg
| |
the interface input handler lists were originally set up to help
us during the initial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.
i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.
it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.
lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.
special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.
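
A minimal userland sketch of what a hardcoded demux order can look like, following the order described in this and the related commits: vlan first, then bridge, then local delivery, then carp. The struct layouts, the flags and the function name are hypothetical illustrations, not the actual ether_input code.

```c
#include <assert.h>
#include <string.h>

enum verdict { TO_VLAN, TO_BRIDGE, TO_LOCAL, TO_CARP, NOT_FOR_US };

/* hypothetical model of a received frame */
struct frame {
	unsigned char dst[6];	/* destination mac address */
	int vlan_takes;		/* a vlan/svlan interface claims this frame */
	int bridge_takes;	/* the bridge on this port claims this frame */
};

/* hypothetical model of the parent interface */
struct port {
	unsigned char lladdr[6];	/* the parent's own mac address */
	int has_carp;			/* carp is configured on the parent */
};

/*
 * the order is fixed in code, so it no longer depends on the order
 * in which the virtual interfaces were attached.
 */
enum verdict
demux(const struct port *p, const struct frame *f)
{
	if (f->vlan_takes)
		return TO_VLAN;		/* vlan and svlan look first */
	if (f->bridge_takes)
		return TO_BRIDGE;	/* only if vlan declined */
	if (memcmp(f->dst, p->lladdr, sizeof(p->lladdr)) == 0)
		return TO_LOCAL;	/* matches the parent's mac address */
	if (p->has_carp)
		return TO_CARP;		/* tried last */
	return NOT_FOR_US;		/* simplification: no multicast handling */
}
```

Because every step is an explicit branch, the behaviour is the same after a reboot as it was when configured by hand.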
| |
carp_input is only tried after vlan and bridge handling is done,
and after the ethernet packet doesn't match the parent interface's
mac address.
this has been in snaps as part of a larger diff for over a week.
| |
this means there's a consistent order of processing of service
delimited (vlan and svlan) packets and bridging of packets. vlan
and svlan get to look at a packet first. it's only if they decline
a packet that a bridge can handle it. this allows operators to slice
vlans out for processing separate to the "native" vlan handling if
they want.
while here, this fixes up a bug in vlan_input if m_pullup needed
to prepend an mbuf.
this has been in snaps as part of a larger diff for over a week.
| |
this is a step toward making all types of bridges coordinate their
use of port interfaces, and is a step toward deprecating the interface
input handler lists.
bridge(4), switch(4), and tpmr(4) now coordinate their access so
only one of them can own a port at a time.
this has been in snaps as part of a larger diff for over a week.
| |
this is a step toward making all types of bridges coordinate their
use of port interfaces, and is a step toward deprecating the interface
input handler lists.
this has been in snaps as part of a larger diff for over a week.
| |
this is a step toward making all types of bridges coordinate their
use of port interfaces, and is a step toward deprecating the interface
input handler lists. it also moves tpmr away from the trunk ioctls
it's currently (ab)using.
this has been in snaps as part of a larger diff for over a week.
| |
if the bridge declines the packet, it just returns it to ether_input
to allow local delivery to proceed.
this has been in snaps as part of a larger diff for over a week.
| |
this is the first step in refactoring how ethernet frames are demuxed
by virtual interfaces, and also in deprecating interface input list
handling.
we now have drivers for three types of virtual bridges, bridge(4),
switch(4), and tpmr(4), and it doesn't make sense for any of them
to be enabled on the same "port" interfaces at the same time.
currently you can add a port interface to multiple types of bridge,
but which one gets to steal the packets depends on the order in
which they were attached.
this creates an ether_brport structure that holds an input function
for the bridge, and optionally some per port state that the bridge
can use. arpcom has a single pointer to one of these structs that
will be used during normal ether_input processing to see if a packet
should be passed to a bridge, and will be used instead of an if
input handler. because it is a single pointer, it will make sure
only one bridge of any type is attached to a port at any one time.
this has been in snaps as part of a larger diff for over a week.
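
A minimal userland sketch of the single-pointer idea, assuming a simplified layout. The commit only says the structure holds an input function and optional per port state; the field names, the EBUSY return and the convention that an input function returns NULL when it consumes a packet are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct pkt { int len; };	/* stand-in for an mbuf */

/* hypothetical layout of the per port bridge hook */
struct brport {
	struct pkt *(*input)(struct pkt *, void *);
	void *state;		/* optional per port state */
};

/* stand-in for arpcom: one pointer means at most one owner */
struct port {
	const struct brport *brport;
};

int
port_set_bridge(struct port *p, const struct brport *bp)
{
	if (p->brport != NULL)
		return (EBUSY);	/* another bridge already owns this port */
	p->brport = bp;
	return (0);
}

void
port_clear_bridge(struct port *p, const struct brport *bp)
{
	if (p->brport == bp)	/* only the owner may detach */
		p->brport = NULL;
}

/* returns the packet if no bridge took it, NULL if it was consumed */
struct pkt *
port_input(struct port *p, struct pkt *m)
{
	if (p->brport != NULL)
		return (p->brport->input(m, p->brport->state));
	return (m);
}

/* two sample bridges: one takes everything, one declines everything */
struct pkt *
take_all(struct pkt *m, void *state)
{
	(void)m;
	(void)state;
	return (NULL);
}

struct pkt *
decline_all(struct pkt *m, void *state)
{
	(void)state;
	return (m);
}
```

With this shape, attaching a second bridge of any type to the same port fails up front instead of silently depending on attach order.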
| |
the checksum is exclusively used for pfsync to verify rulesets are identical
on all nodes. the automatic table names are random and have a near zero
chance to match. found at a customer in zurich
ok sashan kn
| |
OPT is misleading and usually refers to command line arguments to pfctl
ok sashan kn
| |
ok kn@, patrick@
| |
protects this list. Also corresponding assertion added to be sure the
required lock was held.
This is a step toward cleaning up the locking mess around `if_list'.
Also we are going to protect `if_list' by its own lock and this will
allow us to avoid lock order issues in future.
ok dlg@
| |
pfkeyv2_send() allocates multiple buffers using the same variable `i' to
calculate their sizes. Use dedicated size variables for each buffer so they
can be reused with free(9).
For this, make pfkeyv2_policy() pass back the size of its freshly allocated
buffer.
Tested, feedback and OK tobhe
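
A minimal userland sketch of the pattern, assuming hypothetical helpers: sized_alloc() and sized_free() stand in for the kernel allocator, whose free(9) takes the allocation size, and make_policy() stands in for a function that passes the size of its buffer back to the caller. None of these names come from pfkeyv2.c.

```c
#include <assert.h>
#include <stdlib.h>

size_t outstanding;	/* bytes currently accounted as allocated */

void *
sized_alloc(size_t len)
{
	void *p = malloc(len);

	if (p != NULL)
		outstanding += len;
	return (p);
}

/* like free(9), the caller must hand back the size it allocated */
void
sized_free(void *p, size_t len)
{
	if (p == NULL)
		return;
	assert(outstanding >= len);
	outstanding -= len;
	free(p);
}

/* allocates a buffer and passes its size back through lenp */
void *
make_policy(size_t *lenp)
{
	size_t len = 48;	/* arbitrary size for the sketch */
	void *p = sized_alloc(len);

	*lenp = (p != NULL) ? len : 0;
	return (p);
}

/* returns the bytes still accounted for; 0 means every size matched */
size_t
send_reply(size_t hdrlen)
{
	size_t hdr_sz = hdrlen;	/* one dedicated size per buffer, */
	size_t policy_sz = 0;	/* not one shared scratch variable */
	void *hdr = sized_alloc(hdr_sz);
	void *policy = make_policy(&policy_sz);

	/* ... build and send the message here ... */

	sized_free(policy, policy_sz);
	sized_free(hdr, hdr_sz);
	return (outstanding);
}
```

Keeping one size variable per buffer means the value is still valid at the point of the free, which a reused scratch variable cannot guarantee.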
| |
import_identities() calls import_identity() which allocates a buffer and
potentially frees it itself; if not, import_identities() uses it and frees
it afterwards.
Instead of crunching down the buffer size twice, make import_identity()
calculate and pass it back, similar to how pfkeyv2.c:pfkeyv2_get() does it.
Tested and OK tobhe
| |
pfkeyv2_get() and pfkeyv2_dump_policy() allocate buffers and can pass back
their sizes; those sizes are already used during copyout() and such.
Make one pfkeyv2_dump_policy() call pass back the size and reuse all sizes
in the respective free(9) calls.
Tested and OK tobhe
| |
One can prove that the Toeplitz matrix generated from a 16-bit seed is
invertible if and only if the seed has odd Boolean parity. Invertibility
is necessary and sufficient for the stoeplitz hash to take all 65536
possible values.
Generate a system stoeplitz seed of odd parity uniformly at random. This
is done by generating a random 16-bit number and then flipping its last
bit if it's of even parity. This works since flipping the last bit swaps
the numbers of even and odd parity, so we obtain a 2:1 mapping from all
16-bit numbers onto those with odd parity.
Implementation of parity via popcount provided by naddy; input from miod,
David Higgs, Matthew Martin, Martin Vahlensieck and others.
ok dlg
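
A minimal sketch of the seed fixup described above. The commit computes parity via popcount; this sketch folds the bits with xor instead, which gives the same answer. The function names are hypothetical, and the random input is taken as an argument here so the mapping can be checked exhaustively; the kernel would draw it from its random number generator.

```c
#include <assert.h>
#include <stdint.h>

/* returns 1 if x has an odd number of set bits, 0 otherwise */
int
parity16(uint16_t x)
{
	x ^= x >> 8;
	x ^= x >> 4;
	x ^= x >> 2;
	x ^= x >> 1;
	return (x & 1);
}

/*
 * map any 16-bit number to one of odd parity by flipping the last
 * bit when the parity is even. flipping one bit always changes the
 * parity, so r and r ^ 1 land on the same odd-parity seed.
 */
uint16_t
odd_parity_seed(uint16_t r)
{
	if (parity16(r) == 0)
		r ^= 1;
	return (r);
}
```

Since exactly two inputs map to each odd-parity value, a uniform input gives a uniform seed.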
| |
within pipex(4) layer.
ok mpi@
| |
ok mpi@
| |
ok mpi@
| |
All of these buffers are cleared with explicit sizes before free(), so
reuse the given sizes.
tested and OK tobhe
| |
Previous may have fixed the build without pf(4), but broke wireguard in
normal kernels: the condition NPF > 0 is false if pf.h is not in scope.
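
A small sketch of why the condition goes false: in a preprocessor `#if`, an identifier that is not defined as a macro evaluates to 0. NPF is deliberately left undefined below, as it would be in a file that does not include pf.h; the macro and function names are hypothetical.

```c
#include <assert.h>

/*
 * NPF is not defined anywhere in this file, so the preprocessor
 * treats it as 0 and silently takes the #else branch.
 */
#if NPF > 0
#define PF_BRANCH_COMPILED 1
#else
#define PF_BRANCH_COMPILED 0
#endif

/* reports which branch the preprocessor picked */
int
pf_branch_compiled(void)
{
	return (PF_BRANCH_COMPILED);
}
```

No diagnostic is produced by default, which is how a missing include can disable code in a normal kernel without anyone noticing at build time.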
| |
the packet parsing code expects Ethernet packets, so only allow
Ethernet interfaces to be added.
ok sthen@
| |
this was annoying if i made a typo like "ifconfig bridge0 add gre0"
instead of "ifconfig bridge0 add egre0" because it would create gre0
and then get upset cos it's not an Ethernet interface. also, it
left gre0 lying around.
this used to be useful when configuring a bridge on boot because
interfaces used to be created when they were configured, and bridges
could be configured before some virtual interfaces. however, netstart
now creates all necessary interfaces before configuring any of them,
so bridge being helpful isn't necessary anymore.
ok kn@
| |
ok dlg@ tobhe@
| |
ok dlg@ tobhe@
| |
"new" API.
ok dlg@ tobhe@
| |
malloc(9) in pppxopen(). We can avoid these races without rwlock. Also
we move malloc(9) out of rwlock.
ok mpi@
| |
prevent collecting entropy from pppx(4).
ok mpi@
| |
this means you can observe what the network stack is trying to do
when it's working with a nic driver that supports multiple rings.
a nic with only one set of rings still gets queues though, and this
still exports their stats.
here is a small example of what kstat(8) currently outputs for these
stats:
em0:0:rxq:0
        packets: 2292 packets
        bytes: 229846 bytes
        qdrops: 0 packets
        errors: 0 packets
        qlen: 0 packets
em0:0:txq:0
        packets: 1297 packets
        bytes: 193413 bytes
        qdrops: 0 packets
        errors: 0 packets
        qlen: 0 packets
        maxqlen: 511 packets
        oactive: false
| |
simultaneously protected by KERNEL_LOCK() and NET_LOCK() and now we have
a single lock for it. This step reduces the locking mess in this layer.
ok mpi@
| |
pipex_destroy_session() instead of pool_put(9) to prevent memory leak.
ok mpi@
| |
capital letters in locking annotations. Therefore harmonize the existing
annotations.
Also, if multiple locks are required they should be delimited using
commas.
ok mpi@
| |
provides stronger integrity checks, it needn't cover the end-to-end transport
path. And it is in any case a layer violation for one layer to disable the
checks of another. Skipping the network check saved ~2.4% +/- ~0.2% of cp_time
(sys+intr) on the forwarding path of a 1GHz AMD G-T40N (apu1). Other checksum
speedups exist which do not skip the check.
ok claudio@ kn@ stsp@
| |
ok deraadt yasuoka
| |
Size taken from if_creategroup();
OK mvs
| |
Reported-by: syzbot+6fef0091252d57113bfb@syzkaller.appspotmail.com
ok kn@
| |
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
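
A minimal sketch of a lockless read loop of the kind described above, written as a generic generation-counter scheme in C11. It is not the kernel's timehands code: the struct, the odd/even generation convention and the function names are assumptions, and it supports a single writer only. The 64-bit value is stored as two 32-bit halves to model hardware that cannot read it atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* hypothetical stand-in for a timehands slot */
struct timeslot {
	atomic_uint gen;		/* odd while an update is in progress */
	_Atomic uint32_t lo;		/* low half of the 64-bit time */
	_Atomic uint32_t hi;		/* high half of the 64-bit time */
};

/* single writer: bump gen to odd, store both halves, bump to even */
void
slot_write(struct timeslot *ts, uint64_t now)
{
	unsigned int g = atomic_load(&ts->gen);

	atomic_store(&ts->gen, g + 1);
	atomic_store(&ts->lo, (uint32_t)now);
	atomic_store(&ts->hi, (uint32_t)(now >> 32));
	atomic_store(&ts->gen, g + 2);
}

/*
 * reader: retry if an update was in progress or if the generation
 * changed while the two halves were being read, so a torn value
 * made of halves from different updates is never returned.
 */
uint64_t
slot_read(struct timeslot *ts)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load(&ts->gen);
		lo = atomic_load(&ts->lo);
		hi = atomic_load(&ts->hi);
	} while ((g & 1) != 0 || g != atomic_load(&ts->gen));

	return (((uint64_t)hi << 32) | lo);
}
```

The extra loads and the possible retry are the cost paid on 32-bit platforms; a platform with atomic 64-bit loads could read the value directly.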
| |
ok mpi@
| |
From Jason A. Donenfeld <Jason (at) zx2c4.com>
ok patrick@
| |
From Jason A. Donenfeld <Jason (at) zx2c4.com>
ok patrick@
| |
- There is no panic() condition while inserting `pxi' to tree so drop
RBT_FIND() to avoid two lookups.
- Modify text in panic() message in delete case.
ok yasuoka@ claudio@
| |
For example the bridge_ioctl() function calls NET_UNLOCK() unconditionally
and so calling if_ioctl() without netlock will trigger an assert because
of not holding the netlock. Make sure the ioctl handlers are called with
the netlock held and drop the lock for the wg(4) specific ioctls in the
wg_ioctl handler. This fixes a panic in bridge_ioctl() triggered by
ifconfig(8) issuing a SIOCGWG ioctl against bridge(4).
This is just a workaround; this needs more cleanup, but at least this way
the panic cannot be triggered anymore.
OK stsp@, tested by semarie@
| |
sessions by pipex_iface_fini() or by pipex_ioctl() with `PIPEXSMODE' command.
ok yasuoka@
| |
livelock detection used to rely on code running at softnet blocking
the softclock handling at a lower interrupt priority level. if the
hard clock interrupt count diverged from one kept by a timeout, we
assumed the network stack was doing too much work and we should
apply backpressure to the reception of packets.
the network stack doesn't really block timeouts from firing anymore
though. this is especially true on MP systems, because timeouts
fire on cpu0 and the nettq thread could be somewhere else entirely.
this means network activity doesn't make the softclock lose ticks,
which means we aren't scaling rx ring activity like we think we
are.
the alternative way to detect livelock is at the point where a driver
queues packets for the stack to process: if too many packets have built
up, the input routine's return value tells the driver to slow
down. this enables finer grained livelock detection too. the rx
ring accounting is done per rx ring, and each rx ring is tied to a
specific nettq. if one of them is going too fast it shouldn't affect
the others. the tick based detection was done system wide and
punished all the drivers.
i've converted all the drivers to the new mechanism. let's see how
we go with it.
jmatthew@ confirms rings still shrink, so some backpressure is being
applied.
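
A minimal userland sketch of queue-based backpressure as described above. The threshold, the struct layouts, the halving policy and all function names are hypothetical; the point is only that the return value of the input routine, computed per queue, is what tells a driver to shrink its own rx ring.

```c
#include <assert.h>

#define INPUT_QUEUE_HIWAT 8	/* hypothetical backlog threshold */

struct input_queue { unsigned int len; };	/* one per rx ring */
struct rx_ring { unsigned int slots, min_slots; };

/* queue packets for the stack; nonzero means "slow down" */
int
queue_input(struct input_queue *q, unsigned int npkts)
{
	q->len += npkts;
	return (q->len > INPUT_QUEUE_HIWAT);
}

/* the stack drains the queue when it gets to run */
void
stack_process(struct input_queue *q)
{
	q->len = 0;
}

/* driver rx path: shrink only this ring when told to back off */
void
driver_rx(struct rx_ring *r, struct input_queue *q, unsigned int npkts)
{
	if (queue_input(q, npkts)) {
		unsigned int half = r->slots / 2;

		r->slots = (half < r->min_slots) ? r->min_slots : half;
	}
}
```

Since each ring has its own queue, a ring that is going too fast shrinks without punishing the others, unlike the system-wide tick-based detection.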
| |
thanks to Matt Dunwoodie and Jason A. Donenfeld for their effort.
it's at least as functional as the go implementation, and maybe
more so since this one works on more architectures.
i'm sure there's further development that can be done, but you can
say that about anything and everything that's in the tree.
ok deraadt@
| |
i'm still not a fan of the peer semantics of wireguard interfaces
where each interface can have multiple peers and each peer has a
set of the allowed ips configured, aka cryptokey routing. traditionally
we would use a tunnel (IFT_TUNNEL) style interface per peer, which
means there's a 1:1 mapping between a peer and an interface. in
turn that means you can apply policy with things like pf to the
interface and it implies policy on the peer.
so allowed ips inside a wg interface feels like a bandaid for a
self inflicted wound to some degree. however, deraadt@ points out
that the boat has sailed, and being compatible with the larger
ecosystem has benefits. admins can choose to set up an interface per
peer if they want to, so we get the best of both worlds.
i will admit an interface per peer sucks in a concentrator situation
though. that's why we still have pppac(4) as well as pppx(4). i
also don't have any better ideas for how to scale or even express
this kind of policy in a concentrator setting either.
apologies for the teary.
from Matt Dunwoodie and Jason A. Donenfeld
ok deraadt@