path: root/sys/net
Each entry below shows the commit message, followed by [author, date; files changed, lines removed/added].
...
* filter vlan and svlan packets by default.  [dlg, 2020-07-22; 1 file, -1/+22]
* Change tpmr(4) from ifconfig [-]trunkport to add|del synopsis  [kn, 2020-07-22; 1 file, -11/+12]
  Unlike aggr(4) and trunk(4) for link aggregation, tpmr(4) bridges links similar to bridge(4) and switch(4), yet its ioctl(2) interface is that of an aggregating interface.
  Change SIOCSTRUNKPORT and SIOCSTRUNKDELPORT to SIOCBRDGADD and SIOCBRDGDEL respectively and speak about members rather than ports in the manual to make ifconfig(8) accept "add" and "del" commands as expected. Status ioctls will follow such that "ifconfig tpmr" gets fixed accordingly.
  Discussed with dlg after mentioning the lack of aggr(4) and tpmr(4) documentation in ifconfig(8), which will follow as well after code cleanup.
  Feedback OK dlg
* deprecate interface input handler lists, just use one input function.  [dlg, 2020-07-22; 8 files, -188/+127]
  the interface input handler lists were originally set up to help us during the initial mpsafe network stack work. at the time not all the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc) were mpsafe, so we wanted a way to avoid them by default, and only take the kernel lock hit when they were specifically enabled on the interface. since then, they have been fixed up to be mpsafe.
  i could leave the list in place, but it has some semantic problems. because virtual interfaces filter packets based on the order they were attached to the parent interface, you can get packets taken away in surprising ways, especially when you reboot and netstart does something different to what you did by hand. by hardcoding the order that things like vlan and bridge get to look at packets, we can document the behaviour and get consistency.
  it also means we can get rid of a use of SRPs, which were difficult to replace with SMRs. the interface input handler list is an SRPL, which we would like to deprecate. it turns out that you can sleep during stack processing, which you're not supposed to do with SRPs or SMRs, but SRPs are a lot more forgiving and it worked.
  lastly, it turns out that this code is faster than the input list handling, so lots of winning all around.
  special thanks to hrvoje popovski and aaron bieber for testing. this has been in snaps as part of a larger diff for over a week.
* move carp_input into ether_input, instead of via an input handler.  [dlg, 2020-07-22; 1 file, -1/+19]
  carp_input is only tried after vlan and bridge handling is done, and after the ethernet packet doesn't match the parent interface's mac address.
  this has been in snaps as part of a larger diff for over a week.
* move vlan_input into ether_input, instead of via an input handler.  [dlg, 2020-07-22; 3 files, -48/+75]
  this means there's a consistent order of processing of service delimited (vlan and svlan) packets and bridging of packets. vlan and svlan get to look at a packet first. it's only if they decline a packet that a bridge can handle it. this allows operators to slice vlans out for processing separate to the "native" vlan handling if they want.
  while here, this fixes up a bug in vlan_input if m_pullup needed to prepend an mbuf.
  this has been in snaps as part of a larger diff for over a week.
* register as a bridge port, not an input handler, on member ifaces.  [dlg, 2020-07-22; 1 file, -14/+26]
  this is a step toward making all types of bridges coordinate their use of port interfaces, and is a step toward deprecating the interface input handler lists. bridge(4), switch(4), and tpmr(4) now coordinate their access so only one of them can own a port at a time.
  this has been in snaps as part of a larger diff for over a week.
* register as a bridge port, not an input handler, on member ifaces.  [dlg, 2020-07-22; 1 file, -11/+21]
  this is a step toward making all types of bridges coordinate their use of port interfaces, and is a step toward deprecating the interface input handler lists.
  this has been in snaps as part of a larger diff for over a week.
* register tpmr as a bridge port, not an input handler, on member ifaces.  [dlg, 2020-07-22; 1 file, -25/+53]
  this is a step toward making all types of bridges coordinate their use of port interfaces, and is a step toward deprecating the interface input handler lists. it also moves tpmr away from the trunk ioctls it's currently (ab)using.
  this has been in snaps as part of a larger diff for over a week.
* if an iface is a bridge port, pass the packet to the bridge in ether_input.  [dlg, 2020-07-22; 1 file, -1/+22]
  if the bridge declines the packet, it just returns it to ether_input to allow local delivery to proceed.
  this has been in snaps as part of a larger diff for over a week.
* add code to coordinate how bridges attach to ethernet interfaces.  [dlg, 2020-07-22; 1 file, -1/+54]
  this is the first step in refactoring how ethernet frames are demuxed by virtual interfaces, and also in deprecating interface input list handling.
  we now have drivers for three types of virtual bridges, bridge(4), switch(4), and tpmr(4), and it doesn't make sense for any of them to be enabled on the same "port" interfaces at the same time. currently you can add a port interface to multiple types of bridge, but which one gets to steal the packets depends on the order in which they were attached.
  this creates an ether_brport structure that holds an input function for the bridge, and optionally some per port state that the bridge can use. arpcom has a single pointer to one of these structs that will be used during normal ether_input processing to see if a packet should be passed to a bridge, and will be used instead of an if input handler. because it is a single pointer, it will make sure only one bridge of any type is attached to a port at any one time.
  this has been in snaps as part of a larger diff for over a week.
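A minimal sketch of the ether_brport idea described in the entry above, assuming illustrative field and function names (the committed header may differ; only the single-pointer-per-port design is taken from the commit message):

    /*
     * Sketch (not the committed definition): a bridge registers one of
     * these on an ethernet port.  Because the port's arpcom holds a
     * single pointer, only one bridge of any type can own the port at a
     * time.  Assumes the usual kernel mbuf/ifnet headers.
     */
    struct ether_brport_sketch {
            /* bridge input function: may consume the mbuf (returns NULL) */
            struct mbuf     *(*eb_input)(struct ifnet *, struct mbuf *,
                                uint64_t, void *);
            void             *eb_port;      /* optional per-port bridge state */
    };

    /*
     * Sketch of the ether_input() side: offer the frame to the attached
     * bridge, if any, instead of walking an interface input handler list.
     */
    static struct mbuf *
    ether_offer_to_bridge_sketch(struct ifnet *ifp, struct mbuf *m,
        uint64_t dst, const struct ether_brport_sketch *eb)
    {
            if (eb != NULL)
                    m = (*eb->eb_input)(ifp, m, dst, eb->eb_port);

            return (m);     /* NULL means the bridge took the packet */
    }

A bridge driver would set the port's pointer when a member interface is added and clear it when the member is removed, which is also where the "only one bridge at a time" check naturally lives.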
* when calculating the ruleset's checksum, skip automatic table names.  [henning, 2020-07-21; 1 file, -2/+4]
  the checksum is exclusively used for pfsync to verify rulesets are identical on all nodes. the automatic table names are random and have a near zero chance to match.
  found at a customer in zurich
  ok sashan kn
* rename PF_OPT_TABLE_PREFIX to PF_OPTIMIZER_TABLE_PFX and move it to pfvar.h  [henning, 2020-07-21; 1 file, -1/+2]
  OPT is misleading and usually refers to command line arguments to pfctl
  ok sashan kn
* Make sure to explicit_bzero() buffers holding sensitive SA data.  [tobhe, 2020-07-21; 1 file, -6/+11]
  ok kn@, patrick@
* Move insertions to `if_list' out of NET_LOCK() because KERNEL_LOCK() protects this list.  [mvs, 2020-07-20; 1 file, -3/+6]
  A corresponding assertion was also added to be sure the required lock is held. This is a step toward cleaning up the locking mess around `if_list'. We are also going to protect `if_list' with its own lock, which will allow us to avoid lock order issues in the future.
  ok dlg@
* Add size to free(9) calls  [kn, 2020-07-18; 2 files, -31/+35]
  pfkeyv2_send() allocates multiple buffers using the same variable `i' to calculate their sizes, use dedicated size variables for each buffer to reuse them with free(9).
  For this, make pfkeyv2_policy() pass back the size of its freshly allocated buffer.
  Tested, feedback and OK tobhe
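The pattern behind this group of "add size to free(9)" commits: OpenBSD's free(9) takes the allocation size, so the size computed for malloc(9) has to be kept (or passed back by whoever allocated the buffer) rather than recomputed at free time. A minimal sketch of the idea; the helper name and the `datalen' size calculation are illustrative, not taken from pfkeyv2.c:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <net/pfkeyv2.h>        /* struct sadb_msg */

    /*
     * Illustrative sketch: keep the size used for malloc(9) in its own
     * variable so the matching free(9) gets exactly the same value.
     */
    int
    pfkey_reply_sketch(size_t datalen)
    {
            void    *buf;
            size_t   buflen;

            buflen = sizeof(struct sadb_msg) + datalen;
            buf = malloc(buflen, M_PFKEY, M_WAITOK | M_ZERO);

            /* ... fill in buf and copy it out to the caller ... */

            free(buf, M_PFKEY, buflen);     /* same size as the allocation */
            return (0);
    }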
* Add size to free(9) calls  [kn, 2020-07-18; 1 file, -11/+16]
  import_identities() calls import_identity() which allocates a buffer and potentially frees it itself; if not, import_identities() uses it and frees it afterwards.
  Instead of crunching down the buffer size twice, make import_identity() calculate and pass it back, similar to how pfkeyv2.c:pfkeyv2_get() does it.
  Tested and OK tobhe
* Add size to free(9) calls  [kn, 2020-07-18; 1 file, -5/+6]
  pfkeyv2_get() and pfkeyv2_dump_policy() allocate buffers and can pass back their sizes, those sizes are already used during copyout() and such.
  Make one pfkeyv2_dump_policy() call pass back the size and reuse all sizes in the respective free(9) calls.
  Tested and OK tobhe
* Randomize the system stoeplitz key  [tb, 2020-07-17; 1 file, -1/+30]
  One can prove that the Toeplitz matrix generated from a 16-bit seed is invertible if and only if the seed has odd Boolean parity. Invertibility is necessary and sufficient for the stoeplitz hash to take all 65536 possible values.
  Generate a system stoeplitz seed of odd parity uniformly at random. This is done by generating a random 16-bit number and then flipping its last bit if it's of even parity. This works since flipping the last bit swaps the numbers of even and odd parity, so we obtain a 2:1 mapping from all 16-bit numbers onto those with odd parity.
  Implementation of parity via popcount provided by naddy; input from miod, David Higgs, Matthew Martin, Martin Vahlensieck and others.
  ok dlg
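A small sketch of the seed selection described above. The function name and the bit-folding parity computation are illustrative (the committed code reportedly uses a popcount-based parity from naddy); only the "flip the last bit if the parity is even" construction comes from the commit message:

    #include <sys/types.h>
    #include <sys/systm.h>          /* arc4random() */

    /*
     * Draw a 16-bit seed uniformly at random and force odd parity by
     * flipping bit 0 when the parity is even.  Flipping one bit toggles
     * the parity and pairs each even-parity value with a unique
     * odd-parity one, so the result is still uniform over the 32768
     * odd-parity seeds.
     */
    static uint16_t
    stoeplitz_seed_sketch(void)
    {
            uint16_t seed, par;

            seed = arc4random() & 0xffff;

            /* boolean parity of the seed, by folding the bits together */
            par = seed;
            par ^= par >> 8;
            par ^= par >> 4;
            par ^= par >> 2;
            par ^= par >> 1;

            if ((par & 1) == 0)
                    seed ^= 1;      /* even parity: flip the last bit */

            return (seed);
    }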
* Use interface index instead of pointer to corresponding interface within pipex(4) layer.  [mvs, 2020-07-17; 4 files, -32/+68]
  ok mpi@
* Check destruction ability before searching for the instance of the clone interface.  [mvs, 2020-07-17; 1 file, -4/+4]
  ok mpi@
* Fix races in pppacopen() caused by malloc(9).  [mvs, 2020-07-15; 1 file, -4/+5]
  ok mpi@
* Add sizes to free(9) calls  [kn, 2020-07-15; 1 file, -6/+6]
  All of these buffers are cleared with explicit sizes before free(), so reuse the given sizes.
  tested and OK tobhe
* Unbreak wg(4).  [tb, 2020-07-13; 1 file, -1/+2]
  The previous change may have fixed the build without pf(4), but it broke wireguard in normal kernels: the condition NPF > 0 is false if pf.h is not in scope.
* let's be explicit about only supporting Ethernet ports as members.  [dlg, 2020-07-13; 1 file, -7/+7]
  the packet parsing code expects Ethernet packets, so only allow Ethernet interfaces to be added.
  ok sthen@
* when adding a non-existent interface as a port, don't try to create missing ones.  [dlg, 2020-07-13; 1 file, -9/+1]
  this was annoying if i made a typo like "ifconfig bridge0 add gre0" instead of "ifconfig bridge0 add egre0", because it would create gre0 and then get upset because it's not an Ethernet interface. also, it left gre0 lying around.
  this used to be useful when configuring a bridge on boot, because interfaces used to be created when they were configured, and bridges could be configured before some virtual interfaces. however, netstart now creates all necessary interfaces before configuring any of them, so bridge being helpful isn't necessary anymore.
  ok kn@
* Fix build without pf  [kn, 2020-07-12; 1 file, -1/+3]
* Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.  [patrick, 2020-07-10; 22 files, -55/+48]
  ok dlg@ tobhe@
* Change users of IFQ_PURGE() to use the "new" API.  [patrick, 2020-07-10; 7 files, -22/+17]
  ok dlg@ tobhe@
* Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the "new" API.  [patrick, 2020-07-10; 11 files, -34/+23]
  ok dlg@ tobhe@
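For context, the "new" API in these three commits is the ifq_*() function interface from sys/net/ifq.h that supersedes the old IFQ_*() macros. A rough before/after sketch of the conversion; the driver-side variables (ifp, sc, m) are illustrative:

    /* before: classic IFQ_*() macros on the interface send queue */
            IFQ_SET_MAXLEN(&ifp->if_snd, sc->sc_ntxdesc - 1);
            IFQ_PURGE(&ifp->if_snd);
            IFQ_DEQUEUE(&ifp->if_snd, m);
            if (IFQ_IS_EMPTY(&ifp->if_snd))
                    return;

    /* after: the equivalent ifq_*() function calls */
            ifq_set_maxlen(&ifp->if_snd, sc->sc_ntxdesc - 1);
            ifq_purge(&ifp->if_snd);
            m = ifq_dequeue(&ifp->if_snd);
            if (ifq_empty(&ifp->if_snd))
                    return;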
* Kill `pppx_devs_lk' rwlock.  [mvs, 2020-07-10; 1 file, -42/+12]
  It was used only to prevent races caused by malloc(9) in pppxopen(). We can avoid these races without the rwlock; also move malloc(9) out of the rwlock.
  ok mpi@
* Set the missing `IFXF_CLONED' flag on pppx(4) related `ifnet'.  [mvs, 2020-07-10; 1 file, -1/+2]
  That should prevent collecting entropy from pppx(4).
  ok mpi@
* add kstats for rx queues (ifiqs) and transmit queues (ifqs).  [dlg, 2020-07-07; 2 files, -2/+129]
  this means you can observe what the network stack is trying to do when it's working with a nic driver that supports multiple rings. a nic with only one set of rings still gets queues though, and this still exports their stats.
  here is a small example of what kstat(8) currently outputs for these stats:

    em0:0:rxq:0
            packets: 2292 packets
              bytes: 229846 bytes
             qdrops: 0 packets
             errors: 0 packets
               qlen: 0 packets
    em0:0:txq:0
            packets: 1297 packets
              bytes: 193413 bytes
             qdrops: 0 packets
             errors: 0 packets
               qlen: 0 packets
            maxqlen: 511 packets
            oactive: false
* Protect the whole pipex(4) layer by NET_LOCK().  [mvs, 2020-07-06; 4 files, -39/+47]
  pipex(4) was simultaneously protected by KERNEL_LOCK() and NET_LOCK(); now a single lock protects it. This step reduces the locking mess in this layer.
  ok mpi@
* pipex_rele_session() frees the memory pointed to by `old_session_keys'.  [mvs, 2020-07-06; 1 file, -2/+2]
  Use it in pipex_destroy_session() instead of pool_put(9) to prevent a memory leak.
  ok mpi@
* It's been agreed upon that global locks should be expressed using capital letters in locking annotations.  [anton, 2020-07-04; 1 file, -5/+5]
  Therefore harmonize the existing annotations. Also, if multiple locks are required they should be delimited using commas.
  ok mpi@
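The annotations in question are the per-member lock letters used in struct comments. A sketch of the convention after this change, with capital letters and comma-delimited lists of locks; the struct and its members are made up for illustration:

    /*
     * Illustrative struct: each letter names a lock documented in a
     * legend at the top of the file, and members that need several locks
     * list them separated by commas.
     *
     * Locks used to protect struct members in this example:
     *      I       immutable after creation
     *      K       kernel lock
     *      N       net lock
     */
    struct example_softc {
            struct example_softc    *sc_next;       /* [K, N] needs both locks */
            unsigned int             sc_unit;       /* [I] */
            uint64_t                 sc_ipackets;   /* [N] */
    };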
* Permit the stack to check transport and network checksums.  [procter, 2020-07-04; 1 file, -9/+3]
  Although the link provides stronger integrity checks, it needn't cover the end-to-end transport path. And it is in any case a layer violation for one layer to disable the checks of another.
  Skipping the network check saved ~2.4% +/- ~0.2% of cp_time (sys+intr) on the forwarding path of a 1GHz AMD G-T40N (apu1). Other checksum speedups exist which do not skip the check.
  ok claudio@ kn@ stsp@
* Remove unused declaration.  [mvs, 2020-06-30; 1 file, -4/+1]
  ok deraadt yasuoka
* Add size to free(9) call  [kn, 2020-06-30; 1 file, -2/+2]
  Size taken from if_creategroup(); OK mvs
* state import should accept AF_INET/AF_INET6 only  [sashan, 2020-06-28; 1 file, -3/+12]
  Reported-by: syzbot+6fef0091252d57113bfb@syzkaller.appspotmail.com
  ok kn@
* kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)  [cheloha, 2020-06-24; 13 files, -102/+102]
  time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads.
  This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9).
  There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms.
  With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@.
  ok kettenis@
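The "lockless read loop" mentioned above is the usual timecounter generation-check pattern: read a generation number, copy the value, and retry if the generation changed underneath you. A hedged sketch of the idea; the struct and function below follow the timehands style but are illustrative, not copied from kern_tc.c:

    #include <sys/types.h>
    #include <sys/atomic.h>         /* membar_consumer() */

    /*
     * On 32-bit platforms a 64-bit time_t cannot be read atomically, so
     * the value is copied inside a generation-check loop and re-read if
     * the timekeeping code updated it concurrently.
     */
    struct timehands_sketch {
            volatile unsigned int   th_generation;  /* 0 while being updated */
            time_t                  th_second;      /* 64-bit even on ILP32 */
    };

    static time_t
    gettime_sketch(const struct timehands_sketch *th)
    {
            unsigned int gen;
            time_t now;

            do {
                    gen = th->th_generation;
                    membar_consumer();      /* read generation before payload */
                    now = th->th_second;
                    membar_consumer();      /* read payload before re-check */
            } while (gen == 0 || gen != th->th_generation);

            return (now);
    }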
* Fix `IFF_RUNNING' bit handling for pppx(4) and pppac(4).  [mvs, 2020-06-24; 1 file, -3/+7]
  ok mpi@
* Enable MPSAFE start routine to keep encryption workers more active.  [tobhe, 2020-06-23; 1 file, -12/+12]
  From Jason A. Donenfeld <Jason (at) zx2c4.com>
  ok patrick@
* Increase TX mitigation backlog size for increased throughput.  [tobhe, 2020-06-23; 1 file, -1/+2]
  From Jason A. Donenfeld <Jason (at) zx2c4.com>
  ok patrick@
* add missing rcs id  [jasper, 2020-06-22; 2 files, -0/+4]
* Rework checks for `pppx_ifs' tree modification.  [mvs, 2020-06-22; 1 file, -8/+4]
  - There is no panic() condition while inserting `pxi' into the tree, so drop RBT_FIND() to avoid two lookups.
  - Modify the text of the panic() message in the delete case.
  ok yasuoka@ claudio@
* The interface if_ioctl routine must be called with the NET_LOCK() held.  [claudio, 2020-06-22; 2 files, -8/+5]
  For example the bridge_ioctl() function calls NET_UNLOCK() unconditionally, so calling if_ioctl() without the netlock will trigger an assert because the netlock is not held. Make sure the ioctl handlers are called with the netlock held and drop the lock for the wg(4) specific ioctls in the wg_ioctl handler.
  This fixes a panic in bridge_ioctl() triggered by ifconfig(8) issuing a SIOCGWG ioctl against bridge(4). This is just a workaround; it needs more cleanup, but at least this way the panic can not be triggered anymore.
  OK stsp@, tested by semarie@
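A sketch of the workaround described above: if_ioctl() is now entered with the NET_LOCK() held, and wg_ioctl() temporarily drops it around the wg(4)-specific ioctls that may sleep. The wg_ioctl_set()/wg_ioctl_get() helpers and SIOCSWG are assumptions for illustration; SIOCGWG is the ioctl named in the commit message:

    /* inside wg_ioctl(), called with the net lock held */
            switch (cmd) {
            case SIOCSWG:                   /* assumed set-config ioctl */
                    NET_UNLOCK();
                    error = wg_ioctl_set(sc, (struct wg_data_io *)data);
                    NET_LOCK();
                    break;
            case SIOCGWG:                   /* named in the commit message */
                    NET_UNLOCK();
                    error = wg_ioctl_get(sc, (struct wg_data_io *)data);
                    NET_LOCK();
                    break;
            default:
                    error = ENOTTY;
                    break;
            }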
* Prevent potential `state_list' corruption while pppac(4) destroys pipex(4) sessions by pipex_iface_fini() or by pipex_ioctl() with the `PIPEXSMODE' command.  [mvs, 2020-06-22; 1 file, -2/+4]
  ok yasuoka@
* deprecate network livelock detection using the softclock.  [dlg, 2020-06-22; 1 file, -38/+2]
  livelock detection used to rely on code running at softnet blocking the softclock handling at a lower interrupt priority level. if the hard clock interrupt count diverged from one kept by a timeout, we assumed the network stack was doing too much work and we should apply backpressure to the reception of packets.
  the network stack doesn't really block timeouts from firing anymore though. this is especially true on MP systems, because timeouts fire on cpu0 and the nettq thread could be somewhere else entirely. this means network activity doesn't make the softclock lose ticks, which means we aren't scaling rx ring activity like we think we are.
  the alternative way to detect livelock is when a driver queues packets for the stack to process: if too many packets have built up, the input routine's return value tells the driver to slow down. this enables finer grained livelock detection too. the rx ring accounting is done per rx ring, and each rx ring is tied to a specific nettq. if one of them is going too fast it shouldn't affect the others. the tick based detection was done system wide and punished all the drivers.
  i've converted all the drivers to the new mechanism. let's see how we go with it.
  jmatthew@ confirms rings still shrink, so some backpressure is being applied.
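The per-ring mechanism described above shows up in converted drivers roughly as follows: the rx interrupt handler batches completed packets onto an mbuf list, hands them to the stack, and shrinks its ring if the stack reports pressure. The driver-side names (sc, sc_rx_ring, example_rxeof()) are illustrative; ifiq_input() and if_rxr_livelocked() are the stack- and ring-accounting hooks this change relies on:

    /* sketch of a driver rx interrupt handler using the new mechanism */
            struct mbuf_list ml = MBUF_LIST_INITIALIZER();
            struct mbuf *m;

            /* collect the packets this interrupt completed */
            while ((m = example_rxeof(sc)) != NULL)
                    ml_enqueue(&ml, m);

            /*
             * hand them to the stack; a non-zero return means the input
             * queue is backed up, so shrink this ring's fill level.
             */
            if (ifiq_input(&ifp->if_rcv, &ml))
                    if_rxr_livelocked(&sc->sc_rx_ring);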
* add wg(4), an in kernel driver for WireGuard vpn communication.  [dlg, 2020-06-21; 7 files, -1/+5218]
  thanks to Matt Dunwoodie and Jason A. Donenfeld for their effort. it's at least as functional as the go implementation, and maybe more so since this one works on more architectures.
  i'm sure there's further development that can be done, but you can say that about anything and everything that's in the tree.
  ok deraadt@
* add IFT_WIREGUARD.  [dlg, 2020-06-21; 1 file, -1/+2]
  i'm still not a fan of the peer semantics of wireguard interfaces, where each interface can have multiple peers and each peer has a set of allowed ips configured, aka cryptokey routing.
  traditionally we would use a tunnel (IFT_TUNNEL) style interface per peer, which means there's a 1:1 mapping between a peer and an interface. in turn that means you can apply policy with things like pf to the interface and it implies policy on the peer. so allowed ips inside a wg interface feels like a bandaid for a self inflicted wound to some degree.
  however, deraadt@ points out that the boat has sailed, and being compatible with the larger ecosystem has benefits. admins can choose to set up an interface per peer if they want to, so we get the best of both worlds.
  i will admit an interface per peer sucks in a concentrator situation though. that's why we still have pppac(4) as well as pppx(4). i also don't have any better ideas for how to scale or even express this kind of policy in a concentrator setting either.
  apologies for the teary.
  from Matt Dunwoodie and Jason A. Donenfeld
  ok deraadt@