path: root/sys/net
Each entry below shows the commit message, followed by [author, date; files changed, lines removed/added].
...
* filter vlan and svlan packets by default.  [dlg, 2020-07-22; 1 file, -1/+22]
* Change tpmr(4) from ifconfig [-]trunkport to add|del synopsis  [kn, 2020-07-22; 1 file, -11/+12]
  Unlike aggr(4) and trunk(4) for link aggregation, tpmr(4) bridges links similar to bridge(4) and switch(4), yet its ioctl(2) interface is that of an aggregating interface.
  Change SIOCSTRUNKPORT and SIOCSTRUNKDELPORT to SIOCBRDGADD and SIOCBRDGDEL respectively and speak about members rather than ports in the manual to make ifconfig(8) accept "add" and "del" commands as expected. Status ioctls will follow such that "ifconfig tpmr" gets fixed accordingly.
  Discussed with dlg after mentioning the lack of aggr(4) and tpmr(4) documentation in ifconfig(8), which will follow as well after code cleanup.
  Feedback OK dlg
* deprecate interface input handler lists, just use one input function.  [dlg, 2020-07-22; 8 files, -188/+127]
  the interface input handler lists were originally set up to help us during the initial mpsafe network stack work. at the time not all the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc) were mpsafe, so we wanted a way to avoid them by default, and only take the kernel lock hit when they were specifically enabled on the interface. since then, they have been fixed up to be mpsafe.
  i could leave the list in place, but it has some semantic problems. because virtual interfaces filter packets based on the order they were attached to the parent interface, you can get packets taken away in surprising ways, especially when you reboot and netstart does something different to what you did by hand. by hardcoding the order that things like vlan and bridge get to look at packets, we can document the behaviour and get consistency.
  it also means we can get rid of a use of SRPs, which were difficult to replace with SMRs. the interface input handler list is an SRPL, which we would like to deprecate. it turns out that you can sleep during stack processing, which you're not supposed to do with SRPs or SMRs, but SRPs are a lot more forgiving and it worked.
  lastly, it turns out that this code is faster than the input list handling, so lots of winning all around.
  special thanks to hrvoje popovski and aaron bieber for testing. this has been in snaps as part of a larger diff for over a week.
* move carp_input into ether_input, instead of via an input handler.  [dlg, 2020-07-22; 1 file, -1/+19]
  carp_input is only tried after vlan and bridge handling is done, and after the ethernet packet doesn't match the parent interface's mac address.
  this has been in snaps as part of a larger diff for over a week.
* move vlan_input into ether_input, instead of via an input handler.  [dlg, 2020-07-22; 3 files, -48/+75]
  this means there's a consistent order of processing of service delimited (vlan and svlan) packets and bridging of packets. vlan and svlan get to look at a packet first. it's only if they decline a packet that a bridge can handle it. this allows operators to slice vlans out for processing separate to the "native" vlan handling if they want.
  while here, this fixes up a bug in vlan_input if m_pullup needed to prepend an mbuf.
  this has been in snaps as part of a larger diff for over a week.
* register as a bridge port, not an input handler, on member ifaces.  [dlg, 2020-07-22; 1 file, -14/+26]
  this is a step toward making all types of bridges coordinate their use of port interfaces, and is a step toward deprecating the interface input handler lists. bridge(4), switch(4), and tpmr(4) now coordinate their access so only one of them can own a port at a time.
  this has been in snaps as part of a larger diff for over a week.
* register as a bridge port, not an input handler, on member ifaces.  [dlg, 2020-07-22; 1 file, -11/+21]
  this is a step toward making all types of bridges coordinate their use of port interfaces, and is a step toward deprecating the interface input handler lists.
  this has been in snaps as part of a larger diff for over a week.
* register tpmr as a bridge port, not an input handler, on member ifaces.  [dlg, 2020-07-22; 1 file, -25/+53]
  this is a step toward making all types of bridges coordinate their use of port interfaces, and is a step toward deprecating the interface input handler lists. it also moves tpmr away from the trunk ioctls it's currently (ab)using.
  this has been in snaps as part of a larger diff for over a week.
* if an iface is a bridge port, pass the packet to the bridge in ether_input.  [dlg, 2020-07-22; 1 file, -1/+22]
  if the bridge declines the packet, it just returns it to ether_input to allow local delivery to proceed.
  this has been in snaps as part of a larger diff for over a week.
* add code to coordinate how bridges attach to ethernet interfaces.  [dlg, 2020-07-22; 1 file, -1/+54]
  this is the first step in refactoring how ethernet frames are demuxed by virtual interfaces, and also in deprecating interface input list handling.
  we now have drivers for three types of virtual bridges, bridge(4), switch(4), and tpmr(4), and it doesn't make sense for any of them to be enabled on the same "port" interfaces at the same time. currently you can add a port interface to multiple types of bridge, but which one gets to steal the packets depends on the order in which they were attached.
  this creates an ether_brport structure that holds an input function for the bridge, and optionally some per port state that the bridge can use. arpcom has a single pointer to one of these structs that will be used during normal ether_input processing to see if a packet should be passed to a bridge, and will be used instead of an if input handler. because it is a single pointer, it will make sure only one bridge of any type is attached to a port at any one time.
  this has been in snaps as part of a larger diff for over a week.
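A minimal sketch of the ether_brport idea described in the entry above, assuming illustrative field and function names (the committed header may differ; only the single-pointer-per-port design is taken from the commit message):

    /*
     * Sketch (not the committed definition): a bridge registers one of
     * these on an ethernet port.  Because the port's arpcom holds a
     * single pointer, only one bridge of any type can own the port at a
     * time.  Assumes the usual kernel mbuf/ifnet headers.
     */
    struct ether_brport_sketch {
            /* bridge input function: may consume the mbuf (returns NULL) */
            struct mbuf     *(*eb_input)(struct ifnet *, struct mbuf *,
                                uint64_t, void *);
            void             *eb_port;      /* optional per-port bridge state */
    };

    /*
     * Sketch of the ether_input() side: offer the frame to the attached
     * bridge, if any, instead of walking an interface input handler list.
     */
    static struct mbuf *
    ether_offer_to_bridge_sketch(struct ifnet *ifp, struct mbuf *m,
        uint64_t dst, const struct ether_brport_sketch *eb)
    {
            if (eb != NULL)
                    m = (*eb->eb_input)(ifp, m, dst, eb->eb_port);

            return (m);     /* NULL means the bridge took the packet */
    }

A bridge driver would set the port's pointer when a member interface is added and clear it when the member is removed, which is also where the "only one bridge at a time" check naturally lives.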
* when calculating the ruleset's checksum, skip automatic table names.  [henning, 2020-07-21; 1 file, -2/+4]
  the checksum is exclusively used for pfsync to verify rulesets are identical on all nodes. the automatic table names are random and have a near zero chance to match.
  found at a customer in zurich
  ok sashan kn
* rename PF_OPT_TABLE_PREFIX to PF_OPTIMIZER_TABLE_PFX and move it to pfvar.h  [henning, 2020-07-21; 1 file, -1/+2]
  OPT is misleading and usually refers to command line arguments to pfctl
  ok sashan kn
* Make sure to explicit_bzero() buffers holding sensitive SA data.  [tobhe, 2020-07-21; 1 file, -6/+11]
  ok kn@, patrick@
* Move insertions to `if_list' out of NET_LOCK() because KERNEL_LOCK() protects this list.  [mvs, 2020-07-20; 1 file, -3/+6]
  A corresponding assertion was also added to be sure the required lock is held. This is a step toward cleaning up the locking mess around `if_list'. We are also going to protect `if_list' with its own lock, which will allow us to avoid lock order issues in the future.
  ok dlg@
* Add size to free(9) calls  [kn, 2020-07-18; 2 files, -31/+35]
  pfkeyv2_send() allocates multiple buffers using the same variable `i' to calculate their sizes, use dedicated size variables for each buffer to reuse them with free(9).
  For this, make pfkeyv2_policy() pass back the size of its freshly allocated buffer.
  Tested, feedback and OK tobhe
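The pattern behind this group of "add size to free(9)" commits: OpenBSD's free(9) takes the allocation size, so the size computed for malloc(9) has to be kept (or passed back by whoever allocated the buffer) rather than recomputed at free time. A minimal sketch of the idea; the helper name and the `datalen' size calculation are illustrative, not taken from pfkeyv2.c:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <net/pfkeyv2.h>        /* struct sadb_msg */

    /*
     * Illustrative sketch: keep the size used for malloc(9) in its own
     * variable so the matching free(9) gets exactly the same value.
     */
    int
    pfkey_reply_sketch(size_t datalen)
    {
            void    *buf;
            size_t   buflen;

            buflen = sizeof(struct sadb_msg) + datalen;
            buf = malloc(buflen, M_PFKEY, M_WAITOK | M_ZERO);

            /* ... fill in buf and copy it out to the caller ... */

            free(buf, M_PFKEY, buflen);     /* same size as the allocation */
            return (0);
    }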
* Add size to free(9) calls  [kn, 2020-07-18; 1 file, -11/+16]
  import_identities() calls import_identity() which allocates a buffer and potentially frees it itself; if not, import_identities() uses it and frees it afterwards.
  Instead of crunching down the buffer size twice, make import_identity() calculate and pass it back, similar to how pfkeyv2.c:pfkeyv2_get() does it.
  Tested and OK tobhe
* Add size to free(9) calls  [kn, 2020-07-18; 1 file, -5/+6]
  pfkeyv2_get() and pfkeyv2_dump_policy() allocate buffers and can pass back their sizes, those sizes are already used during copyout() and such.
  Make one pfkeyv2_dump_policy() call pass back the size and reuse all sizes in the respective free(9) calls.
  Tested and OK tobhe
* Randomize the system stoeplitz key  [tb, 2020-07-17; 1 file, -1/+30]
  One can prove that the Toeplitz matrix generated from a 16-bit seed is invertible if and only if the seed has odd Boolean parity. Invertibility is necessary and sufficient for the stoeplitz hash to take all 65536 possible values.
  Generate a system stoeplitz seed of odd parity uniformly at random. This is done by generating a random 16-bit number and then flipping its last bit if it's of even parity. This works since flipping the last bit swaps the numbers of even and odd parity, so we obtain a 2:1 mapping from all 16-bit numbers onto those with odd parity.
  Implementation of parity via popcount provided by naddy; input from miod, David Higgs, Matthew Martin, Martin Vahlensieck and others.
  ok dlg
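A small sketch of the seed selection described above. The function name and the bit-folding parity computation are illustrative (the committed code reportedly uses a popcount-based parity from naddy); only the "flip the last bit if the parity is even" construction comes from the commit message:

    #include <sys/types.h>
    #include <sys/systm.h>          /* arc4random() */

    /*
     * Draw a 16-bit seed uniformly at random and force odd parity by
     * flipping bit 0 when the parity is even.  Flipping one bit toggles
     * the parity and pairs each even-parity value with a unique
     * odd-parity one, so the result is still uniform over the 32768
     * odd-parity seeds.
     */
    static uint16_t
    stoeplitz_seed_sketch(void)
    {
            uint16_t seed, par;

            seed = arc4random() & 0xffff;

            /* boolean parity of the seed, by folding the bits together */
            par = seed;
            par ^= par >> 8;
            par ^= par >> 4;
            par ^= par >> 2;
            par ^= par >> 1;

            if ((par & 1) == 0)
                    seed ^= 1;      /* even parity: flip the last bit */

            return (seed);
    }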
* Use interface index instead of pointer to corresponding interface within pipex(4) layer.  [mvs, 2020-07-17; 4 files, -32/+68]
  ok mpi@
* Check destruction ability before searching for the instance of the clone interface.  [mvs, 2020-07-17; 1 file, -4/+4]
  ok mpi@
* Fix races in pppacopen() caused by malloc(9).  [mvs, 2020-07-15; 1 file, -4/+5]
  ok mpi@
* Add sizes to free(9) calls  [kn, 2020-07-15; 1 file, -6/+6]
  All of these buffers are cleared with explicit sizes before free(), so reuse the given sizes.
  tested and OK tobhe
* Unbreak wg(4).  [tb, 2020-07-13; 1 file, -1/+2]
  The previous change may have fixed the build without pf(4), but it broke wireguard in normal kernels: the condition NPF > 0 is false if pf.h is not in scope.
* let's be explicit about only supporting Ethernet ports as members.  [dlg, 2020-07-13; 1 file, -7/+7]
  the packet parsing code expects Ethernet packets, so only allow Ethernet interfaces to be added.
  ok sthen@
* when adding a non-existent interface as a port, don't try to create missing ones.  [dlg, 2020-07-13; 1 file, -9/+1]
  this was annoying if i made a typo like "ifconfig bridge0 add gre0" instead of "ifconfig bridge0 add egre0", because it would create gre0 and then get upset because it's not an Ethernet interface. also, it left gre0 lying around.
  this used to be useful when configuring a bridge on boot, because interfaces used to be created when they were configured, and bridges could be configured before some virtual interfaces. however, netstart now creates all necessary interfaces before configuring any of them, so bridge being helpful isn't necessary anymore.
  ok kn@
* Fix build without pf  [kn, 2020-07-12; 1 file, -1/+3]
* Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.  [patrick, 2020-07-10; 22 files, -55/+48]
  ok dlg@ tobhe@
* Change users of IFQ_PURGE() to use the "new" API.  [patrick, 2020-07-10; 7 files, -22/+17]
  ok dlg@ tobhe@
* Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the "new" API.  [patrick, 2020-07-10; 11 files, -34/+23]
  ok dlg@ tobhe@
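For context, the "new" API in these three commits is the ifq_*() function interface from sys/net/ifq.h that supersedes the old IFQ_*() macros. A rough before/after sketch of the conversion; the driver-side variables (ifp, sc, m) are illustrative:

    /* before: classic IFQ_*() macros on the interface send queue */
            IFQ_SET_MAXLEN(&ifp->if_snd, sc->sc_ntxdesc - 1);
            IFQ_PURGE(&ifp->if_snd);
            IFQ_DEQUEUE(&ifp->if_snd, m);
            if (IFQ_IS_EMPTY(&ifp->if_snd))
                    return;

    /* after: the equivalent ifq_*() function calls */
            ifq_set_maxlen(&ifp->if_snd, sc->sc_ntxdesc - 1);
            ifq_purge(&ifp->if_snd);
            m = ifq_dequeue(&ifp->if_snd);
            if (ifq_empty(&ifp->if_snd))
                    return;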
* Kill `pppx_devs_lk' rwlock.  [mvs, 2020-07-10; 1 file, -42/+12]
  It was used only to prevent races caused by malloc(9) in pppxopen(). We can avoid these races without the rwlock; also move malloc(9) out of the rwlock.
  ok mpi@
* Set the missing `IFXF_CLONED' flag on pppx(4) related `ifnet'.  [mvs, 2020-07-10; 1 file, -1/+2]
  That should prevent collecting entropy from pppx(4).
  ok mpi@
* add kstats for rx queues (ifiqs) and transmit queues (ifqs).  [dlg, 2020-07-07; 2 files, -2/+129]
  this means you can observe what the network stack is trying to do when it's working with a nic driver that supports multiple rings. a nic with only one set of rings still gets queues though, and this still exports their stats.
  here is a small example of what kstat(8) currently outputs for these stats:

    em0:0:rxq:0
            packets: 2292 packets
              bytes: 229846 bytes
             qdrops: 0 packets
             errors: 0 packets
               qlen: 0 packets
    em0:0:txq:0
            packets: 1297 packets
              bytes: 193413 bytes
             qdrops: 0 packets
             errors: 0 packets
               qlen: 0 packets
            maxqlen: 511 packets
            oactive: false
* Protect the whole pipex(4) layer by NET_LOCK().  [mvs, 2020-07-06; 4 files, -39/+47]
  pipex(4) was simultaneously protected by KERNEL_LOCK() and NET_LOCK(); now a single lock protects it. This step reduces the locking mess in this layer.
  ok mpi@
* pipex_rele_session() frees the memory pointed to by `old_session_keys'.  [mvs, 2020-07-06; 1 file, -2/+2]
  Use it in pipex_destroy_session() instead of pool_put(9) to prevent a memory leak.
  ok mpi@
* It's been agreed upon that global locks should be expressed using capital letters in locking annotations.  [anton, 2020-07-04; 1 file, -5/+5]
  Therefore harmonize the existing annotations. Also, if multiple locks are required they should be delimited using commas.
  ok mpi@
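The annotations in question are the per-member lock letters used in struct comments. A sketch of the convention after this change, with capital letters and comma-delimited lists of locks; the struct and its members are made up for illustration:

    /*
     * Illustrative struct: each letter names a lock documented in a
     * legend at the top of the file, and members that need several locks
     * list them separated by commas.
     *
     * Locks used to protect struct members in this example:
     *      I       immutable after creation
     *      K       kernel lock
     *      N       net lock
     */
    struct example_softc {
            struct example_softc    *sc_next;       /* [K, N] needs both locks */
            unsigned int             sc_unit;       /* [I] */
            uint64_t                 sc_ipackets;   /* [N] */
    };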
* Permit the stack to check transport and network checksums.  [procter, 2020-07-04; 1 file, -9/+3]
  Although the link provides stronger integrity checks, it needn't cover the end-to-end transport path. And it is in any case a layer violation for one layer to disable the checks of another.
  Skipping the network check saved ~2.4% +/- ~0.2% of cp_time (sys+intr) on the forwarding path of a 1GHz AMD G-T40N (apu1). Other checksum speedups exist which do not skip the check.
  ok claudio@ kn@ stsp@
* Remove unused declaration.  [mvs, 2020-06-30; 1 file, -4/+1]
  ok deraadt yasuoka
* Add size to free(9) call  [kn, 2020-06-30; 1 file, -2/+2]
  Size taken from if_creategroup(); OK mvs
* state import should accept AF_INET/AF_INET6 only  [sashan, 2020-06-28; 1 file, -3/+12]
  Reported-by: syzbot+6fef0091252d57113bfb@syzkaller.appspotmail.com
  ok kn@
* kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)  [cheloha, 2020-06-24; 13 files, -102/+102]
  time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads.
  This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9).
  There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms.
  With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@.
  ok kettenis@
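The "lockless read loop" mentioned above is the usual timecounter generation-check pattern: read a generation number, copy the value, and retry if the generation changed underneath you. A hedged sketch of the idea; the struct and function below follow the timehands style but are illustrative, not copied from kern_tc.c:

    #include <sys/types.h>
    #include <sys/atomic.h>         /* membar_consumer() */

    /*
     * On 32-bit platforms a 64-bit time_t cannot be read atomically, so
     * the value is copied inside a generation-check loop and re-read if
     * the timekeeping code updated it concurrently.
     */
    struct timehands_sketch {
            volatile unsigned int   th_generation;  /* 0 while being updated */
            time_t                  th_second;      /* 64-bit even on ILP32 */
    };

    static time_t
    gettime_sketch(const struct timehands_sketch *th)
    {
            unsigned int gen;
            time_t now;

            do {
                    gen = th->th_generation;
                    membar_consumer();      /* read generation before payload */
                    now = th->th_second;
                    membar_consumer();      /* read payload before re-check */
            } while (gen == 0 || gen != th->th_generation);

            return (now);
    }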
* Fix `IFF_RUNNING' bit handling for pppx(4) and pppac(4).  [mvs, 2020-06-24; 1 file, -3/+7]
  ok mpi@
* Enable MPSAFE start routine to keep encryption workers more active.  [tobhe, 2020-06-23; 1 file, -12/+12]
  From Jason A. Donenfeld <Jason (at) zx2c4.com>
  ok patrick@
* Increase TX mitigation backlog size for increased throughput.  [tobhe, 2020-06-23; 1 file, -1/+2]
  From Jason A. Donenfeld <Jason (at) zx2c4.com>
  ok patrick@
* add missing rcs id  [jasper, 2020-06-22; 2 files, -0/+4]
* Rework checks for `pppx_ifs' tree modification.  [mvs, 2020-06-22; 1 file, -8/+4]
  - There is no panic() condition while inserting `pxi' into the tree, so drop RBT_FIND() to avoid two lookups.
  - Modify the text of the panic() message in the delete case.
  ok yasuoka@ claudio@
* The interface if_ioctl routine must be called with the NET_LOCK() held.  [claudio, 2020-06-22; 2 files, -8/+5]
  For example the bridge_ioctl() function calls NET_UNLOCK() unconditionally, so calling if_ioctl() without the netlock will trigger an assert because the netlock is not held. Make sure the ioctl handlers are called with the netlock held and drop the lock for the wg(4) specific ioctls in the wg_ioctl handler.
  This fixes a panic in bridge_ioctl() triggered by ifconfig(8) issuing a SIOCGWG ioctl against bridge(4). This is just a workaround; it needs more cleanup, but at least this way the panic can not be triggered anymore.
  OK stsp@, tested by semarie@
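A sketch of the workaround described above: if_ioctl() is now entered with the NET_LOCK() held, and wg_ioctl() temporarily drops it around the wg(4)-specific ioctls that may sleep. The wg_ioctl_set()/wg_ioctl_get() helpers and SIOCSWG are assumptions for illustration; SIOCGWG is the ioctl named in the commit message:

    /* inside wg_ioctl(), called with the net lock held */
            switch (cmd) {
            case SIOCSWG:                   /* assumed set-config ioctl */
                    NET_UNLOCK();
                    error = wg_ioctl_set(sc, (struct wg_data_io *)data);
                    NET_LOCK();
                    break;
            case SIOCGWG:                   /* named in the commit message */
                    NET_UNLOCK();
                    error = wg_ioctl_get(sc, (struct wg_data_io *)data);
                    NET_LOCK();
                    break;
            default:
                    error = ENOTTY;
                    break;
            }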
* Prevent potential `state_list' corruption while pppac(4) destroys pipex(4) sessions by pipex_iface_fini() or by pipex_ioctl() with the `PIPEXSMODE' command.  [mvs, 2020-06-22; 1 file, -2/+4]
  ok yasuoka@
* deprecate network livelock detection using the softclock.  [dlg, 2020-06-22; 1 file, -38/+2]
  livelock detection used to rely on code running at softnet blocking the softclock handling at a lower interrupt priority level. if the hard clock interrupt count diverged from one kept by a timeout, we assumed the network stack was doing too much work and we should apply backpressure to the reception of packets.
  the network stack doesn't really block timeouts from firing anymore though. this is especially true on MP systems, because timeouts fire on cpu0 and the nettq thread could be somewhere else entirely. this means network activity doesn't make the softclock lose ticks, which means we aren't scaling rx ring activity like we think we are.
  the alternative way to detect livelock is when a driver queues packets for the stack to process: if too many packets have built up, the input routine's return value tells the driver to slow down. this enables finer grained livelock detection too. the rx ring accounting is done per rx ring, and each rx ring is tied to a specific nettq. if one of them is going too fast it shouldn't affect the others. the tick based detection was done system wide and punished all the drivers.
  i've converted all the drivers to the new mechanism. let's see how we go with it.
  jmatthew@ confirms rings still shrink, so some backpressure is being applied.
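The per-ring mechanism described above shows up in converted drivers roughly as follows: the rx interrupt handler batches completed packets onto an mbuf list, hands them to the stack, and shrinks its ring if the stack reports pressure. The driver-side names (sc, sc_rx_ring, example_rxeof()) are illustrative; ifiq_input() and if_rxr_livelocked() are the stack- and ring-accounting hooks this change relies on:

    /* sketch of a driver rx interrupt handler using the new mechanism */
            struct mbuf_list ml = MBUF_LIST_INITIALIZER();
            struct mbuf *m;

            /* collect the packets this interrupt completed */
            while ((m = example_rxeof(sc)) != NULL)
                    ml_enqueue(&ml, m);

            /*
             * hand them to the stack; a non-zero return means the input
             * queue is backed up, so shrink this ring's fill level.
             */
            if (ifiq_input(&ifp->if_rcv, &ml))
                    if_rxr_livelocked(&sc->sc_rx_ring);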
* add wg(4), an in kernel driver for WireGuard vpn communication.  [dlg, 2020-06-21; 7 files, -1/+5218]
  thanks to Matt Dunwoodie and Jason A. Donenfeld for their effort. it's at least as functional as the go implementation, and maybe more so since this one works on more architectures.
  i'm sure there's further development that can be done, but you can say that about anything and everything that's in the tree.
  ok deraadt@
* add IFT_WIREGUARD.  [dlg, 2020-06-21; 1 file, -1/+2]
  i'm still not a fan of the peer semantics of wireguard interfaces, where each interface can have multiple peers and each peer has a set of allowed ips configured, aka cryptokey routing.
  traditionally we would use a tunnel (IFT_TUNNEL) style interface per peer, which means there's a 1:1 mapping between a peer and an interface. in turn that means you can apply policy with things like pf to the interface and it implies policy on the peer. so allowed ips inside a wg interface feels like a bandaid for a self inflicted wound to some degree.
  however, deraadt@ points out that the boat has sailed, and being compatible with the larger ecosystem has benefits. admins can choose to set up an interface per peer if they want to, so we get the best of both worlds.
  i will admit an interface per peer sucks in a concentrator situation though. that's why we still have pppac(4) as well as pppx(4). i also don't have any better ideas for how to scale or even express this kind of policy in a concentrator setting either.
  apologies for the teary.
  from Matt Dunwoodie and Jason A. Donenfeld
  ok deraadt@