wireguard-openbsd - WireGuard implementation for the OpenBSD kernel

	Commit message (Collapse)	Author	Age	Files	Lines
*	Push kernel lock within rtable_add(9) and rework it to return 0 in the	mvs	2021-03-26	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	case when requested table is already exists. Except initialization time, route_output() and if_createrdomain() are the only paths where we call rtable_add(9). We check requested table existence by rtable_exists(9) and it's not the error condition if the table exists. Otherwise we are trying to create requested table by rtable_add(9). Those paths are kernel locked so concurrent thread can't create requested table just after rtable_exists(9) check. Also rtable_add(9) has internal rtable_exists(9) check and in this case the table existence assumed as EEXIST error. This error path is never reached. We are going to unlock PF_ROUTE sockets. This means route_output() will not be serialized with if_createrdomain() and concurrent thread could create requested table. Table existence check and creation should be serialized and it makes sense to do this within rtable_add(9). This time kernel lock is used for this so it pushed down to rtable_add(9). The internal rtable_exists(9) check was modified and table existence is not error now. Since the external rtable_exists(9) check is useless it was removed from if_createrdomain(). It still exists in route_output() path because the logic is more complicated here. ok mpi@
*	RFC 8981 allows the configuration of only temporary IPv6 addresses.	florian	2021-03-20	1	-4/+8
\| \| \| \| \|	Make the interface come up when the IFXF_AUTOCONF6TEMP is set. OK kn
*	When changing the link local address send a RTM_IFINFO message out.	claudio	2021-03-18	1	-2/+4
\| \| \| \| \|	Also prefer if (error == 0) over if (!error). OK florian@ bluhm@
*	Do not call rtm_ifchg() if IFF_UP changed. The code in if_up() and if_down()	claudio	2021-03-18	1	-3/+6
\| \| \| \| \|	already call rtm_ifchg() and so this would just result in a duplicate message. Noticed by deraadt@. OK florian@ bluhm@
*	There is no need to try to attach IPv6 to an interface when the	florian	2021-03-11	1	-2/+3
\| \| \| \| \| \| \| \|	AUTOCONF6 flag is already set. This is likely a leftover from when we sent router solicitations from the kernel. This was a way to trigger sending a solicitation from userland. OK kn
*	If the AUTOCONF4 or AUTOCONF6 flags get enabled, force the interface up.	deraadt	2021-03-11	1	-23/+34
\| \| \| \|	ok florian claudio
*	spelling	jsg	2021-03-10	1	-2/+2
\| \| \| \|	ok gnezdo@ semarie@ mpi@
*	Shorten the if_cloners_lock name preventing it from being truncated in	anton	2021-03-09	1	-2/+2
\| \| \| \| \| \|	the top(1) wait column. ok mvs@
*	add p2p_input, like ether_input but for l3 tunnel interfaces.	dlg	2021-02-20	1	-1/+41
\| \| \| \| \| \| \| \|	the l3 protocol input to push the packet is based on a value in m->m_pkthdr.ph_family, which tunnel drivers should set before calling if_vinput. add p2p_bpf_mtap to call bpf_mtap_af also using m->m_pkthdr.ph_family.
*	default interfaces to bpf_mtap_ether for their if_bpf_mtap handler.	dlg	2021-02-20	1	-2/+6
\| \| \| \| \|	call (*ifp->if_bpf_mtap) instead of bpf_mtap_ether in ifiq_input and if_vinput.
*	add a MONITOR flag to ifaces to say they're only used for watching packets.	dlg	2021-02-20	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \|	an example use of this is when you have a span port on a switch and you want to be able to see the packets coming out of it with tcpdump, but do not want these packets to enter the network stack for processing. this is particularly important if the span port is pushing a copy of any packets related to the machine doing the monitoring as it will confuse pf states and the stack. ok benno@
*	We link `ifp' to `if_list' before we perform if_attachsetup(). It is not	mvs	2021-02-11	1	-3/+2
\| \| \| \| \| \| \| \| \|	fully initialized because we initialize `if_groups' after linking. It's not triggered because if_attach() and if_unit(9) are serialized by kernel lock and `ifp' is often filled by nulls. Move `if_groups' initialization to if_attach_common() to prevent this. ok bluhm@ claudio@ deraadt@
*	Interface group names must fit into IFNAMSIZ and be unique. But	bluhm	2021-02-10	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	the kernel made the unique check before trunkating with strlcpy(). So there could be two interface groups with the same name. The kif is created by a name lookup. The trunkated names are equal, so there was only one kif owned by both groups. When the groups got destroyed, the single kif was removed twice from the RB tree. Check length of group name before doing the unique check. The empty group name was allowed and is now invalid. Reported-by: syzbot+f47e8296ebd559f9bbff@syzkaller.appspotmail.com OK deraadt@ gnezdo@ anton@ mvs@ claudio@
*	Start refcounting interface groups with 1. if_creategroup() returns	bluhm	2021-02-08	1	-8/+13
\| \| \| \| \| \| \|	a new object that is already refcounted, so carp attach does not reach into internal structures. Add kasserts to detect counter overflow or underflow. OK mvs@
*	ifunit() was fully replaced by if_unit(9) and should go away.	mvs	2021-02-01	1	-18/+7
\| \| \| \|	ok bluhm@ dlg@
*	Introduce new function if_unit(9). This function returns a pointer the	mvs	2021-01-18	1	-18/+53
\| \| \| \| \| \| \| \| \| \| \|	interface descriptor corresponding to the unique name. This descriptor is guaranteed to be valid until if_put(9) is called on the returned pointer. if_unit(9) should replace already existent ifunit() which returns descriptor not safe for dereference when context was switched. This allow us to avoid some use-after-free issues in ioctl(2) path. Also this unifies interface descriptor usage. ok claudio@ sashan@
*	Syzkaller has found a stack overflow in socket splicing. Broadcast	bluhm	2021-01-09	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \|	packets were resent through simplex broadcast delivery and socket splicing. Although there is an M_LOOP check in somove(9), it did not take effect. if_input_local() cleared the M_BCAST and M_MCAST flags with m_resethdr(). As if_input_local() is used for broadcast and multicast delivery, it was a mistake to delete them. Keep the M_BCAST and M_MCAST mbuf flags when packets are reinjected into the network stack. Reported-by: syzbot+a43ace363f1b663238f8@syzkaller.appspotmail.com OK anton@; discussed with claudio@
*	Process pppoe(4) packets directly, do not queue through netis	kn	2021-01-04	1	-7/+1
\| \| \| \| \| \| \| \| \| \|	Less scheduling, lock contention and queues. Previously, if_netisr() handled the net lock around those calls, now if_input_process() does it before calling ether_input(), so no need to add or remove NET_*LOCK() anywhere. OK mvs claudio
*	Remove kernel lock from pppoe(4) input path	kn	2021-01-04	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \|	"struct pppoe_softc" documents no member being protected by the kernel lock (alone); further review of the code paths starting from pppoeintr() shows no sleeping points which must be avoided in the softnet thread. Everything is fine as is to run without the big lock, so remove it. Tests sthen Feedback mpi mvs OK mvs claudio
*	clear M_TIMESTAMP in if_enqueue.	dlg	2020-12-15	1	-1/+3
\| \| \| \| \| \|	this is to avoid a timestamp being used on the way out of the stack (eg, in bpf), or if it reenters the stack (eg, if it goes between rdomains with pair(4)).
*	Introduce `if_cloners_lock' rwlock and use it to serialize	mvs	2020-10-03	1	-5/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	if_clone_{create,destroy}(). This fixes the races described below. if_clone_{create,destroy}() are kernel locked, but since they touch various sleep points introduced by rwlocks and M_WAITOK allocations, without serialization they can intersect due to race condition. The avoided races are: 1. While performing if_clone_create(), concurrent thread which performing if_clone_create() can attach `ifp' with the same `if_xname' and made inconsistent `if_list' where all attached interfaces linked. 2. While performing if_clone_create(), concurrent thread which performing if_clone_destroy() can kill this incomplete `ifp'. 3. While performing if_clone_destroy(), concurrent thread which performing if_clone_destroy() can kill this dying `ifp'. ok claudio@ kn@ mpi@ sashan@
*	Allow SIOCSWGDPID and SIOCSWGMAXFLOW ioctls for non-root	kn	2020-08-19	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ifconfig(8) detects switch(4) through its unique SIOCSWGDPID ioctl(2) and further does another switch specific ioctl for the default output regardless of configuration and/or members. But since these two ioctls are limited to root, running ifconfig as unprivileged user makes switch interfaces partially appear as bridge devices because the detection fails, e.g. STP parameters are shown instead of datapath id and flow parameters. ifioctl() limits a list of set/write ioctls to root, but these two read-only ioctls seem to have been listed by mistake, so remove them to omit the root check and fix "ifconfig switch" output for unprivileged users. Feedback from dlg
*	Revert r1.614 of net/if.c. Modifications of `ifnet' list must still be	mvs	2020-08-05	1	-6/+3
\| \| \| \|	done under both the KERNEL_LOCK() and NET_LOCK().
*	We have `pipexinq' and `pipexoutq' mbuf(9) queues to store pipex(4)	mvs	2020-08-04	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	related mbufs. Each mbuf(9) passed to these queues stores the pointer to corresponding pipex(4) session referenced as `m_pkthdr.ph_cookie'. When session was destroyed its reference can still be in these queues so we have use after free issue while pipexintr() dereference it. I removed `pipexinq', `pipexoutq' and pipexintr(). This not only allows us to avoid issue described above, but also removes unnecessary context switch in packet processing. Also it makes code simpler. ok mpi@ yasuoka@
*	Use interface index instead of pointer to `ifnet' in carp(4).	mvs	2020-07-24	1	-3/+4
\| \| \| \|	ok sashan@
*	deprecate interface input handler lists, just use one input function.	dlg	2020-07-22	1	-108/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the interface input handler lists were originally set up to help us during the intial mpsafe network stack work. at the time not all the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc) were mpsafe, so we wanted a way to avoid them by default, and only take the kernel lock hit when they were specifically enabled on the interface. since then, they have been fixed up to be mpsafe. i could leave the list in place, but it has some semantic problems. because virtual interfaces filter packets based on the order they were attached to the parent interface, you can get packets taken away in surprising ways, especially when you reboot and netstart does something different to what you did by hand. by hardcoding the order that things like vlan and bridge get to look at packets, we can document the behaviour and get consistency. it also means we can get rid of a use of SRPs which were difficult to replace with SMRs. the interface input handler list is an SRPL, which we would like to deprecate. it turns out that you can sleep during stack processing, which you're not supposed to do with SRPs or SMRs, but SRPs are a lot more forgiving and it worked. lastly, it turns out that this code is faster than the input list handling, so lots of winning all around. special thanks to hrvoje popovski and aaron bieber for testing. this has been in snaps as part of a larger diff for over a week.
*	Move insertions to `if_list' out of NET_LOCK() because KERNEL_LOCK()	mvs	2020-07-20	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \|	protects this list. Also corresponding assertion added to be sure the required lock was held. This is the step to clean locking mess around `if_list'. Also we are going to protect `if_list' by it's own lock and this will allow us to avoid lock order issues in future. ok dlg@
*	Check destruction ability before search instance of clone interface.	mvs	2020-07-17	1	-4/+4
\| \| \| \|	ok mpi@
*	Change users of IFQ_PURGE() to use the "new" API.	patrick	2020-07-10	1	-2/+2
\| \| \| \|	ok dlg@ tobhe@
*	Add size to free(9) call	kn	2020-06-30	1	-2/+2
\| \| \| \| \|	Size taken from if_creategroup(); OK mvs
*	The interface if_ioctl routine must be called with the NET_LOCK() held.	claudio	2020-06-22	1	-8/+1
\| \| \| \| \| \| \| \| \| \| \| \|	For example the bridge_ioctl() function calls NET_UNLOCK() unconditionally and so calling if_ioctl() without netlock will trigger an assert because of not holding the netlock. Make sure the ioctl handlers are called with the netlock held and drop the lock for the wg(4) specific ioctls in the wg_ioctl handler. This fixes a panic in bridge_ioctl() triggered by ifconfig(8) issuing a SIOCGWG ioctl against bridge(4). This is just a workaround this needs more cleanup but at least this way the panic can not be triggered anymore. OK stsp@, tested by semarie@
*	deprecate network livelock detection using the softclock.	dlg	2020-06-22	1	-38/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	livelock detection used to rely on code running at softnet blocking the softclock handling at a lower interrupt priority level. if the hard clock interrupt count diverged from one kept by a timeout, we assumed the network stack was doing too much work and we should apply backpressure to the receptions of packets. the network stack doesnt really block timeouts from firing anymore though. this is especially true on MP systems, because timeouts fire on cpu0 and the nettq thread could be somewhere else entirely. this means network activity doesn't make the softclock lose ticks, which means we aren't scaling rx ring activity like we think we are. the alternative way to detect livelock is when a driver queues packets for the stack to process, if there's too many packets built up then the input routine return value tells the driver to slow down. this enables finer grained livelock detection too. the rx ring accounting is done per rx ring, and each rx ring is tied to a specific nettq. if one of them is going too fast it shouldn't affect the others. the tick based detection was done system wide and punished all the drivers. ive converted all the drivers to the new mechanism. let's see how we go with it. jmatthew@ confirms rings still shrink, so some backpressure is being applied.
*	add wg(4), an in kernel driver for WireGuard vpn communication.	dlg	2020-06-21	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \|	thanks to Matt Dunwoodie and Jason A. Donenfeld for their effort. it's at least as functional as the go implementation, and maybe more so since this one works on more architectures. i'm sure there's further development that can be done, but you can say that about anything and everything that's in the tree. ok deraadt@
*	make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.	dlg	2020-06-17	1	-3/+3
\| \| \| \| \|	i've been wanting to do this for a while, and now that we've got stoeplitz and it gives us 16 bits, it seems like the right time.
*	dev/rndvar.h no longer has statistical interfaces (removed during various	deraadt	2020-05-29	1	-3/+1
\| \| \| \| \| \|	conversion steps). it only contains kernel prototypes for 4 interfaces, all of which legitimately belong in sys/systm.h, which are already included by all enqueue_randomness() users.
*	Document the various flavors of NET_LOCK() and rename the reader version.	mpi	2020-05-27	1	-12/+12
\| \| \| \| \| \| \| \| \| \|	Since our last concurrency mistake only ioctl(2) ans sysctl(2) code path take the reader lock. This is mostly for documentation purpose as long as the softnet thread is converted back to use a read lock. dlg@ said that comments should be good enough. ok sashan@
*	Extract more randomness from mbuf flow	visa	2020-05-26	1	-2/+3
\| \| \| \| \| \| \| \| \| \|	Input bits of the mbuf list head to enqueue_randomness(). While the set of mbufs in circulation is relatively stable, the order in which they reach if_input_process() is unpredictable. Shuffling can happen in many subsystems, such as the network stack, device drivers, and memory management. OK deraadt@ mpi@
*	make ifpromisc assert that the caller is holding the NET_LOCK.	dlg	2020-04-12	1	-1/+3
\| \| \| \| \| \|	it needs NET_LOCK because it modifies if_flags and if_pcount. ok visa@
*	Grab the exclusive NET_LOCK() in the softnet thread.	mpi	2020-04-11	1	-7/+7
\| \| \| \| \| \| \| \| \|	Prevent a data corruption on a UDP receive socket buffer reported by procter@ who triggered it with wireguard-go. The symptoms are underflow of sb_cc/sb_datacc/sb_mcnt. ok visa@
*	Properly exit loop at end of hooks TAILQ.	tobhe	2020-03-10	1	-5/+4
\| \| \| \| \|	Feedback from and ok dlg@ ok kn@ todd@
*	cleanup unused headers generated by config	jsg	2020-01-24	1	-2/+1
\| \| \| \|	ok tedu@ krw@ deraadt@
*	avoid a use after free in if_delgroup.	dlg	2019-11-14	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \|	if_detach passes the groupname from an ifg_list struct to if_delgroup, if_delgroup then uses the name to find the same ifg_list struct so it can free it, and then passes the name from the struct to pfi_group_change(). at worst this can cause a fault if malloc(9) actually unmaps the page the struct was on, and at best it causes pf interfaces with garbage names to be created. ok sashan@ bluhm@
*	Non root user must not use ioctl(2) to mess around with the address	bluhm	2019-11-13	1	-5/+24
\| \| \| \| \|	of a network interface. OK deraadt@ claudio@
*	unbreak ramdisks	deraadt	2019-11-13	1	-1/+3
\|
*	check for privileged bridges ioctls next to the other privileged ioctls.	dlg	2019-11-12	1	-1/+26
\| \| \| \| \| \| \| \| \|	there's now a bunch of drivers that implement the bridge ioctls, but they're inconsistent at checking privilege. doing it up front once means less code duplication, and more consistent application of the checks. ok bluhm@ deraadt@
*	SIOCDVNETID mutates state, so should only be run by root.	dlg	2019-11-12	1	-1/+2
\| \| \| \| \|	found by Ilja Van Sprundel ok deraadt@ mpi@ bluhm@
*	add linkstate hooks at the head of the tailq.	dlg	2019-11-11	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	when vxlans parent interface has a link state change event, vxlan reconfigures the parent to cope with things not being as it expects when the interface comes back. it does this by removing its config and then adding it again. part of it's config removal is to take the link state hook away, and part of putting the config on is is adding the link state hook. if we're running an interfaces link state hooks from head to tail, and the vxlan hook adds itself back to the tail, we end up running the vxlan hook forever cos it always ends up at the tail. bluhm@ hit this infinite loop while running regress tests. if turns out we need to run link state hooks in the same order they were added, i have a way to avoid this situation, but this is simple.
*	whitespace fixes, no functional change.	dlg	2019-11-09	1	-6/+6
\|
*	convert interface address change hooks to tasks and a task_list.	dlg	2019-11-08	1	-14/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	this follows what's been done for detach and link state hooks, and makes handling of hooks generally more robust. address hooks are a bit different to detach/link state hooks in that there's only a few things that register hooks (carp, pf, vxlan), but a lot of places to run the hooks (lots of ipv4 and ipv6 address configuration). an address hook cookie was in struct pfi_kif, which is part of the pf abi. rather than break pfctl -sI, this maintains the void * used for the cookie and uses it to store a task, which is then used as intended with the new api.
*	serialise hook adds and dels with a mutex instead of an implicit NET_LOCK.	dlg	2019-11-07	1	-13/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	i had NET_ASSERT_LOCKED() in the hook add and remove operations, because that's what's held when the hooks are run. some callers do not hold the NET_LOCK when calling them though, eg, bridge(4). aggr and tpmr used to not hold NET_LOCK while being destroyed, which also caused the asserts to fire, so i moved the port destroys inside NET_LOCK, but now I have deadlocks with some barrier calls. the hooks having their own lock means callers don't have to hold NET_LOCK and the list will stay sane. the code that runs the hooks gives up the mutex when calling the hook, but keeps track of where it's up to bey putting a cursor in the list. there's a single global mutex for all the interface linkstate and detach hooks, but this stuff isn't a hot path by any stretch of the imagination. based on (a lot of) testing by hrvoje popovski. thank you.