wireguard-openbsd - WireGuard implementation for the OpenBSD kernel

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	More pmap bits, mostly from powerpc and arm64.	kettenis	2020-06-17	2	-19/+492
\|
*	Instead of performing three distinct allocations per created pipe,	anton	2020-06-17	2	-52/+57
\| \| \| \| \| \| \| \| \| \| \| \| \|	reduce it to a single one. Not only should this be more performant, it also solves a kqueue related issue found by visa@ who also requested this change: if you attach an EVFILT_WRITE filter to a pipe fd, the knote gets added to the peer's klist. This is a problem for kqueue because if you close the peer's fd, the knote is left in the list whose head is about to be freed. knote_fdclose() is not able to clear the knote because it is not registered with the peer's fd. FreeBSD also takes a similar approach to pipe allocations. ok mpi@ visa@
*	Explicitly unmap DMA memory using pmap_kremove(9).	kettenis	2020-06-17	1	-1/+2
\|
*	We are no longer using the "keep" file as a flag.	florian	2020-06-17	1	-3/+1
\| \| \| \|	Pointed out by Martin Vahlensieck, thanks!
*	needs param.h, not types.h	deraadt	2020-06-17	1	-2/+2
\|
*	Document that rand() returns non-deterministic random numbers unless a	tim	2020-06-17	1	-3/+12
\| \| \| \| \| \|	seed is explicitly set. OK millert@
*	Expose SMR list and pointer macros to userspace. This enables the use	visa	2020-06-17	1	-3/+3
\| \| \| \| \| \| \|	of SMR lists in userspace-visible parts of system headers. In addition, the macros allow libkvm to examine SMR data structures. Initial diff by and OK claudio@
*	ddb(4); be explicit that the parameter to trace /t uses the radix	sthen	2020-06-17	1	-2/+4
\| \| \| \| \| \|	prefix, and show how to use 0t for decimal (slight duplication from the table in EXPRESSIONS but easier for the reader than sending them off to look in a different part of the manual). ok mpi claudio jmc
*	Remove the bus specific sc_ih (interrupt handle) variable and use the common	claudio	2020-06-17	3	-20/+17
\| \| \| \| \| \|	sc_ih value of struct rl_softc. This fixes a crash in re(4) because intr_barrier(9) is called with the rl_softc sc_ih which was NULL. OK kettenis@
*	put pci_intr_establish_cpu() in, but commented out for now.	dlg	2020-06-17	1	-4/+25
\| \| \| \| \|	it's only available on amd64 (and i386), so don't really want to encourage its use just yet.
*	Let iwx(4) firmware decide which Tx rate to use.	stsp	2020-06-17	2	-7/+309
\| \| \| \| \| \| \| \| \| \| \| \|	The firmware will notify the driver when it decides to change Tx rate. Based on those notifications the driver updates the value displayed by ifconfig. This is similar to how bwfm(4) and urtwn(4) handle this. Offloading Tx rate selection should allow us to eventually delete AMRR/MiRA support code from iwx(4). That code is disabled for now, not yet deleted. For now, the driver restricts firmware Tx rate selection to 11n/20MHz mode because that's what net80211 can support.
*	Attach secondary CPUs early. Since on most machines we need psci(4) to	kettenis	2020-06-17	1	-4/+21
\| \| \| \| \| \| \| \| \|	spin up the secondary CPUs, explicitly probe and attach that driver before we attach the CPUs. This should help with distributing interrupts across CPUs on arm64. ok patrick@, deraadt@, dlg@
*	mark up an argument with Fa, not Va	dlg	2020-06-17	1	-7/+5
\|
*	have a go at documenting pci_intr_map_msix.	dlg	2020-06-17	1	-3/+29
\| \| \| \| \|	i feel like ive used the word vector too much, but hopefully someone who is good with english will check this and fix it.
*	replace a long and wrapped Fn line with Fo Fa Fc	dlg	2020-06-17	1	-4/+10
\|
*	if the chip did rss, use the hash from the chip as an mbuf flowid.	dlg	2020-06-17	2	-2/+8
\| \| \| \|	another sniped commit from jmatthew@
*	make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.	dlg	2020-06-17	12	-54/+51
\| \| \| \| \|	i've been wanting to do this for a while, and now that we've got stoeplitz and it gives us 16 bits, it seems like the right time.
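The idea of keeping a full 16-bit flow id and recording "is it set" in a separate flags word can be sketched as below. This is a minimal userspace illustration, not the kernel code: the struct `toy_pkthdr`, the flag value `TOY_FLOWID_SET`, and the helper names are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packet header; not the real mbuf layout. */
struct toy_pkthdr {
	uint16_t flowid;	/* all 16 bits carry the flow id */
	uint16_t csum_flags;	/* one bit here records whether flowid is set */
};

#define TOY_FLOWID_SET	0x4000	/* assumed flag bit, for illustration only */

static void
toy_set_flowid(struct toy_pkthdr *ph, uint16_t id)
{
	ph->flowid = id;
	ph->csum_flags |= TOY_FLOWID_SET;
}

static int
toy_has_flowid(const struct toy_pkthdr *ph)
{
	return (ph->csum_flags & TOY_FLOWID_SET) != 0;
}
```

Because validity lives in the flags word, a flow id of 0 is still distinguishable from "no flow id".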
*	Remove some of the unnecessary complications in the calculation of the	tb	2020-06-17	1	-24/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	stoeplitz_cache and bring them into a form more suitable for mathematical reasoning. Add a comment explaining the full construction which will also help justifying upcoming diffs. The observations for the code changes are the following: First, scache->bytes[val] is a uint16_t, and we only need the lower 16 bits of res in the second nested pair of for loops. The values of key[b] are only xored together to compute res, so we only need the lower 16 bits of those, too. Second, looking at the first nested for loop, we see that the values 0..15 of j only touch the top 16 bits of key[b], so we can skip them. For b = 0, the inner loop for j in 16..31 scans backwards through skey and sets the corresponding bits of key[b], so key[0] = skey. A bit of pondering then leads to key[b] = skey << b \| skey >> (NBSK - b). The key array is renamed into column since it stores columns of the Toeplitz matrix. It's not very expensive to brute-force verify that scache->bytes[val] remains the same for all values of val and all values of skey. I did this on amd64, sparc64 and powerpc. ok dlg
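The formula key[b] = skey << b \| skey >> (NBSK - b) from the message above is a 16-bit left rotation of skey by b. A minimal sketch of it, and of building a 256-entry byte cache from the resulting columns, follows. The function names, the choice of 8 columns, and the bit order used when XORing columns are assumptions made for illustration; the committed code may differ.

```c
#include <stdint.h>

#define NBSK	16	/* number of bits in the 16-bit key (assumed) */

/* Column b of the Toeplitz matrix: skey rotated left by b bits, b in 0..15. */
static uint16_t
toeplitz_column(uint16_t skey, unsigned int b)
{
	/* Widen first so both shifts are well defined, then truncate. */
	return (uint16_t)((uint32_t)skey << b | (uint32_t)skey >> (NBSK - b));
}

/* Fill bytes[val] with the XOR of the columns selected by the bits of val. */
static void
toeplitz_cache_fill(uint16_t bytes[256], uint16_t skey)
{
	uint16_t column[8];
	unsigned int b, val;

	for (b = 0; b < 8; b++)
		column[b] = toeplitz_column(skey, b);

	for (val = 0; val < 256; val++) {
		uint16_t res = 0;

		/* Assumed order: the most significant bit selects column 0. */
		for (b = 0; b < 8; b++) {
			if (val & (1U << (7 - b)))
				res ^= column[b];
		}
		bytes[val] = res;
	}
}
```

Since only 2^16 keys and 256 byte values exist, a rewrite like this can be checked exhaustively against an older implementation, which is the brute-force verification the message mentions.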
*	enable multiple queues (and interrupts on multiple cpus) on vmx(4).	dlg	2020-06-17	2	-33/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	im doing this with vmx(4) because it only exists on two archs (well, one and a half archs really) so any impact is localised. most other drivers i'm working on are enabled on 3 or 4 archs, and we're still working on the interrupt code on those archs. in the meantime vmx(4) can be used as a reference driver on how to implement multiq. it shows the use of rss, toeplitz, intrmap, and interrupts on multiple cpus. it's also a relatively simple device, which makes it easier to understand the above features. note that vmx(4) seems to advertise 25 msi-x vectors. it appears that the intention is that 16 of these vectors are supposed to be used for rx, 8 for tx, and 1 for events (eg, link up and down). we're keeping things simple for now and using a maximum of 8 vectors for both tx and rx, and one for events. this is mostly based on work that jmatthew@ did, but it's simplified now cos intrmap makes things easier.
*	tweak previous;	jmc	2020-06-17	1	-3/+3
\|
*	add a dumb pci_intr_establish_cpu().	dlg	2020-06-17	2	-2/+16
\| \| \| \| \| \| \| \| \|	i386 doesnt support msix, and the interrupt code assumes that it only ties stuff to cpu0. this mostly exists so the api exists for multiq drivers to compile against, but fail when they try to use it. tested with a hacked up vmx(4).
*	pci_intr_establish_cpu() for establishing an interrupt on a specific cpu.	dlg	2020-06-17	6	-18/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the cpu is specified by a struct cpu_info *, which should generally come from an intrmap. this is adapted from a diff that patrick@ sent round a few years ago for a pci_intr_map_msix_cpuid, where you asked for an msi vector on a specific cpu, and then called pci_intr_establish with the handle you get. kettenis pointed out that it's hard on some archs to carry cpu on a pci interrupt handle, so i tweaked it to turn it into a pci_intr_establish_cpu instead. jmatthew@ and i (but mostly jmatthew@ to be honest) have been experimenting with this api on multiple archs and it is working out well. i'm putting this diff in now on amd64 so people can kick the tyres a bit. tested with hacked up vmx(4), ix(4), and mcx(4)
*	sync	deraadt	2020-06-17	1	-0/+2
\|
*	wire intrmap into the build	dlg	2020-06-17	1	-1/+3
\|
*	make intrmap_cpu return a struct cpu_info *, not a "cpuid number" thing.	dlg	2020-06-17	3	-14/+14
\| \| \| \| \|	requested by kettenis@ discussed with jmatthew@
*	use atomic_set() in kref_init()	jsg	2020-06-17	1	-2/+2
\|
*	kref_sub() interface was removed from linux and is unused	jsg	2020-06-17	1	-8/+1
\|
*	add pci_intr_msix_count(), to get the msi-x table size for a device.	dlg	2020-06-17	2	-3/+22
\| \| \| \| \| \| \| \| \| \| \| \|	this basically tells us the number of interrupt vectors a pci device is able to support. it relies on the arch having __HAVE_PCI_MSIX defined. without that define it always returns 0. i think this originally came from haesbart via patrick@ as amd64 md code in the middle of a diff from 2018(!), but i've tweaked it to make it MI. tested on sparc64 and amd64 with various drivers.
*	sparc64 should define __HAVE_PCI_MSIX	dlg	2020-06-17	2	-2/+48
\|
*	use WRITE_ONCE and READ_ONCE for set and read	jsg	2020-06-17	1	-6/+6
\| \| \| \|	ok kettenis@
*	manpage for the bits of intrmap we're using at the moment.	dlg	2020-06-17	2	-2/+128
\|
*	add intrmap, an api that picks cpus for devices to attach interrupts to.	dlg	2020-06-17	2	-0/+385
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	there's been discussions for years (and even some diffs!) about how we should let drivers establish interrupts on multiple cpus. the simple approach is to let every driver look at the number of cpus in a box and just pin an interrupt on it, which is what pretty much everyone else started with, but we have never seemed to get past bikeshedding about. from what i can tell, the principal objections to this are: 1. interrupts will tend to land on low numbered cpus. ie, if drivers try to establish n interrupts on m cpus, they'll start at cpu 0 and go to cpu n, which means cpu 0 will end up with more interrupts than cpu m-1. 2. some cpus shouldn't be used for interrupts. why a cpu should or shouldn't be used for interrupts can be pretty arbitrary, but in practical terms i'm going to borrow from the scheduler and say that we shouldn't run work on hyperthreads. 3. making all the drivers make the same decisions about the above is a lot of maintenance overhead. either we will have a bunch of inconsistencies, or we'll have a lot of untested commits to keep everything the same. my proposed solution to the above is this diff to provide the intrmap api. drivers that want to establish multiple interrupts ask the api for a set of cpus it can use, and the api considers the above issues when generating a set of cpus for the driver to use. drivers then establish interrupts on cpus with the info provided by the map. it is based on the if_ringmap api in dragonflybsd, but generalised so it could be used by something like nvme(4) in the future. this version provides numeric ids for CPUs to drivers, but as kettenis@ has been pointing out for a very long time, it makes more sense to use cpu_info pointers. i'll be updating the code to address that shortly. discussed with deraadt@ and jmatthew@ ok claudio@ patrick@ kettenis@
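The three objections above suggest what a central CPU picker has to do: skip CPUs that should not take interrupts, and avoid always starting at cpu 0. The following is a toy userspace model of that policy only. It is not the intrmap API; `toy_cpu`, `toy_intrmap_pick`, the SMT flag, and the rotating start offset are all hypothetical.

```c
#include <stddef.h>

#define TOY_MAXCPUS	64

/* Hypothetical CPU description; the kernel uses struct cpu_info instead. */
struct toy_cpu {
	int id;
	int is_smt_sibling;	/* nonzero: a hyperthread we do not want to use */
};

/* Rotates so that successive devices do not all start on the first CPU. */
static size_t toy_next_start;

/* Pick up to `want` CPUs into out[]; returns how many were picked. */
static size_t
toy_intrmap_pick(const struct toy_cpu *cpus, size_t ncpus, size_t want,
    int *out)
{
	int usable[TOY_MAXCPUS];
	size_t nusable = 0, i;

	/* Objection 2: filter out CPUs that should not take interrupts. */
	for (i = 0; i < ncpus && nusable < TOY_MAXCPUS; i++) {
		if (!cpus[i].is_smt_sibling)
			usable[nusable++] = cpus[i].id;
	}
	if (nusable == 0)
		return 0;
	if (want > nusable)
		want = nusable;

	/* Objection 1: start where the previous caller stopped. */
	for (i = 0; i < want; i++)
		out[i] = usable[(toy_next_start + i) % nusable];
	toy_next_start = (toy_next_start + want) % nusable;

	return want;
}
```

Keeping this logic in one place is what answers the third objection: each driver asks for a set of CPUs rather than reimplementing the policy.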
*	Do not do logical negation of a bitshifted field.	mortimer	2020-06-17	1	-2/+2
\| \| \| \| \| \|	Prompted by warning from clang 10. ok patrick@
*	make intr_barrier run sched_barrier on the cpu the interrupt is pinned to.	dlg	2020-06-16	1	-3/+4
\| \| \| \| \| \|	intr_barrier passed NULL to sched_barrier before this, which ends up being the primary cpu. that's been mostly right until this point, but is set to change.
*	Remove old commented out line and fix indent.	mortimer	2020-06-16	1	-3/+2
\| \| \| \| \| \|	clang-10 complains about the misleading indentation. ok patrick@
*	Some simplifications.	kettenis	2020-06-16	1	-50/+90
\|
*	Add missing dependency.	kettenis	2020-06-16	1	-2/+2
\|
*	Fix strlcpy() size parameter in refldbld(), it was a byte too small.	millert	2020-06-16	2	-17/+21
\| \| \| \| \|	While here, add proper bounds checking for the partial match case in refldbld() too and check strlcpy() return values throughout.
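Checking the strlcpy() return value, as the message above describes, usually means comparing it against the destination size: a result greater than or equal to the size signals truncation. A small sketch follows. To stay portable it uses a local stand-in, `toy_strlcpy`, written to match strlcpy() semantics; `copy_field` is a hypothetical helper and not the refldbld() code.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in with strlcpy() semantics for systems whose libc lacks it. */
static size_t
toy_strlcpy(char *dst, const char *src, size_t dsize)
{
	size_t len = strlen(src);

	if (dsize != 0) {
		size_t n = len < dsize - 1 ? len : dsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;	/* length of src, whether or not it fit */
}

/* Returns 0 on success, -1 if src did not fit in dst (dst is truncated). */
static int
copy_field(char *dst, size_t dsize, const char *src)
{
	/* Pass the full buffer size; an off-by-one here loses a byte. */
	if (toy_strlcpy(dst, src, dsize) >= dsize)
		return -1;
	return 0;
}
```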
*	typos	naddy	2020-06-16	1	-5/+5
\|
*	remove some unused defines	jsg	2020-06-16	1	-14/+6
\|
*	implement atomic_inc_not_zero() by way of atomic_add_unless()	jsg	2020-06-16	1	-11/+3
\|
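Expressing "increment unless zero" through a general "add unless equal" helper can be sketched with C11 atomics as below. This is an illustration of the pattern only; the `toy_` names are hypothetical and the compat code in the tree is built on different primitives.

```c
#include <stdatomic.h>

/* Add a to *v unless *v == u. Returns 1 if the add happened, 0 otherwise. */
static int
toy_atomic_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return 1;
	}
	return 0;
}

/* Take a reference only if the count has not already dropped to zero. */
#define toy_atomic_inc_not_zero(v)	toy_atomic_add_unless((v), 1, 0)
```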
*	remove a dead store	jsg	2020-06-16	1	-2/+2
\|
*	rework SYNOPSIS/usage() to show better the various use formats,	jmc	2020-06-16	2	-58/+61
\| \| \| \| \| \| \|	and rework the man text to reflect this; guenther supplied the details on the various modes; deraadt suggested __progname be banished from usage();
*	tweak previous; ok dlg	jmc	2020-06-16	1	-3/+3
\|
*	Release the rx node if we were unable to allocate a new rx buffer.	jmatthew	2020-06-16	1	-2/+2
\| \| \| \| \| \| \| \|	The node here is always ic_bss, for which the reference count isn't actually used (it's always freed when the interface detaches), so not releasing it in this case wasn't really a problem. ok stsp@
*	sync again, oops wrong file	sthen	2020-06-16	2	-1/+1
\|
*	sync	sthen	2020-06-16	2	-0/+2
\|
*	vmd(8): backout previous commit to ns8250.c as it reintroduced the bug where the	pd	2020-06-16	1	-17/+9
\| \| \| \| \| \| \|	vm would get stuck if disconnected from console and get unstuck once console is attached. Spotted by tb@
*	d and D keys to reset to default in customize mode.	nicm	2020-06-16	8	-63/+231
\|
*	Correctly move to previous line when looking for previous word, from	nicm	2020-06-16	1	-5/+5
\| \| \| \|	Derry Jing.