wireguard-openbsd - WireGuard implementation for the OpenBSD kernel

	Commit message	Author	Age	Files	Lines
...
*	KCOV_BUF_MAX_NMEMB is defined under _KERNEL in sys/kcov.h but only used	anton	2020-09-26	1	-3/+1
	in dev/kcov.c; therefore move it to dev/kcov.c.
*	Move duplicated code to send an uncatchable SIGABRT into a function.	mpi	2020-09-16	1	-2/+2
	ok claudio@
*	Document that `p_siglist' and `p_sigmask' are updated via atomics.	mpi	2020-09-16	1	-3/+3
	ok claudio@
*	Fix comment, ktrace flags are per-process.	mpi	2020-09-14	1	-3/+3
*	Unbreak tree. Instead of passing struct process to siginit() just pass the	claudio	2020-09-13	1	-2/+2
	struct sigacts since that is the only thing that is modified by siginit.
*	Introduce a helper to check if a signal is ignored or masked by a thread.	mpi	2020-09-09	1	-1/+2
	ok claudio@, pirofti@
*	Remove unused sysctl_int_arr(9)	gnezdo	2020-09-01	1	-2/+1
*	crank to 6.8-beta	deraadt	2020-08-31	1	-3/+3
*	Declare hw_{prod,serial,uuid,vendor,ver} in <sys/systm.h>.	visa	2020-08-26	1	-1/+7
	OK deraadt@, mpi@
*	Annotate locking of ps_single.	visa	2020-08-26	1	-2/+3
	Prompted by mpi@
*	Remove unused debug_syncprt, improve debug sysctl handling	kn	2020-08-23	1	-5/+1
	"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008.
	Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c the only visible difference between used and stub ctldebug structs in the debugvars[] array is their extern keyword, indicating that it is defined elsewhere. sys/sysctl.h declares all debugN members as extern upfront, but these declarations are not needed.
	Remove the unused debug sysctl, rename the only remaining one to something meaningful and remove forward declarations from /sys/sysctl.h; this way, adding new debug sysctls is a matter of adding extern and coming up with a name, which is nicer to read on its own and better to grep for.
	OK mpi
*	Allow userland to use EVFILT_EXCEPT.	mpi	2020-08-23	1	-2/+2
	ok mvs@, visa@
*	Move sysctl(2) CTL_DEBUG from DEBUG to new DEBUG_SYSCTL	kn	2020-08-22	1	-4/+4
	Adding "debug.my-knob" sysctls is really helpful to select different code paths and/or log on demand during runtime without recompile, but as this code is under DEBUG, lots of other noise comes with it which is often undesired, at least when looking at specific subsystems only.
	Adding globals to the kernel and breaking into DDB to change them helps, but that does not work over SSH, hence the need for debug sysctls.
	Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general option for all of sysctl(2).
	OK gnezdo
*	Remove an unnecessary field from struct msgbuf.	visa	2020-08-18	1	-2/+1
	OK mvs@
*	Add sysctl_bounded_arr as a replacement for sysctl_int_arr	gnezdo	2020-08-18	1	-1/+15
	Design by deraadt@
	ok deraadt@
*	struct process: annotate locking for getitimer(2), setitimer(2)	cheloha	2020-08-11	1	-3/+6
	The ITIMER_REAL itimerspec (ps_timer[0]) and timeout (ps_realit_to) are protected by the kernel lock. Annotate them with "K".
	The ITIMER_VIRTUAL and ITIMER_PROF itimerspecs (ps_timer[1], ps_timer[2]) are protected by itimer_mtx. Annotate them with "T", for "timer".
	With input from kettenis@ and anton@.
	ok kettenis@, anton@
*	Remove now unused M_ACAST flag.	florian	2020-08-08	1	-5/+3
	Reminded by, input & OK jca
*	timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)	cheloha	2020-08-07	1	-5/+1
	These two interfaces have been entirely unused since introduction. Remove them and thin the "timeout" namespace a bit.
	Discussed with mpi@ and ratchov@ almost a year ago, though I blocked the change at that time. Also discussed with visa@.
	ok visa@, mpi@
*	Move range check inside sysctl_int_arr	gnezdo	2020-08-01	1	-2/+2
	Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT.
	OK kn@
*	Add support for remote coverage to kcov. Remote coverage is collected	anton	2020-08-01	3	-3/+21
	from threads other than the one currently having kcov enabled. A thread with kcov enabled occasionally delegates work to another thread, collecting coverage from such threads improves the ability of syzkaller to correlate side effects in the kernel caused by issuing a syscall.
	Remote coverage is divided into subsystems. The only supported subsystem right now collects coverage from scheduled tasks and timeouts on behalf of a kcov enabled thread. In order to make this work `struct task' and `struct timeout' must be extended with a new field keeping track of the process that scheduled the task/timeout. Both aforementioned structures have therefore increased with the size of a pointer on all architectures.
	The kernel API is documented in a new kcov_remote_register(9) manual.
	Remote coverage is also supported by kcov on NetBSD and Linux.
	ok mpi@
*	timeout(9): remove TIMEOUT_SCHEDULED flag	cheloha	2020-07-25	1	-2/+1
	The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate between wheel timeouts and new timeouts during softclock(). The distinction is useful when incrementing the "rescheduled" stat and the "late" stat.
	Now that we have an intermediate queue for new timeouts, timeout_new, we don't need the flag. The distinction between wheel timeouts and new timeouts can be made computationally.
	Suggested by procter@ several months ago.
*	The "unsupported compiler" checks were added back in December when	daniel	2020-07-21	2	-13/+2
	MD versions of these headers were unhooked. As nothing has hit those checks we can drop them at this point.
	ok visa@ and "makes sense" to millert@
*	POWER9 CPUs provide an energy sensor that accumulates the amount of energy	kettenis	2020-07-15	1	-1/+3
	used by the processor chip. Although we have a SENSOR_WATTHOUR sensor type its units are not really suitable for this sensor. So add a SENSOR_ENERGY type that uses micro Joules as its unit.
	ok deraadt@
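To make the unit choice concrete, here is a minimal sketch of the arithmetic involved. It is not code from the tree: the helper names are hypothetical. The only facts relied on are that 1 Wh equals 3600 J and that the difference between two readings of an accumulating energy counter, divided by the elapsed time, gives average power.

```c
#include <assert.h>
#include <stdint.h>

/* 1 Wh = 3600 J, hence 1 uWh = 3600 uJ. */
#define DEMO_UJ_PER_UWH	3600ULL

/* Hypothetical helper: convert accumulated micro Joules to whole micro watt-hours. */
static inline uint64_t
energy_uj_to_uwh(uint64_t uj)
{
	return uj / DEMO_UJ_PER_UWH;
}

/*
 * Hypothetical helper: average power in milliwatts between two readings
 * of an accumulating counter. uJ / us = W, so scale by 1000 for mW.
 * Very large deltas would overflow the multiplication; ignored here.
 */
static inline uint64_t
avg_power_mw(uint64_t uj_start, uint64_t uj_end, uint64_t dt_us)
{
	assert(dt_us > 0 && uj_end >= uj_start);
	return (uj_end - uj_start) * 1000 / dt_us;
}
```

For example, a counter that advances by 5,000,000 uJ over one second corresponds to an average draw of 5000 mW.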
*	A pty write containing VDISCARD, VREPRINT, or various retyping cases of	deraadt	2020-07-14	1	-3/+3
	VERASE would perform (sometimes irrelevant) compute in the kernel which can be heavy (especially with our insufficient tty subsystem locking). Use tsleep_nsec for 1 tick in such circumstances to yield cpu, and also bring interruptability to ptcwrite()
	https://syzkaller.appspot.com/bug?extid=462539bc18fef8fc26cc
	ok kettenis millert, discussions with greg and anton
*	Add support for timecounting in userland.	pirofti	2020-07-06	4	-7/+39
	This diff exposes parts of clock_gettime(2) and gettimeofday(2) to userland via libc, liberating processes from the need for a context switch every time they want to count the passage of time.
	If a timecounter clock can be exposed to userland then it needs to set its tc_user member to a non-zero value. Tested with one or multiple counters per architecture.
	The timing data is shared through a pointer found in the new ELF auxiliary vector AUX_openbsd_timekeep containing timehands information that is frequently updated by the kernel.
	Timing differences between the last kernel update and the current time are adjusted in userland by the tc_get_timecount() function inside the MD usertc.c file.
	This permits a much more responsive environment, quite visible in browsers, office programs and gaming (apparently one is able to fly in Minecraft now).
	Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
	OK from at least kettenis@, cheloha@, naddy@, sthen@
*	kstat does open, close, and ioctl.	dlg	2020-07-06	1	-1/+9
*	add kstat(4), a subsystem to let the kernel expose statistics to userland.	dlg	2020-07-06	1	-0/+193
	a kstat is an arbitrary chunk of data that a part of the kernel wants to expose to userland. data could mean just a chunk of raw bytes, but generally a kernel subsystem will provide a series of kstat key/value chunks.
	this code is loosely modelled on kstat in solaris, but with a bunch of simplifications (we don't want to provide write support for example). the named or key/value structure is significantly richer in this version too. eg, solaris kstat named data supports integer types, but this version offers differentiation between counters (like the number of packets transmitted on an interface) and gauges (like how long the transmit queue is) and lets kernel providers say what the units are (eg, packets vs bytes vs cycles).
	the main motivation for this is to improve the visibility of what the kernel is doing while it's running. i wrote this as part of the recent work we've been doing on multiqueue and rss/toeplitz so i could verify that network load is actually spread across multiple rings on a single nic. without this we would be wasting memory and interrupt vectors on multiple rings and still just using the 1st one, and no one would know cos there's no way to see what rings are being used.
	another thing that can become visible is the different counters that various network cards provide. i'm particularly interested in seeing if packets get dropped because the rings aren't filled fully, which is an effect we've never really observed directly.
	a small part of wanting this is cos i spend an annoying amount of time instrumenting the kernel when hacking code in it. if most of the scaffolding for the instrumentation is already there, i can avoid repeatedly writing that code and save time.
	iterated a few times with claudio@ and deraadt@
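The counter/gauge/unit distinction described above can be illustrated with a small sketch. This is not the kstat(4) interface: every `demo_` name below is hypothetical and the layout is an assumption made purely for illustration. It shows why tagging the kind of value matters to a reader: a counter is usually displayed as a change between two samples, while a gauge is displayed as its current level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tags; the real kstat types and units are not reproduced here. */
enum demo_kv_type { DEMO_KV_COUNTER, DEMO_KV_GAUGE };
enum demo_kv_unit { DEMO_KV_U_NONE, DEMO_KV_U_PACKETS, DEMO_KV_U_BYTES, DEMO_KV_U_CYCLES };

struct demo_kv {
	char			key[16];	/* NUL-terminated name */
	enum demo_kv_type	type;		/* counter or gauge */
	enum demo_kv_unit	unit;		/* what the value measures */
	uint64_t		value;
};

static inline void
demo_kv_init(struct demo_kv *kv, const char *key, enum demo_kv_type type,
    enum demo_kv_unit unit, uint64_t value)
{
	assert(strlen(key) < sizeof(kv->key));
	memset(kv, 0, sizeof(*kv));
	memcpy(kv->key, key, strlen(key));
	kv->type = type;
	kv->unit = unit;
	kv->value = value;
}

/* Value a reader would show for the interval between two samples. */
static inline uint64_t
demo_kv_display(const struct demo_kv *prev, const struct demo_kv *cur)
{
	if (cur->type == DEMO_KV_COUNTER)
		return cur->value - prev->value;	/* counters: show the change */
	return cur->value;				/* gauges: show the level */
}
```

With this shape, a tool sampling a "packets transmitted" counter at 1000 and then 1500 would report 500 for the interval, whereas a "queue length" gauge sampled at 7 and then 3 would simply report 3.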
*	It's been agreed upon that global locks should be expressed using	anton	2020-07-04	3	-24/+24
	capital letters in locking annotations. Therefore harmonize the existing annotations.
	Also, if multiple locks are required they should be delimited using commas.
	ok mpi@
*	Bring back revision 1.122 with a fix preventing a use-after-free by	anton	2020-06-29	1	-1/+4
	serializing calls to pipe_buffer_free(). Repeating the previous commit message:
	Instead of performing three distinct allocations per created pipe, reduce it to a single one. Not only should this be more performant, it also solves a kqueue related issue found by visa@ who also requested this change: if you attach an EVFILT_WRITE filter to a pipe fd, the knote gets added to the peer's klist. This is a problem for kqueue because if you close the peer's fd, the knote is left in the list whose head is about to be freed. knote_fdclose() is not able to clear the knote because it is not registered with the peer's fd.
	FreeBSD also takes a similar approach to pipe allocations.
	once again ok mpi@ visa@
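The single-allocation idea can be sketched in userland as follows. This is an illustration only, not the kernel code: all `demo_` names are hypothetical, and the "free on last close" rule is an assumption about how such a pair would typically be torn down. The point it shows is that both ends live in one allocation, so closing one end does not free memory that the other end still references.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_pipe_pair;

struct demo_pipe {
	struct demo_pipe_pair	*pair;	/* enclosing allocation */
	struct demo_pipe	*peer;	/* the other end */
	int			 open;	/* still referenced by an fd */
};

/* Both ends are embedded in one allocation instead of allocated separately. */
struct demo_pipe_pair {
	struct demo_pipe	rpipe;
	struct demo_pipe	wpipe;
};

static struct demo_pipe_pair *
demo_pipe_pair_create(void)
{
	struct demo_pipe_pair *pp = calloc(1, sizeof(*pp));

	if (pp == NULL)
		return NULL;
	pp->rpipe.pair = pp->wpipe.pair = pp;
	pp->rpipe.peer = &pp->wpipe;
	pp->wpipe.peer = &pp->rpipe;
	pp->rpipe.open = pp->wpipe.open = 1;
	return pp;
}

/* Close one end; returns 1 if this was the last end and the pair was freed. */
static int
demo_pipe_close(struct demo_pipe *p)
{
	struct demo_pipe_pair *pp = p->pair;

	assert(p->open);
	p->open = 0;
	if (pp->rpipe.open || pp->wpipe.open)
		return 0;	/* peer still open: memory stays valid */
	free(pp);
	return 1;
}
```

After closing the write end, the read end can still safely follow its `peer` pointer, because the storage for both ends is released only once.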
*	ipmi: add a matching kqfilter filter for `seltrue' as well, allowing us	sthen	2020-06-29	1	-3/+3
	to keep the behavior when switching poll(2) to use kqueue filters.
	From mpi@
*	fix /dev/ipmi. conf.h r1.150 changed from enodev->selfalse for the poll	sthen	2020-06-29	1	-2/+2
	function but actually a 'true' value is needed; use seltrue instead.
	Problem reported, kernel bisected and diff tested by Jens A. Griepentrog.
	ok deraadt@ mpi@
*	Add MID_POWERPC64. These identifiers are only used for kernel core dumps	kettenis	2020-06-28	1	-1/+2
	these days, so inventing our own numbers is fine.
	From drahn@
*	timecounting: deprecate time_second(9), time_uptime(9)	cheloha	2020-06-26	1	-4/+1
	time_second(9) has been replaced in the kernel by gettime(9). time_uptime(9) has been replaced in the kernel by getuptime(9).
	New code should use the replacement interfaces. They do not suffer from the split-read problem inherent to the time_* variables on 32-bit platforms.
	The variables remain in sys/kern/kern_tc.c for use via kvm(3) when examining kernel core dumps.
	This commit completes the deprecation process:
	- Remove the extern'd definitions for time_second and time_uptime from sys/time.h.
	- Replace manpage cross-references to time_second(9)/time_uptime(9) with references to microtime(9) or a related interface.
	- Move the time_second.9 manpage to the attic.
	With input from dlg@, kettenis@, visa@, and tedu@.
	ok kettenis@
*	add USEC_TO_TIMEVAL()	jsg	2020-06-26	1	-1/+8
	discussed with cheloha@
*	add intrmap_one, some temp code to help us write pci_intr_establish_cpu.	dlg	2020-06-23	1	-1/+2
	it means we can do quick hacks to existing drivers to test interrupts on multiple cpus. emphasis on quick and hacks.
	ok jmatthew@, who will also ok the removal of it at the right time.
*	timecounting: add gettime(9), getuptime(9)	cheloha	2020-06-22	1	-1/+4
	time_second and time_uptime are used widely in the tree. This is a problem on 32-bit platforms because time_t is 64-bit, so there is a potential split-read whenever they are used at or below IPL_CLOCK.
	Here are two replacement interfaces: gettime(9) and getuptime(9). The "get" prefix signifies that they do not read the hardware timecounter, i.e. they are fast and low-res. The lack of a unit (e.g. micro, nano) signifies that they yield a plain time_t.
	As an optimization on LP64 platforms we can just return time_second or time_uptime, as a single read is atomic. On 32-bit platforms we need to do the lockless read loop and get the values from the timecounter.
	In a subsequent diff these will be substituted for time_second and time_uptime almost everywhere in the kernel.
	With input from visa@ and dlg@.
	ok kettenis@
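A lockless read loop of the kind mentioned above can be sketched in portable C11. This is not the kernel's implementation: the structure and function names are hypothetical, a single writer is assumed, and default sequentially consistent atomics stand in for the kernel's memory barriers. The 64-bit value is deliberately stored as two 32-bit halves to model a platform where it cannot be read in one access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared time state. */
struct demo_timehands {
	atomic_uint		gen;	/* 0 means an update is in progress */
	_Atomic uint32_t	lo;	/* low half of the 64-bit seconds */
	_Atomic uint32_t	hi;	/* high half of the 64-bit seconds */
};

/* Writer side (single writer assumed): invalidate, update, publish. */
static void
demo_settime(struct demo_timehands *th, int64_t sec)
{
	unsigned int gen = atomic_load(&th->gen);

	atomic_store(&th->gen, 0);
	atomic_store(&th->lo, (uint32_t)sec);
	atomic_store(&th->hi, (uint32_t)((uint64_t)sec >> 32));
	if (++gen == 0)		/* skip 0, it is reserved */
		gen = 1;
	atomic_store(&th->gen, gen);
}

/* Reader side: retry until both halves were read under one generation. */
static int64_t
demo_gettime(struct demo_timehands *th)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&th->gen);
		lo = atomic_load(&th->lo);
		hi = atomic_load(&th->hi);
	} while (gen == 0 || gen != atomic_load(&th->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

If a writer updates the value between the reader's two half-reads, the generation check fails and the reader retries, so a torn value is never returned. On an LP64 platform the loop is unnecessary because a single aligned 64-bit load is already atomic, which is the optimization the commit message describes.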
*	Extend kqueue interface with EVFILT_EXCEPT filter.	mpi	2020-06-22	1	-1/+8
	This filter, already implemented in macOS and Dragonfly BSD, returns exceptional conditions like the reception of out-of-band data.
	The functionality is similar to poll(2)'s POLLPRI & POLLRDBAND and it can be used by the kqfilter-based poll & select implementation.
	ok millert@ on a previous version, ok visa@
*	let userland read vpd info from a pci device.	dlg	2020-06-22	1	-1/+9
	reading vpd stuff is useful when you're trying to get support information about a pci device, eg, if you want a serial number, or firmware versions, or specific part name or number, it's likely available via vpd. also, i'm sick of having the diff in my tree.
	the vpd info is not accessed as bytes read from a capability, but is read via a register in the capability. the same register also supports updating or writing vpd info, which sounds like a bad idea to let userland have raw access to. this adds an ioctl so that userland can ask the kernel to read via the vpd register on its behalf. this ensures that the only access is read access, and it's sanity checked.
	tested by hrvoje popovski on many devices.
	ok jmatthew@
*	wireguard is taking over the gif mbuf tag.	dlg	2020-06-21	1	-4/+4
	gif used its mbuf tag to store its interface index so it could detect loops. gre also did this, and i cut most of the drivers (including gif) over to using the gre tag. so the gif tag is unused.
	wireguard uses the tag to store peer information between different contexts the packet is processed in. it also needs a bit more space to do that.
	from Matt Dunwoodie and Jason A. Donenfeld
	ok deraadt@
*	add mq_push. it's like mq_enqueue, but drops from the head, not the tail.	dlg	2020-06-21	1	-1/+2
	from Matt Dunwoodie and Jason A. Donenfeld
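The difference between dropping from the tail and dropping from the head can be shown with a small bounded queue. This sketch uses plain integers rather than mbufs, and every `demo_` name and return convention is hypothetical; it does not reproduce the signatures of mq_enqueue or mq_push.

```c
#include <assert.h>

#define DEMO_Q_MAX	4

/* Fixed-size ring holding at most DEMO_Q_MAX items. */
struct demo_q {
	int		items[DEMO_Q_MAX];
	unsigned int	head;	/* index of the oldest item */
	unsigned int	len;	/* number of queued items */
};

/* Tail drop: when full, the NEW item is rejected. Returns 0 if queued, -1 if dropped. */
static int
demo_q_enqueue(struct demo_q *q, int v)
{
	if (q->len == DEMO_Q_MAX)
		return -1;
	q->items[(q->head + q->len) % DEMO_Q_MAX] = v;
	q->len++;
	return 0;
}

/* Head drop: when full, the OLDEST item is evicted. Returns 1 if something was dropped. */
static int
demo_q_push(struct demo_q *q, int v)
{
	int dropped = 0;

	if (q->len == DEMO_Q_MAX) {
		q->head = (q->head + 1) % DEMO_Q_MAX;
		q->len--;
		dropped = 1;
	}
	q->items[(q->head + q->len) % DEMO_Q_MAX] = v;
	q->len++;
	return dropped;
}

/* Remove the oldest item. Returns 0 on success, -1 if the queue is empty. */
static int
demo_q_dequeue(struct demo_q *q, int *v)
{
	if (q->len == 0)
		return -1;
	*v = q->items[q->head];
	q->head = (q->head + 1) % DEMO_Q_MAX;
	q->len--;
	return 0;
}
```

With the queue holding 1, 2, 3, 4, a tail-drop enqueue of 5 leaves the contents unchanged, while a head-drop push of 5 leaves 2, 3, 4, 5. Head drop favours fresh data, which suits traffic where stale packets are of little use.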
*	backout pipe change, it crashes some arch	deraadt	2020-06-19	1	-4/+1
*	Instead of performing three distinct allocations per created pipe,	anton	2020-06-17	1	-1/+4
	reduce it to a single one. Not only should this be more performant, it also solves a kqueue related issue found by visa@ who also requested this change: if you attach an EVFILT_WRITE filter to a pipe fd, the knote gets added to the peer's klist. This is a problem for kqueue because if you close the peer's fd, the knote is left in the list whose head is about to be freed. knote_fdclose() is not able to clear the knote because it is not registered with the peer's fd.
	FreeBSD also takes a similar approach to pipe allocations.
	ok mpi@ visa@
*	Expose SMR list and pointer macros to userspace. This enables the use	visa	2020-06-17	1	-3/+3
	of SMR lists in userspace-visible parts of system headers. In addition, the macros allow libkvm to examine SMR data structures.
	Initial diff by and OK claudio@
*	make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.	dlg	2020-06-17	1	-6/+3
	i've been wanting to do this for a while, and now that we've got stoeplitz and it gives us 16 bits, it seems like the right time.
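A minimal sketch of the layout idea: keep all 16 bits of the flow id for the hash value and record "is it set" as a separate bit in a flags word. The structure, the flag value and the helper names below are hypothetical and chosen for illustration; they are not copied from sys/mbuf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "flowid holds a valid value". */
#define DEMO_M_FLOWID	0x4000

/* Hypothetical cut-down packet header. */
struct demo_pkthdr {
	uint16_t	flowid;		/* full 16 bits available for the hash */
	uint16_t	csum_flags;	/* shared flag word, one bit marks flowid as set */
};

static inline void
demo_flowid_set(struct demo_pkthdr *ph, uint16_t id)
{
	ph->flowid = id;
	ph->csum_flags |= DEMO_M_FLOWID;
}

static inline int
demo_flowid_isset(const struct demo_pkthdr *ph)
{
	return (ph->csum_flags & DEMO_M_FLOWID) != 0;
}

static inline void
demo_flowid_clear(struct demo_pkthdr *ph)
{
	ph->csum_flags &= (uint16_t)~DEMO_M_FLOWID;
}
```

Because validity is tracked outside the field, a 16-bit hash such as 0xffff can be stored without losing a bit to a "valid" marker, and the other flag bits in the shared word are left untouched.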
*	make intrmap_cpu return a struct cpu_info *, not a "cpuid number" thing.	dlg	2020-06-17	1	-2/+2
	requested by kettenis@
	discussed with jmatthew@
*	add intrmap, an api that picks cpus for devices to attach interrupts to.	dlg	2020-06-17	1	-0/+38
	there's been discussions for years (and even some diffs!) about how we should let drivers establish interrupts on multiple cpus.
	the simple approach is to let every driver look at the number of cpus in a box and just pin an interrupt on it, which is what pretty much everyone else started with, but we have never seemed to get past bikeshedding about. from what i can tell, the principal objections to this are:
	1. interrupts will tend to land on low numbered cpus. ie, if drivers try to establish n interrupts on m cpus, they'll start at cpu 0 and go to cpu n, which means cpu 0 will end up with more interrupts than cpu m-1.
	2. some cpus shouldn't be used for interrupts. why a cpu should or shouldn't be used for interrupts can be pretty arbitrary, but in practical terms i'm going to borrow from the scheduler and say that we shouldn't run work on hyperthreads.
	3. making all the drivers make the same decisions about the above is a lot of maintenance overhead. either we will have a bunch of inconsistencies, or we'll have a lot of untested commits to keep everything the same.
	my proposed solution to the above is this diff to provide the intrmap api. drivers that want to establish multiple interrupts ask the api for a set of cpus it can use, and the api considers the above issues when generating a set of cpus for the driver to use. drivers then establish interrupts on cpus with the info provided by the map.
	it is based on the if_ringmap api in dragonflybsd, but generalised so it could be used by something like nvme(4) in the future.
	this version provides numeric ids for CPUs to drivers, but as kettenis@ has been pointing out for a very long time, it makes more sense to use cpu_info pointers. i'll be updating the code to address that shortly.
	discussed with deraadt@ and jmatthew@
	ok claudio@ patrick@ kettenis@
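The two placement concerns listed above (do not pile everything onto cpu 0, and skip cpus that should not take interrupts) can be sketched with a rotating allocator. This is an illustration under stated assumptions, not the intrmap implementation: the `demo_` names, the fixed cpu limit and the rotation policy are all hypothetical.

```c
#include <assert.h>

#define DEMO_MAXCPUS	64

/* Hypothetical per-cpu description. */
struct demo_cpu {
	int	id;
	int	is_smt_sibling;	/* nonzero: a hyperthread we choose to skip */
};

/*
 * Pick up to `want` cpus for one device. `*next` is shared state that
 * rotates the starting point so successive devices do not all begin at
 * the lowest-numbered cpu. Returns how many ids were written to out[].
 */
static unsigned int
demo_intrmap(const struct demo_cpu *cpus, unsigned int ncpus,
    unsigned int want, unsigned int *next, int *out)
{
	int eligible[DEMO_MAXCPUS];
	unsigned int nelig = 0, i;

	assert(ncpus <= DEMO_MAXCPUS);

	/* Filter out cpus that should not receive interrupts. */
	for (i = 0; i < ncpus; i++) {
		if (!cpus[i].is_smt_sibling)
			eligible[nelig++] = cpus[i].id;
	}
	if (nelig == 0)
		return 0;
	if (want > nelig)
		want = nelig;

	/* Hand out consecutive eligible cpus starting at the rotating offset. */
	for (i = 0; i < want; i++)
		out[i] = eligible[(*next + i) % nelig];
	*next = (*next + want) % nelig;

	return want;
}
```

On an 8-cpu machine where odd-numbered cpus are treated as hyperthreads, a first device asking for two interrupts would get cpus 0 and 2, and a second device asking for two would get cpus 4 and 6 rather than starting again at cpu 0. Centralising the policy in one function is what lets every driver make the same decision without duplicating it.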
*	Implement a simple kqfilter for deadfs matching its poll handler.	mpi	2020-06-15	1	-1/+2
	ok visa@, millert@
*	Set __EV_HUP when the conditions matching poll(2)'s POLLHUP are found.	mpi	2020-06-15	1	-2/+5
	This is only done in poll-compatibility mode, when __EV_POLL is set.
	ok visa@, millert@
*	Revert addition of double underbars for filter-specific flag.	mpi	2020-06-12	1	-2/+2
	Port breakages reported by naddy@
*	Rename poll-compatibility flag to better reflect what it is.	mpi	2020-06-11	1	-3/+3
	While here prefix kernel-only EV flags with two underbars.
	Suggested by kettenis@, ok visa@