wireguard-openbsd - WireGuard implementation for the OpenBSD kernel

	Commit message (Collapse)	Author	Age	Files	Lines
*	When tcp_close() is running in parallel with fill_file(), the kernel	bluhm	2019-06-13	1	-2/+17
\| \| \| \| \| \| \|	could crash due to missing inp_ppcb. This happened when fstat(1) was called often and TCP was aborted with reset. Protect the sysctl path with the net lock. OK mpi@
*	Revert to using the SCHED_LOCK() to protect time accounting.	mpi	2019-06-01	1	-3/+1
\| \| \| \| \| \| \| \| \|	It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
*	Use a per-process mutex to protect time accounting instead of SCHED_LOCK().	mpi	2019-05-31	1	-1/+3
\| \| \| \| \| \| \|	Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
*	Read and assign the integer value only once. With this sysctl_int() will	claudio	2019-05-22	1	-3/+7
\| \| \| \| \| \| \|	do word loads and stores and so partial updates should no longer be observed. With this accessing global variables set by sysctl_int() should be mostly MP safe. OK dlg@ mpi@
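The "read and assign only once" pattern described above can be sketched in userland C. This is a minimal illustration, not the kernel source: `sketch_sysctl_int()` is a hypothetical name, `memcpy()` stands in for the kernel's `copyout()`/`copyin()`, and the sketch assumes `oldlenp` is never NULL. Plain C does not by itself guarantee that the single load and store are atomic; the commit relies on them being word-sized accesses.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userland sketch of a sysctl_int()-style accessor.
 * The shared variable *valp is loaded exactly once and stored exactly
 * once; all other work happens on the local copy "val".
 */
int
sketch_sysctl_int(void *oldp, size_t *oldlenp, const void *newp,
    size_t newlen, int *valp)
{
	int val;

	if (oldp != NULL && *oldlenp < sizeof(int))
		return ENOMEM;
	if (newp != NULL && newlen != sizeof(int))
		return EINVAL;
	*oldlenp = sizeof(int);

	val = *valp;				/* single load */
	if (oldp != NULL)
		memcpy(oldp, &val, sizeof(int));	/* stands in for copyout() */
	if (newp != NULL)
		memcpy(&val, newp, sizeof(int));	/* stands in for copyin() */
	*valp = val;				/* single store */
	return 0;
}
```

Because the caller-visible old value and the final store both go through `val`, a concurrent reader of `*valp` sees either the old or the new integer, never a value assembled in several steps.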
*	Add a sysctl accessor to struct pf_status. The pf_status only holds the	claudio	2019-05-09	1	-1/+6
\| \| \| \| \| \|	current status and statistics and can be exported without super-user rights via sysctl to make it easier for tools like systat to access those. OK deraadt@, sashan@
*	Add a dedicated sysctl(2) node for witness(4).	visa	2019-01-29	1	-1/+5
\| \| \| \| \| \| \| \|	The new node contains the subsystem's main control variable, kern.witness.watch. It is aliased by the old name, kern.witnesswatch. The alias will be removed in the future. OK anton@ mpi@
*	Move boottime into the timehands.	cheloha	2019-01-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To protect the timehands we first need to protect the basis for all UTC time in the kernel: the boottime. Because the boottime can be changed at any time it needs to be versioned along with the other members of the timehands to enable safe lockless reads when using it for anything. So the global boottime timespec goes away and the static boottimebin becomes a member of the timehands. Instead of reading the global boottime you use one of two interfaces: binboottime(9) or microboottime(9). nanoboottime(9) can trivially be added later, though there are no consumers for it at the moment. This introduces one small change in behavior. We used to advance the reported boottime just before launching kernel threads from main(). This makes it look to userland like we "booted" moments before those threads were launched. Because there is no longer a boottime global we can no longer trivially do this from main(), so the boottime we report to userspace via e.g. kern.boottime will now reflect whatever the time was when we bootstrapped the timehands via inittodr(9). This is usually no more than a minute before the kernel threads are launched from main(). The prior behavior can be restored by adding a new interface to the timecounter layer in a future commit. Based on FreeBSD r303387. Discussed with mpi@ and visa@. ok visa@
*	delete the dns jackport experiment. it has no future.	tedu	2018-11-19	1	-11/+1
\|
*	Add new KERN_CPUSTATS sysctl(2) so we can identify offline CPUs.	cheloha	2018-11-17	1	-1/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because of hw.smt we need a way to determine whether a given CPU is "online" or "offline" from userspace. KERN_CPTIME2 is an array, and so cannot be cleanly extended for this purpose, so add a new sysctl(2) KERN_CPUSTATS with an extensible struct. At the moment it's just KERN_CPTIME2 with a flags member, but it can grow as needed. KERN_CPUSTATS appears to have been defined by BSDi long ago, but there are few (if any) packages in the wild still using the symbol so breakage in ports should be near zero. No other system inherited the symbol from BSDi, either. Then, use the new sysctl(2) in systat(1) and top(1): - systat(1) draws placeholder marks ('-') instead of percentages for offline CPUs in the cpu view. - systat(1) omits offline CPU ticks when drawing the "big bar" in the vmstat view. The upshot is that the bar isn't half idle when half your logical CPUs are disabled. - top(1) does not draw lines for offline CPUs; if CPUs toggle on or offline in interactive mode we redraw the display to expand/reduce space for the new/missing CPUs. This is consistent with what some top(1) implementations do on Linux. - top(1) omits offline CPUs from the totals when CPU totals are combined into a single line (the '-1' flag). Originally prompted by deraadt@. Discussed endlessly with deraadt@, kettenis@, and sthen@. Tested by jmc@ and jca@. Earlier versions also discussed with jca@. Earlier versions tested by jmc@, tb@, and many others. docs ok jmc@, kernel bits ok kettenis@, everything ok sthen@, "Is your stuff in yet?" deraadt@
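How a consumer might leave offline CPUs out of a combined total, as the commit describes for systat(1) and top(1), can be sketched as follows. The struct layout, the `SK_` constants, and `sk_sum_online()` are assumptions made for illustration; they mirror the idea of "tick counters plus a flags member" rather than quoting the real header.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_CPUSTATES	5	/* assumed number of tick buckets per CPU */
#define SK_ONLINE	0x0001	/* assumed "CPU is online" flag bit */

/* Hypothetical mirror of an extensible per-CPU statistics record. */
struct sk_cpustats {
	uint64_t cs_time[SK_CPUSTATES];
	uint64_t cs_flags;
};

/*
 * Sum the tick buckets of online CPUs into total[] and return the
 * number of CPUs that were counted. Offline CPUs contribute nothing.
 */
size_t
sk_sum_online(const struct sk_cpustats *cpus, size_t ncpu,
    uint64_t total[SK_CPUSTATES])
{
	size_t i, s, online = 0;

	for (s = 0; s < SK_CPUSTATES; s++)
		total[s] = 0;
	for (i = 0; i < ncpu; i++) {
		if ((cpus[i].cs_flags & SK_ONLINE) == 0)
			continue;
		for (s = 0; s < SK_CPUSTATES; s++)
			total[s] += cpus[i].cs_time[s];
		online++;
	}
	return online;
}
```

Keeping the online test in one helper means a display that combines CPUs and one that lists them individually can apply the same rule.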
*	Revert KERN_CPTIME2 ENODEV changes in kernel and userspace.	cheloha	2018-10-05	1	-3/+1
\| \| \| \|	ok kettenis deraadt
*	Revert the inpcb table mutex commit. It triggers a witness panic	bluhm	2018-10-04	1	-26/+11
\| \| \| \| \| \| \|	in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
*	KERN_CPTIME2: set ENODEV if the CPU is offline.	cheloha	2018-09-26	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This lets userspace distinguish between idle CPUs and those that are not schedulable because hw.smt=0. A subsequent commit probably needs to add documentation for this to sysctl.2 (and perhaps elsewhere) after the dust settles. Also included here are changes to systat(1) and top(1) that account for the ENODEV case and adjust behavior accordingly: - systat(1)'s cpu view prints placeholder marks ('-') instead of percentages for each state if the given CPU is offline. - systat(1)'s vmstat view checks for offline CPUs when computing the machine state total and excludes them, so the CPU usage graph only represents the states for online CPUs. - top(1) does not draw CPU rows for offline CPUs when the view is redrawn. If CPUs "go offline", percentages for each state are replaced by placeholder marks ('-'); the view will need to be redrawn to remove these rows. If CPUs "go online" the view will need to be redrawn to show these new CPUs. In "combined CPU" mode, the count and the state totals only represent online CPUs. Ports using KERN_CPTIME2 will need to be updated. The changes described above to make systat(1) and top(1) aware of the ENODEV case and gracefully handle a changing HW_NCPUONLINE while the application is running are not necessarily appropriate for each and every port. The changes described above are so extensive in part to demonstrate one way a program might be made robust to changing CPU availability. In particular, changing hw.smt after boot is an extremely rare event, and this needs to be weighed when updating ports. The logic needed to account for the KERN_CPTIME2 ENODEV case is very roughly: if (sysctl(...) == -1) { if (errno != ENODEV) { /* Actual error occurred. */ } else { /* CPU is offline. */ } } else { /* CPU is online and CPU states were set by sysctl(2). */ } Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at length with kettenis@. Additional testing by tb@. No complaints from hackers@ after a week. ok kettenis@, "I think you should commit [now]" deraadt@
*	As a step towards per inpcb or socket locks, remove the net lock	bluhm	2018-09-20	1	-11/+26
\| \| \| \| \| \| \| \| \| \| \| \|	for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
*	Add hw.ncpuonline to count the number of online CPUs.	cheloha	2018-07-12	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The introduction of hw.smt means that logical CPUs can be disabled after boot and prior to suspend/resume. If hw.smt=0 (the default), there needs to be a way to count the number of hardware threads available on the system at any given time. So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it. hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu() during boot, while hw.ncpuonline is equal to the number of CPUs available to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu. This is preferable to adding a new sysctl to count the number of configured CPUs and keeping hw.ncpu equal to the number of online CPUs because such a change would break software in the ecosystem that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like. Such software in base includes top(1), systat(1), and snmpd(8), and perhaps others. We don't need additional locking to count the cardinality of a cpuset in this case because the only interfaces that can modify said cardinality are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK. Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software may perform suboptimally. However, most changes will be similar to the change included here for libcxx's std::thread::hardware_concurrency(): using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN is insufficient. Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen. Lots of patch tweaks from kettenis. ok kettenis, "proceed" deraadt
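For software that sizes a thread pool, the portable route mentioned above is `sysconf(_SC_NPROCESSORS_ONLN)`, which this commit ties to hw.ncpuonline. A minimal sketch follows; `sk_worker_count()` is a hypothetical helper name, and the fallback to one worker is an assumption about what a caller would want when the query fails.

```c
#include <unistd.h>

/*
 * Hypothetical helper: choose a worker count from the number of
 * online CPUs rather than the number of configured CPUs.
 */
long
sk_worker_count(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	if (n < 1)
		n = 1;	/* query failed or is unsupported: stay single-threaded */
	return n;
}
```

Code that needs the configured count instead can query `_SC_NPROCESSORS_CONF` the same way.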
*	Update the file reference count field `f_count' using atomic operations	visa	2018-07-02	1	-3/+1
\| \| \| \| \| \| \| \| \|	instead of using a mutex for update serialization. Use a per-fdp mutex to manage updating of file instance pointers in the `fd_ofiles' array to let fd_getfile() acquire file references safely with concurrent file reference releases. OK mpi@
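The first half of this change, replacing mutex-serialized updates of a reference count with atomic operations, can be illustrated with C11 atomics. This is a sketch under assumptions: `struct sk_file` and the `sk_file_*()` helpers are hypothetical, and `<stdatomic.h>` stands in for the kernel's own atomic primitives. The per-fdp mutex that guards the `fd_ofiles` pointers is not modelled here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file object with an atomic refcount. */
struct sk_file {
	atomic_uint f_count;
};

/* A new object starts with one reference held by its creator. */
void
sk_file_init(struct sk_file *fp)
{
	atomic_init(&fp->f_count, 1);
}

/* Take an additional reference; no lock is needed for the update. */
void
sk_file_ref(struct sk_file *fp)
{
	atomic_fetch_add(&fp->f_count, 1);
}

/* Drop a reference; returns true if the caller released the last one. */
bool
sk_file_rele(struct sk_file *fp)
{
	return atomic_fetch_sub(&fp->f_count, 1) == 1;
}
```

The atomic counter only makes the increments and decrements safe. Looking up a pointer and taking a reference on it must still be protected against a concurrent final release, which is the role the commit assigns to the per-fdp mutex.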
*	Lock the file descriptor table when accessing the `fd_ofileflags' array.	visa	2018-07-01	1	-2/+5
\| \| \| \| \| \| \| \|	This prevents the array from being freed too early. In the function unp_internalize(), the locking also ensures the per-fdp flags stay coherent with the file instance. OK mpi@
*	Unlock sendmsg(2) and sendto(2).	mpi	2018-06-20	1	-1/+3
\| \| \| \| \| \| \| \| \| \|	These syscalls can now be executed w/o the KERNEL_LOCK() depending on the kind of socket. The current solution uses a single global mutex to serialize access to, and reference count, 'struct file'. ok visa@, kettenis@
*	SMT (Simultaneous Multi Threading) implementations typically share	kettenis	2018-06-19	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TLBs and L1 caches between threads. This can make cache timing attacks a lot easier and we strongly suspect that this will make several spectre-class bugs exploitable. Especially on Intel's SMT implementation which is better known as Hyper-threading. We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial. Since many modern machines no longer provide the ability to disable Hyper-threading in the BIOS setup, provide a way to disable the use of additional processor threads in our scheduler. And since we suspect there are serious risks, we disable them by default. This can be controlled through a new hw.smt sysctl. For now this only works on Intel CPUs when running OpenBSD/amd64. But we're planning to extend this feature to CPUs from other vendors and other hardware architectures. Note that SMT doesn't necessarily have a positive effect on performance; it highly depends on the workload. In all likelihood it will actually slow down most workloads if you have a CPU with more than two cores. ok deraadt@
*	Move the declarations of the raw ip and ip6 pcb tables into the	bluhm	2018-06-02	1	-5/+1
\| \| \| \| \|	in_pcb.h header file. OK mpi@ visa@
*	Add missing #include "audio.h" needed for the NAUDIO macro.	ratchov	2018-05-27	1	-1/+3
\| \| \| \|	suggested by jsg, ok sthen.
*	Condition the new audio_record_enable pieces on NAUDIO > 0, fixing	sthen	2018-05-26	1	-2/+9
\| \| \| \|	kernel builds without audio (for example, ramdisks). ok florian@
*	In addition to "on" and "off", allow the audio "record.enable" mixer	ratchov	2018-05-26	1	-1/+20
\| \| \| \| \| \| \| \|	knob to take the new "sysctl" value, which is the default. In this case, the device behavior is determined by the new "kern.audio.record" sysctl(2), which defaults to zero. ok florian
*	Add kern.witnesswatch sysctl for controlling witness(4). By default,	visa	2018-05-16	1	-1/+6
\| \| \| \| \| \| \|	lock order checking is disabled but it can be enabled at runtime. Suggested by deraadt@ / mpi@ OK mpi@
*	Use fd_getfile() in sysctl_file() instead of rewriting it.	mpi	2018-05-08	1	-7/+5
\| \| \| \| \| \|	This gives us refcounting for free which is what we need for MP. ok bluhm@, visa@
*	Change fd_iterfile() to not return immature fps instead of skipping them	mpi	2018-05-08	1	-4/+2
\| \| \| \| \| \|	later. ok bluhm@, visa@
*	Protect per-file counters and document which lock is used to protect	mpi	2018-05-08	1	-1/+3
\| \| \| \| \| \| \| \| \|	the other fields. Once we no longer have any [k] (kernel lock) protections, we'll be able to unlock almost all network related syscalls. Inputs from and ok bluhm@, visa@
*	Introduce fd_iterfile() a new helper function to iterate over `filehead'.	mpi	2018-04-25	1	-18/+5
\| \| \| \| \| \| \|	This turns `filehead' into a local variable, that will make it easier to protect it. ok visa@
*	Remove almost unused `flags' argument of suser().	mpi	2018-02-19	1	-12/+12
\| \| \| \| \| \| \|	The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
*	Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined.	guenther	2018-01-02	1	-1/+2
\| \| \| \|	ok millert@ sthen@
*	Remove NET_LOCK()'s argument.	mpi	2017-08-11	1	-4/+3
\| \| \| \|	Tested by Hrvoje Popovski, ok bluhm@
*	Do not touch file pointers for which FILE_IS_USABLE() is false.	gerhard	2017-06-20	1	-1/+2
\| \| \| \| \| \|	They might not be fully constructed. ok mpi@ deraadt@ bluhm@
*	tweak sysctl_string and sysctl_tstring to use size_t for lengths, not int	dlg	2017-06-14	1	-5/+6
\| \| \| \| \|	they're both wrappers around sysctl__string, which is where half the fix is too.
*	use size_t for the size of things in memory, not int.	dlg	2017-06-13	1	-4/+5
\| \| \| \| \| \| \| \| \|	this tweaks the len argument to sysctl_rdstring, sysctl_struct, and sysctl_rdstruct. there's probably more to fix. ok millert@
*	Do not export the protocol PCB pointer from kernel to non-root users	bluhm	2017-05-06	1	-2/+3
\| \| \| \| \| \|	also in the IPv6 case. This fixes "netstat -An -f inet6 -p tcp" and shows 0x0. report and OK dhill@
*	Enforce that sysctl kern.somaxconn and sominconn can only be set	bluhm	2017-04-27	1	-5/+21
\| \| \| \| \|	to valid values. The so_qlimit is type short. report Dillon Jay Pena; OK deraadt@
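The reason for the range check is that a value accepted as an `int` is later stored in a `short`, so anything above `SHRT_MAX` would be truncated. A minimal sketch of such validation follows; `sk_set_qlimit()` is a hypothetical helper, and the lower bound of zero is an assumption rather than something stated in the commit message.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical validator: accept a new limit only if it fits in a
 * short. The target is left untouched when the value is rejected.
 */
int
sk_set_qlimit(int *target, int val)
{
	if (val < 0 || val > SHRT_MAX)
		return EINVAL;
	*target = val;
	return 0;
}
```

Validating before the assignment keeps the stored setting usable even when a caller passes a value that is out of range.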
*	timeval has trailing padding on powerpc and m88k, so memset it before	guenther	2017-04-05	1	-1/+2
\| \| \| \| \| \|	copyout to avoid leaking kernel stack ok deraadt@
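The fix pattern, zeroing a structure before filling it so that padding bytes never carry stale stack contents, can be sketched in userland. `sk_export_timeval()` is a hypothetical helper and `memcpy()` stands in for the kernel's `copyout()`.

```c
#include <string.h>
#include <sys/time.h>

/*
 * Hypothetical helper: build a timeval on the stack and copy the whole
 * object out. The memset() clears any padding before the fields are
 * assigned, so the copied bytes contain only intended data.
 */
void
sk_export_timeval(void *dst, long sec, long usec)
{
	struct timeval tv;

	memset(&tv, 0, sizeof(tv));	/* clear fields and padding */
	tv.tv_sec = sec;
	tv.tv_usec = usec;
	memcpy(dst, &tv, sizeof(tv));	/* stands in for copyout() */
}
```

Assigning the fields individually after the `memset()` matters: a whole-struct initializer is not guaranteed to clear padding bytes.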
*	Here at OpenBSD we change ABIs at the fling of a hat. Just in case a	deraadt	2017-04-05	1	-3/+3
\| \| \| \| \|	future disk info sysctl has pads in the structures, use M_ZERO when allocating the storage to avoid leaking kernel memory.
*	Enforce that tcbtable and udbtable must be accessed with the NET_LOCK().	mpi	2017-03-07	1	-3/+3
\| \| \| \| \| \| \| \|	Get rid of the old splnet()/splx() dances. What's protecting them right now is the KERNEL_LOCK(), but since pf(4) looks at these tables we want to protect them in another way, hence the NET_LOCK(), at least as a hint. ok bluhm@
*	p_comm is the process's command and isn't per thread, so move it from	guenther	2017-01-21	1	-3/+2
\| \| \| \| \| \|	struct proc to struct process. ok deraadt@ kettenis@
*	Export p_cpuid via sysctl for all processes; ok guenther	mikeb	2016-11-11	1	-2/+2
\|
*	Split PID from TID, giving processes a PID unrelated to the TID of their	guenther	2016-11-07	1	-3/+2
\| \| \| \| \| \|	initial thread ok jsing@ kettenis@
*	move the mbstat structure to percpu counters	dlg	2016-10-24	1	-4/+20
\| \| \| \| \| \| \|	each cpu's counters still have to be protected by splnet, but this is better than a single set of counters protected by a global mutex. ok bluhm@
*	Factor out pr->ps_vmspace into a local variable for fill_kproc()	guenther	2016-10-22	1	-4/+5
\| \| \| \|	ok jsing@ kettenis@
*	upon further review, port numbers go all the way up to ushort max	tedu	2016-10-08	1	-2/+2
\|
*	initialize the port variable before sysctl, since it's also read out.	tedu	2016-10-08	1	-2/+2
\|
*	introduce a sysctl to hijack dns sockets. when set to a port number,	tedu	2016-10-07	1	-1/+11
\| \| \| \| \| \| \|	all dns socket connections will be redirected to localhost:port. this could be a sockopt on the listening socket, but sysctl is an easier interface to work with right now. ok deraadt
*	Add va_nlink information to struct kinfo_file (so bump the shlib minor)	guenther	2016-10-02	1	-1/+2
\| \| \| \|	from Sebastien Marie
*	Make a move towards ending 4 decades of kernel snooping.	deraadt	2016-09-25	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add sysctl kern.allowkmem (default 0) which controls the ability to open /dev/mem or /dev/kmem at securelevel > 0. Over 15 years we converted 99% of utilities in the tree to operate on sysctl-nodes (either by themselves or via code hiding in the guts of -lkvm). pstat -d and -v & procmap are affected and continued use of them will require kern.allowkmem=1 in /etc/sysctl.conf. acpidump (and its buddy sendbug) are affected, but we'll work out a solution soon. There will be some impact in ports. ok kettenis guenther
*	sysctl KERN_ARND is no longer used (in ports, it only occurs in fallback	deraadt	2016-09-21	1	-14/+1
\| \| \| \| \| \|	paths of libevent). This interface was the first generation of what eventually became getentropy(2) and arc4random(3) -- june 1997! Ports scan by sthen, general agreement guenther
*	option INSECURE is obsolete	deraadt	2016-09-18	1	-5/+1
\|