path: root/sys/sys
Commit message | Author | Date | Files | Lines

* Release mbuf(9) chain with a simple m_freem(9) loop in sorflush(). | mvs | 2021-02-18 | 1 | -2/+1

    Passing a local copy of the socket to sbrelease() is too complicated a
    way to just free the receive buffer, and we should not allocate a large
    object on the stack. We also no longer pass an unlocked socket to
    soassertlocked() within sbdrop(); this was never triggered because the
    whole layer is locked with one lock. sorflush() is now private to
    kern/uipc_socket.c, so its definition was adjusted accordingly.

    ok claudio@ mpi@

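    A minimal sketch of the kind of loop described, assuming the receive
    buffer's records are linked through m_nextpkt (illustrative, not the
    committed diff):

        struct mbuf *m, *n;

        /*
         * Free each record in the receive buffer; m_freem(9)
         * releases a record's whole m_next chain.
         */
        for (m = sb->sb_mb; m != NULL; m = n) {
                n = m->m_nextpkt;
                m_freem(m);
        }
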
* Move single_thread_set() out of KERNEL_LOCK(). | mpi | 2021-02-15 | 1 | -2/+2

    Use the SCHED_LOCK() to ensure `ps_thread' isn't being modified by a
    sibling when entering tsleep(9) w/o KERNEL_LOCK().

    ok visa@

* Move UNIX domain sockets out of kernel lock. The new `unp_lock' rwlock(9) | mvs | 2021-02-10 | 1 | -14/+21

    used as solock()'s backend to protect the whole layer.

    With feedback from mpi@.

    ok bluhm@ claudio@

* Revert the conversion of the per-process thread list into a SMR_TAILQ. | mpi | 2021-02-08 | 1 | -15/+6

    We did not reach a consensus about using SMR to unlock
    single_thread_set(), so there's no point in keeping this change.

* Simplify sleep_setup API to two operations in preparation for splitting | mpi | 2021-02-08 | 2 | -11/+4

    the SCHED_LOCK().

    Putting a thread on a sleep queue is reduced to the following:

        sleep_setup();
        /* check condition or release lock */
        sleep_finish();

    Previous version ok cheloha@, jmatthew@; ok claudio@

* 6.9-beta | deraadt | 2021-02-06 | 1 | -3/+3

* Remove last remnants of ASU ac_flag from accounting. | rob | 2021-02-04 | 1 | -2/+1

    OK deraadt@, bluhm@

* Remove obsolete vnode operation vector declarations. | visa | 2021-02-01 | 1 | -6/+1

    OK bluhm@, claudio@, mpi@, semarie@

* introduce ujoy(4), a restricted subset of uhid(4) for game controllers. | thfr | 2021-01-23 | 1 | -1/+9

    This includes ujoy_hid_is_collection() to work around limitations of
    hid_is_collection() until this can be combined without fallout.

    input, testing with 8bitdo controller, and ok brynet@
    PS4 controller testing, fix for hid_is_collection, and ok mglocker@

* Mark `ps_oppid' as atomic. | mvs | 2021-01-18 | 1 | -2/+2

    ok mpi@

* regen | mvs | 2021-01-18 | 2 | -4/+4

* Revert wrong commit. | mvs | 2021-01-18 | 2 | -4/+4

* Convert ifunit() to if_unit(9). | mvs | 2021-01-18 | 2 | -4/+4

    ok sashan@

* Cache parent's pid as `ps_ppid' and use it instead of `ps_pptr->ps_pid'. | mvs | 2021-01-17 | 1 | -1/+2

    This allows us to unlock getppid(2).

    ok mpi@

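    With the parent's pid cached, an unlocked getppid(2) can be as simple
    as the sketch below, written from the commit description rather than
    the actual diff:

        int
        sys_getppid(struct proc *p, void *v, register_t *retval)
        {
                /*
                 * `ps_ppid' only changes at fork/reparent time, so a
                 * lockless read is sufficient here.
                 */
                *retval = p->p_p->ps_ppid;
                return (0);
        }
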
* kqueue: Revise fd close notification | visa | 2021-01-17 | 2 | -3/+4

    Deliver file descriptor close notification for __EV_POLL knotes through
    struct kevent that kqueue_scan() returns. This replaces the previous
    way of returning EBADF from kqueue_scan(), making it easier to
    determine what exactly has changed.

    When a file descriptor is closed, its __EV_POLL knotes are turned into
    one-shot events and queued for delivery. These knotes are
    "unregistered" as they are reachable only through the queue of active
    events. This reduces interference with the normal workings of kqueue.
    However, more care is needed to avoid leaking knotes. In addition, the
    unregistering removes a limit on the number of issued knotes. To
    prevent accumulation of pending fd close notifications, kqpoll_init()
    flushes the active queue at the start of a kqpoll scan.

    OK mpi@

* Replace SB_KNOTE and sb_flagsintr with direct checking of klist. | visa | 2021-01-17 | 1 | -6/+3

    OK mpi@ as part of a larger diff

* kernel, sysctl(8): remove dead variable: tickadj | cheloha | 2021-01-13 | 2 | -4/+2

    The global "tickadj" variable is a remnant of the old NTP adjustment
    code we used in the kernel before the current timecounter subsystem
    was imported from FreeBSD circa 2004 or 2005. Fifteen years hence it
    is completely vestigial and we can remove it. We probably should have
    removed it long ago but I guess it slipped through the cracks. FreeBSD
    removed it in 2002:

    https://cgit.freebsd.org/src/commit/?id=e1d970f1811e5e1e9c912c032acdcec6521b2a6d

    NetBSD and DragonflyBSD can probably remove it, too.

    We export tickadj via the kern.clockrate sysctl(2), so update sysctl.2
    and sysctl(8) accordingly. Hypothetically this change could break
    someone's sysctl(8) parsing script. I don't think that's very likely.

    ok mvs@

* New rw_obj_init() API providing reference-counted rwlock. | mpi | 2021-01-11 | 1 | -1/+23

    Original port from NetBSD by guenther@, required for upcoming amap &
    anon locking.

    ok kettenis@

* Simplify sleep signal handling a bit by introducing sleep_signal_check(). | claudio | 2021-01-11 | 1 | -3/+2

    The common code is moved to sleep_signal_check() and instead of
    multiple state variables for sls_sig and sls_unwind only one
    sls_sigerr is set. This simplifies the checks in sleep_finish_signal()
    a great bit.

    Idea from and OK mpi@

* pool(9): remove ticks | cheloha | 2021-01-02 | 1 | -3/+3

    Change the pool(9) timeouts to use the system uptime instead of ticks.

    - Change the timeouts from variables to macros so we can use
      SEC_TO_NSEC(). This means these timeouts are no longer patchable
      via ddb(4). dlg@ does not think this will be a problem, as the
      timeout intervals have not changed in years.

    - Use low-res time to keep things fast. Add a local copy of
      getnsecuptime() to subr_pool.c to keep the diff small. We will need
      to move getnsecuptime() into kern_tc.c and document it later if we
      ever have other users elsewhere in the kernel.

    - Rename ph_tick -> ph_timestamp and pr_cache_tick ->
      pr_cache_timestamp.

    Prompted by tedu@ some time ago, but the effort stalled (may have
    been my fault). Input from kettenis@ and dlg@.

    Special thanks to mpi@ for help with struct shuffling. This change
    does not increase the size of struct pool_page_header or struct pool.

    ok dlg@ mpi@

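    The local helper described above presumably amounts to the following
    sketch; getnanouptime(9) and TIMESPEC_TO_NSEC() are existing kernel
    interfaces, but the exact body is an assumption:

        static uint64_t
        getnsecuptime(void)
        {
                struct timespec now;

                /* low-res time: fast, accurate to within one tick */
                getnanouptime(&now);
                return TIMESPEC_TO_NSEC(&now);
        }
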
* Add singly-linked tail queue macros from FreeBSD. | millert | 2020-12-30 | 1 | -1/+98

    These are essentially equivalent to the simple queue macros from
    NetBSD but predate them and are more widely available on other
    systems.

    OK mpi@ denis@

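    For illustration, typical use of the new macros; the job structure is
    made up, the macros are the standard FreeBSD-style STAILQ set:

        struct job {
                int id;
                STAILQ_ENTRY(job) job_next;
        };

        STAILQ_HEAD(jobq, job) jobs = STAILQ_HEAD_INITIALIZER(jobs);

        void
        job_enqueue(struct job *j)
        {
                /* O(1) insertion at the tail of a singly-linked list */
                STAILQ_INSERT_TAIL(&jobs, j, job_next);
        }

        struct job *
        job_dequeue(void)
        {
                struct job *j = STAILQ_FIRST(&jobs);

                if (j != NULL)
                        STAILQ_REMOVE_HEAD(&jobs, job_next);
                return (j);
        }
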
* Analogous to the kern.audio.record sysctl parameter for audio(4) | mglocker | 2020-12-28 | 1 | -2/+15

    devices, introduce kern.video.record for video(4) devices. By default
    kern.video.record will be set to zero, blanking all data delivered by
    device drivers which attach to video(4).

    The idea was initially proposed by Laurence Tratt <laurie AT tratt
    DOT net>.

    ok mpi@

* Make NET_LOCK() assertions conditional to DIAGNOSTIC | visa | 2020-12-27 | 1 | -1/+8

    This saves about 2.5 KiB off amd64's RAMDISK after gzip compression.

    OK deraadt@, mpi@, cheloha@

* Refactor klist insertion and removal | visa | 2020-12-25 | 1 | -1/+3

    Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
    These functions assume that the caller has locked the klist. The
    current state of locking remains intact because the kernel lock is
    still used with all klists.

    Add new functions klist_insert() and klist_remove() that lock the
    klist internally. This allows some code simplification.

    OK mpi@

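    The resulting split at a call site, sketched with a hypothetical
    driver softc and mutex:

        /* caller already holds the lock associated with the klist */
        mtx_enter(&sc->sc_mtx);
        klist_insert_locked(&sc->sc_klist, kn);
        mtx_leave(&sc->sc_mtx);

        /* or let the klist take its own lock internally */
        klist_insert(&sc->sc_klist, kn);
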
* Small smr_grace_wait() optimization | visa | 2020-12-25 | 1 | -1/+2

    Make the SMR thread maintain an explicit system-wide grace period and
    make CPUs observe the current grace period when crossing a quiescent
    state. This lets the SMR thread avoid a forced context switch for
    CPUs that have already entered the latest grace period.

    This change provides a small improvement in smr_grace_wait()'s
    performance in terms of context switching.

    OK mpi@, anton@

* tsleep(9): add global "nowake" channel for threads avoiding wakeup(9) | cheloha | 2020-12-24 | 1 | -1/+3

    It would be convenient if there were a channel a thread could sleep
    on to indicate they do not want any wakeup(9) broadcasts. The easiest
    way to do this is to add an "int nowake" to kern_synch.c and extern
    it in sys/systm.h. You use it like this:

        #include <sys/systm.h>

        tsleep_nsec(&nowake, ...);

    There is now no need to handroll a local dead channel, e.g.

        int chan;

        tsleep_nsec(&chan, ...);

    which expands the stack. Local dead channels will be replaced with
    &nowake in later patches.

    One possible problem with this "one global channel" approach is sleep
    queue congestion. If you have lots of threads sleeping on &nowake you
    might slow down a wakeup(9) on a different channel that hashes into
    the same queue. Unsure how much of a problem this actually is, if at
    all.

    NetBSD and FreeBSD have a "pause" interface in the kernel that
    chooses a suitable channel automatically. To keep things simple and
    avoid adding a new interface we will start with this global channel.

    Discussed with mpi@, claudio@, kettenis@, and deraadt@.

    Basically designed by kettenis@, who vetoed my other proposals.

    Bugs caught by deraadt@, tb@, and patrick@.

* Introduce klistops | visa | 2020-12-20 | 1 | -1/+17

    This patch extends struct klist with a callback descriptor and an
    argument. The main purpose of this is to let the kqueue subsystem
    assert when a klist should be locked, and operate the klist lock in
    klist_invalidate().

    Access to a knote list of a kqueue-monitored object has to be
    serialized somehow. Because the object often has a lock for
    protecting its state, and because the object often acquires this lock
    at the latest in its f_event callback function, it makes sense to use
    this lock also for the knote lists. The existing uses of NOTE_SUBMIT
    already show a pattern that is likely to become more prevalent.

    There could be an embedded lock in klist. However, such a lock would
    be redundant in many cases. The code cannot rely on a single lock
    type (mutex, rwlock, something else) because the needs of monitored
    objects vary. In addition, an embedded lock would introduce new lock
    order constraints. Note that the patch does not rule out use of
    dedicated klist locks.

    The patch introduces a way to associate lock operations with a klist.
    The caller can provide a custom implementation, or use a ready-made
    interface with a mutex or rwlock. For compatibility with old code,
    the new code falls back to using the kernel lock if no specific klist
    initialization has been done. The existing code already relies on
    implicit initialization of klist.

    Sadly, this change increases the size of struct klist. dlg@ thinks
    this is not fatal, though.

    OK mpi@

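    The ready-made mutex interface would be used roughly as follows; the
    softc and IPL level are hypothetical:

        mtx_init(&sc->sc_mtx, IPL_MPFLOOR);
        /* tell kqueue which lock guards this knote list */
        klist_init_mutex(&sc->sc_klist, &sc->sc_mtx);
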
* Add fd close notification for kqueue-based poll() and select() | visa | 2020-12-18 | 1 | -1/+2

    When the file descriptor of an __EV_POLL-flagged knote is closed,
    post EBADF through the kqueue instance to the caller of
    kqueue_scan(). This lets kqueue-based poll() and select() preserve
    their current behaviour of returning EBADF when a polled file
    descriptor is closed concurrently.

    OK mpi@

* Make knote_{activate,remove}() internal to kern_event.c. | visa | 2020-12-18 | 1 | -3/+1

    OK mpi@

* Add helpers around rw_status(9) to help check whether a lock is held. | mpi | 2020-12-15 | 1 | -1/+24

    ok visa@

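    The log does not show the helper names; a hypothetical wrapper in the
    same spirit, built only on rw_status(9)'s documented return values:

        /* hypothetical; rw_status(9) returns RW_WRITE when the caller
         * holds the lock exclusively */
        static inline int
        rw_held_wrlock(struct rwlock *rwl)
        {
                return (rw_status(rwl) == RW_WRITE);
        }
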
* Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp. | jan | 2020-12-12 | 1 | -2/+2

    OK dlg@, bluhm@
    No Opinion mpi@
    Not against it claudio@

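    A post-rename call site, assuming the dead ifp argument simply drops
    out of the old MCLGETI() form:

        struct mbuf *m;

        /* before: m = MCLGETI(NULL, M_DONTWAIT, ifp, MCLBYTES); */
        m = MCLGETL(NULL, M_DONTWAIT, MCLBYTES);
        if (m == NULL)
                return (ENOBUFS);
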
* Add kernel-only per-thread kqueue & helpers to initialize and free it. | mpi | 2020-12-09 | 2 | -2/+8

    This will soon be used by select(2) and poll(2).

    ok anton@, visa@

* Convert the per-process thread list into a SMR_TAILQ. | mpi | 2020-12-07 | 1 | -6/+15

    Currently all iterations are done under KERNEL_LOCK() and therefore
    use the *_LOCKED() variant.

    From and ok claudio@

* Refactor kqueue_scan() so it can be used by other syscalls. | mpi | 2020-12-07 | 1 | -2/+2

    Stop iterating in the function and instead copy the returned events
    to userland after every call.

    ok visa@

* Hoist DTYPE_* out of #ifdef _KERNEL. | martijn | 2020-12-02 | 1 | -6/+9

    Similar to what NetBSD and FreeBSD have done.

    OK guenther@

* Change kqueue_scan() to keep track of collected events in the given context. | mpi | 2020-11-25 | 1 | -1/+4

    It is now possible to call the function multiple times to collect
    events. For that, the end marker has to be preserved between calls
    because otherwise the scan might collect an event more than once. If
    a collected event gets reactivated during scanning, it will be added
    at the tail of the queue, out of reach because of the end marker.

    This is required to implement select(2) and poll(2) on top of
    kqueue_scan().

    Done & originally committed by visa@ in r1.143, in snap for more than
    2 weeks.

    ok visa@, anton@

* Fix comment _SYS_VIDEOIO_H -> _SYS_VIDEOIO_H_ | mglocker | 2020-11-20 | 1 | -2/+2

* Constify dktypenames and fstypenames in libc. | guenther | 2020-11-14 | 1 | -3/+3

    Adjust variable declaration in disklabel to match.

    ok millert@ deraadt@

* setitimer(2): ITIMER_REAL: protect state with per-process mutex ps_mtx | cheloha | 2020-11-10 | 1 | -3/+3

    To unlock getitimer(2) and setitimer(2) we need to protect the
    per-process ITIMER_REAL state with something other than the kernel
    lock. As the ITIMER_REAL timeout callback realitexpire() runs at
    IPL_SOFTCLOCK the per-process mutex ps_mtx is appropriate.

    In setitimer() we need to use ps_mtx instead of the global itimer_mtx
    if the given timer is ITIMER_REAL. Easy.

    The ITIMER_REAL timeout callback routine realitexpire() is trickier.
    When we enter ps_mtx during the callback we need to check if the
    timer was cancelled or rescheduled. A thread from the process can
    call setitimer(2) at the exact moment the callback is about to run
    from timeout_run() (see kern_timeout.c).

    Update the locking annotation in sys/proc.h accordingly.

    ok anton@

* In case of failure, call sigexit() from trapsignal instead of sendsig(). | mpi | 2020-11-08 | 1 | -2/+2

    Simplify MD code and reduce the amount of recursion into the signal
    code which helps when dealing with locks.

    ok cheloha@, deraadt@

* Convert ffs_sysctl to sysctl_bounded_args | gnezdo | 2020-11-07 | 1 | -2/+2

    Requires sysctl_bounded_arr branch to support sysctl_rdint. The
    read-only variables are marked by an empty range of [1, 0].

    OK millert@

* Add feature to force the selection of source IP address | denis | 2020-10-29 | 1 | -2/+4

    Based on previous work on an idea from deraadt@
    Input from claudio@, djm@, deraadt@, sthen@

    OK deraadt@

* Serialize msgbuf access with a mutex. | visa | 2020-10-25 | 1 | -7/+13

    This introduces a system-wide mutex that serializes msgbuf
    operations. The mutex controls access to all modifiable fields of
    struct msgbuf. It also covers logsoftc.sc_state.

    To avoid adding extra lock order constraints that would affect use of
    printf(9), the code does not take new locks when the log mutex is
    held.

    The code assumes that there is at most one thread using logread().
    This keeps the logic simple. If there was more than one reader,
    logread() might return the same data to different readers. Also, log
    wakeup might not be reliable with multiple threads.

    Tested in snaps for two weeks.

    OK mpi@

* timeout(9): basic support for kclock timeouts | cheloha | 2020-10-15 | 1 | -6/+30

    A kclock timeout is a timeout that expires at an absolute time on one
    of the kernel's clocks. A timeout's absolute expiration time is kept
    in a new member of the timeout struct, to_abstime. The timeout's
    kclock is set at initialization and is kept in another new member of
    the timeout struct, to_kclock.

    Kclock timeouts are desirable because they have nanosecond
    resolution, regardless of the value of hz(9). The timecounter
    subsystem is also inherently NTP-sensitive, so timeouts scheduled
    against the subsystem are NTP-sensitive. These two qualities
    guarantee that a kclock timeout will never expire early.

    Currently there is support for one kclock, KCLOCK_UPTIME (the uptime
    clock). Support for KCLOCK_RUNTIME (the runtime clock) and KCLOCK_UTC
    (the UTC clock) is planned for the future.

    Support for these additional kclocks will allow us to implement some
    of the POSIX interfaces OpenBSD is missing, e.g. clock_nanosleep()
    and timer_create(). We could also use it to provide proper absolute
    timeouts for e.g. pthread_mutex_timedlock(3).

    Kclock timeouts are initialized with timeout_set_kclock(). They can
    be scheduled with either timeout_in_nsec() (relative timeout) or
    timeout_at_ts() (absolute timeout). They are incompatible with
    timeout_add(9), timeout_add_sec(9), timeout_add_msec(9),
    timeout_add_usec(9), timeout_add_nsec(9), and timeout_add_tv(9).
    They can be cancelled with timeout_del(9) or timeout_del_barrier(9).
    Documentation for the new interfaces is a work in progress.

    For now, tick-based timeouts remain supported alongside kclock
    timeouts. They will remain supported until we are certain we don't
    need them anymore. It is possible we will never remove them. I would
    rather not keep them around forever, but I cannot predict what
    difficulties we will encounter while converting tick-based timeouts
    to kclock timeouts. There are a *lot* of timeouts in the kernel.

    Kclock timeouts are more costly than tick-based timeouts:

    - Calling timeout_in_nsec() incurs a call to nanouptime(9). Reading
      the hardware timecounter is too expensive in some contexts, so care
      must be taken when converting existing timeouts. We may add a flag
      in the future to cause timeout_in_nsec() to use getnanouptime(9)
      instead of nanouptime(9), which is much cheaper. This may be
      appropriate for certain classes of timeouts. tcp/ip session
      timeouts come to mind.

    - Kclock timeout expirations are kept in a timespec. Timespec
      arithmetic has more overhead than 32-bit tick arithmetic, so
      processing kclock timeouts during softclock() is more expensive.
      On my machine the overhead for processing a tick-based timeout is
      ~125 cycles. The overhead for a kclock timeout is ~500 cycles. The
      overhead difference on 32-bit platforms is unknown. If it proves
      too large we may need to use a 64-bit value to store the expiration
      time. More measurement is needed.

    Priority targets for conversion are setitimer(2), *sleep_nsec(9), and
    the kevent(2) EVFILT_TIMER timers. Others will follow.

    With input from mpi@, visa@, kettenis@, dlg@, guenther@, claudio@,
    deraadt@, probably many others. Older version tested by visa@.
    Problems found in older version by bluhm@. Current version tested by
    Yuichiro Naito.

    "wait until after unlock" deraadt@, ok kettenis@

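    Putting the pieces named above together, arming a one-shot kclock
    timeout might look like the sketch below. KCLOCK_UPTIME,
    timeout_set_kclock() and timeout_in_nsec() are named in the message;
    the exact argument order is an assumption:

        struct timeout to;

        void
        expire_cb(void *arg)
        {
                /* runs from softclock() once the uptime clock
                 * passes to_abstime */
        }

        void
        arm_timeout(void)
        {
                timeout_set_kclock(&to, expire_cb, NULL, KCLOCK_UPTIME, 0);
                timeout_in_nsec(&to, SEC_TO_NSEC(5));   /* ~5 seconds */
        }
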
* _exit(2), execve(2): tweak per-process interval timer cancellation | cheloha | 2020-10-15 | 1 | -2/+2

    If we fold the for-loop iterating over each interval timer into the
    helper function, the result is slightly tidier than what we have now.
    Rename the helper function "cancel_all_itimers".

    Based on input from millert@ and kettenis@.

* sys/kernel.h: remove dead externs: tickfix, tickfixinterval, tickdelta, ... | cheloha | 2020-10-15 | 1 | -5/+1

    miod@ removed several time-related globals from the kernel with the
    commit "unifdef -d __HAVE_TIMECOUNTER" (see sys/kern/kern_clock.c
    v1.76). He neglected to remove their externs from sys/kernel.h,
    though.

    Remove the externs. With help from jsg@.

    ok jsg@

* _exit(2), execve(2): cancel per-process interval timers safely | cheloha | 2020-10-15 | 1 | -1/+2

    During _exit(2) and sometimes during execve(2) we need to cancel any
    active per-process interval timers. We don't currently do this in an
    MP-safe way. Both syscalls ignore the locking assumptions documented
    in proc.h.

    The easiest way to make them MP-safe is to use setitimer(), just like
    the getitimer(2) and setitimer(2) syscalls do. To make things a bit
    cleaner I have added a helper function, cancelitimer(), so the
    callers don't need to fuss with an itimerval struct.

    While we're here we can remove the splclock/splx dance from
    execve(2). It is no longer necessary.

    ok deraadt@

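    A plausible shape for the helper, reconstructed from the description
    (a zeroed itimerval pushed through the in-kernel setitimer path; not
    the verbatim diff):

        void
        cancelitimer(int which)
        {
                struct itimerval itv;

                /* a cleared it_value disarms the timer */
                timerclear(&itv.it_value);
                timerclear(&itv.it_interval);
                setitimer(which, &itv, NULL);
        }
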
* Refactor kqueue_scan() to use a context: a "kqueue_scan_state struct". | mpi | 2020-10-11 | 1 | -1/+12

    The struct keeps track of the end point of an event queue scan by
    persisting the end marker. This will be needed when kqueue_scan() is
    called repeatedly to complete a scan in a piecewise fashion.

    Extracted from a previous diff from visa@.

    ok visa@, anton@

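    The context presumably carries at least the kqueue and the persistent
    end marker; a sketch with illustrative field names:

        struct kqueue_scan_state {
                struct kqueue   *kqs_kq;        /* kqueue being scanned */
                struct knote     kqs_end;       /* persistent end marker */
        };
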
* Returning a void expression is weird; ok kettenis@ daniel@ | otto | 2020-10-10 | 1 | -5/+5

* Fix typos within sys/smr.h | mvs | 2020-09-29 | 1 | -3/+3

    LIST_END -> SMR_LIST_END
    TAILQ_END -> SMR_TAILQ_END

    ok visa@