This function (or one of a similar nature) is required to safely use a refcnt
and smr_entry together. Such functions exist on other platforms as
kref_get_unless_zero (on Linux) and refcount_acquire_if_gt (on FreeBSD).
The diagrams below detail the situation with and without refcnt_take_if_gt
in three cases, the first showing the "invalid" use of refcnt_take.
Situation:
Thread #1 is removing the global reference (o).
Thread #2 wants to reference an object (r), using a thread pointer (t).
Case:
1) refcnt_take after Thread #1 has released "o"
2) refcnt_take_if_gt before Thread #1 has released "o"
3) refcnt_take_if_gt after Thread #1 has released "o"
Data:
struct obj {
struct smr_entry smr;
struct refcnt refcnt;
} *o, *r, *t1, *t2;
Thread #1 | Thread #2
---------------------------------+------------------------------------
| r = NULL;
rw_enter_write(&lock); | smr_read_enter();
|
t1 = SMR_PTR_GET_LOCKED(&o); | t2 = SMR_PTR_GET(&o);
SMR_PTR_SET_LOCKED(&o, NULL); |
|
if (refcnt_rele(&t1->refcnt))     |
    smr_call(&t1->smr, free, t1); |
| if (t2 != NULL) {
| refcnt_take(&t2->refcnt);
| r = t2;
| }
rw_exit_write(&lock); | smr_read_exit();
.....
// called by smr_thread |
free(t1); |
.....
| // use after free
| *r
---------------------------------+------------------------------------
| r = NULL;
rw_enter_write(&lock); | smr_read_enter();
|
t1 = SMR_PTR_GET_LOCKED(&o); | t2 = SMR_PTR_GET(&o);
SMR_PTR_SET_LOCKED(&o, NULL); |
|
if (refcnt_rele(&t1->refcnt))     |
    smr_call(&t1->smr, free, t1); |
| if (t2 != NULL &&
| refcnt_take_if_gt(&t2->refcnt, 0))
| r = t2;
rw_exit_write(&lock); | smr_read_exit();
.....
// called by smr_thread | // we don't have a valid reference
free(t1); | assert(r == NULL);
---------------------------------+------------------------------------
| r = NULL;
rw_enter_write(&lock); | smr_read_enter();
|
t1 = SMR_PTR_GET_LOCKED(&o); | t2 = SMR_PTR_GET(&o);
SMR_PTR_SET_LOCKED(&o, NULL); |
| if (t2 != NULL &&
| refcnt_take_if_gt(&t2->refcnt, 0))
| r = t2;
if (refcnt_rele(&t1->refcnt))     |
    smr_call(&t1->smr, free, t1); |
rw_exit_write(&lock); | smr_read_exit();
.....
| // we need to put our reference
| if (refcnt_rele(&t2->refcnt))
| smr_call(&t2->smr, free, t2);
.....
// called by smr_thread |
free(t1); |
---------------------------------+------------------------------------
Currently it uses atomic_add_int_nv to atomically read the refcnt,
but I'm open to suggestions for better ways.
The atomic_cas_uint is used to ensure that refcnt hasn't been modified
since reading `old`.
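Sketching the above in code (a minimal sketch: the struct refcnt member
name `refs' and the exact signature are assumptions, not the committed
code):

int
refcnt_take_if_gt(struct refcnt *r, unsigned int gt)
{
    unsigned int old, cur;

    /* atomic_add_int_nv(p, 0) serves as an atomic read of the count. */
    old = atomic_add_int_nv(&r->refs, 0);
    for (;;) {
        if (old <= gt)
            return (0);     /* count too low, no reference taken */
        /* Store old + 1 only if the count is still `old'. */
        cur = atomic_cas_uint(&r->refs, old, old + 1);
        if (cur == old)
            return (1);     /* reference taken */
        old = cur;          /* lost a race, retry with the fresh value */
    }
}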
----------------------------
allowed when inet was used.
This lets Go programs use 'unix' without also including 'inet'.
from Josh Rickmar
ok / tree review from deraadt@, commit message cluestick from tb@
----------------------------
fork_return() does an additional check to send a SIGTRAP (for a debugger)
but this signal might overwrite the SIGSTOP generated by the parent doing
a PT_ATTACH before the child has a chance to execute any instruction.
Prevent a race visible only on SP systems with regress/sys/kern/ptrace2.
ok kettenis@
----------------------------
ok visa@
----------------------------
- use C99-style initialization (grep works better with that)
- use const as execsw is not modified at runtime
ok mpi@
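For illustration, the two styles side by side (the es_* member names and
the array contents here are illustrative, not the actual diff):

/* Old style: positional, breaks silently if members are reordered. */
const struct execsw execsw_old[] = {
    { sizeof(Elf_Ehdr), exec_elf_makecmds },
};

/* New style: C99 designated initializers, self-describing and greppable. */
const struct execsw execsw_new[] = {
    {
        .es_hdrsz = sizeof(Elf_Ehdr),
        .es_check = exec_elf_makecmds,
    },
};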
----------------------------
No functional change.
ok semarie@
----------------------------
- move the 'fail' label to the end of the function (instead of using the
first if-condition)
- merge the simplest error-code-path idioms from 'cleanup+return' to
'goto-fail'
ok mpi@
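The 'goto-fail' idiom being converged on, as a generic sketch with
hypothetical helpers:

int
example_init(void)
{
    int error;

    if ((error = setup_a()) != 0)
        goto fail;
    if ((error = setup_b()) != 0)
        goto fail;
    return (0);

fail:
    /* One cleanup path; the teardown helpers tolerate unfinished setup. */
    teardown_b();
    teardown_a();
    return (error);
}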
----------------------------
held but this path is only followed while `syslogf' socket is not set.
A new `syslogf_rwlock' is used to protect `syslogf' access.
ok bluhm@
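A sketch of the locking pattern (the type of `syslogf' and the use of
FREF() are assumptions for illustration):

struct rwlock syslogf_rwlock = RWLOCK_INITIALIZER("syslogf");
struct file *syslogf;

/* Read side: dereference the global only while holding the shared lock. */
rw_enter_read(&syslogf_rwlock);
fp = syslogf;
if (fp != NULL)
    FREF(fp);
rw_exit_read(&syslogf_rwlock);

/* Write side: replace the pointer under the exclusive lock. */
rw_enter_write(&syslogf_rwlock);
syslogf = newfp;
rw_exit_write(&syslogf_rwlock);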
----------------------------
architecture.
from miod
----------------------------
single_thread_set() is modified to explicitly indicate when waiting until
sibling threads are parked is required. This is obviously not required if
a traced thread is switching away from a CPU after handling a STOP signal.
ok claudio@
----------------------------
msg_flags (they get set internally). Correct the record to only contain
what the caller requested.
----------------------------
ok gnezdo@ semarie@ mpi@
----------------------------
are started before syslogd(8). This resulted in ugly sendsyslog(2)
"dropped" logs, while the real messages were lost.
Create a temporary stash for log messages within the kernel. It
has a limited size of 100 messages, and each message is truncated
to 8192 bytes. When the stash is exhausted, the well-known dropped
message is generated with a counter. After syslogd(8) has set up
everything, it sends a debug line through libc to flush the kernel
stash. Then syslogd receives all messages from the kernel before
the usual logs.
OK deraadt@ visa@
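In outline, the stash amounts to something like the following (a sketch
using the limits from the message; all names here are hypothetical):

#define STASH_MAX   100     /* at most 100 early messages */
#define STASH_LEN   8192    /* each truncated to 8192 bytes */

struct early_log {
    size_t  len;
    char    buf[STASH_LEN];
} stash[STASH_MAX];
int stash_count;            /* messages stashed so far */
int stash_dropped;          /* messages that did not fit */

/* Called from sendsyslog(2) while no syslogd(8) socket exists yet. */
void
stash_insert(const char *msg, size_t len)
{
    if (stash_count >= STASH_MAX) {
        stash_dropped++;    /* later reported as the dropped message */
        return;
    }
    if (len > STASH_LEN)
        len = STASH_LEN;
    memcpy(stash[stash_count].buf, msg, len);
    stash[stash_count].len = len;
    stash_count++;
}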
----------------------------
Kill SINGLE_PTRACE and use SINGLE_SUSPEND, which has almost the same semantics.
This diff did not properly kill SINGLE_PTRACE and broke RAMDISK kernels.
----------------------------
Ze big lock is currently necessary to ensure that two sibling threads
are not racing against each other when processing signals. However it
is not strictly necessary to unpark sibling threads.
ok claudio@
----------------------------
single_thread_set() is modified to explicitly indicate when waiting until
sibling threads are parked is required. This is obviously not required if
a traced thread is switching away from a CPU after handling a STOP signal.
ok claudio@
----------------------------
to do syscalls directly. Go executables now use shared libc like all other
dynamic binaries. This makes the "where are syscalls done from" checker
strict for all binaries, and also opens the door to change the underlying
syscall ABI to the kernel in the future very easily (if we find cause).
ok jsing
----------------------------
This makes some redundant & racy checks apparent.
ok semarie@
----------------------------
This does not change the current behaviour, but filterops should be
invoked through filter_*() for consistency.
----------------------------
i'm not a fan of having to cast to caddr_t when we have modern
inventions like void *s we can take advantage of.
ok claudio@ mvs@ bluhm@
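The shape of the change, with a hypothetical function:

/* Before: a caddr_t parameter forces a cast at every call site. */
void    record_bytes(caddr_t buf, size_t len);

record_bytes((caddr_t)&stats, sizeof(stats));

/* After: void * accepts any object pointer without casting. */
void    record_bytes(void *buf, size_t len);

record_bytes(&stats, sizeof(stats));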
----------------------------
Extend kqueue's filterops interface with new callbacks so that it
becomes easier to use with fine-grained locking. The new interface
delegates the serialization of kn_event access to event sources. Now
kqueue uses filterops callbacks to read or write kn_event. This hides
event sources' locking patterns from kqueue, and allows clean
implementation of atomic read-and-clear for EV_CLEAR, for instance.
There are so many existing filterops instances that converting all of
them in one go is tricky. This patch adds a wrapper mechanism that
kqueue uses when the new callbacks are missing.
The new filterops interface has been influenced by XNU's kqueue.
OK mpi@ semarie@
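Schematically, the extended interface looks like this (member names and
signatures are an educated guess from the description, not the committed
header):

struct filterops {
    int   f_flags;
    int   (*f_attach)(struct knote *kn);
    void  (*f_detach)(struct knote *kn);
    int   (*f_event)(struct knote *kn, long hint);
    /* New: the event source serializes access to kn_event itself. */
    int   (*f_modify)(struct kevent *kev, struct knote *kn);
    int   (*f_process)(struct knote *kn, struct kevent *kev);
};

When the new callbacks are missing, kqueue falls back to the wrapper
mechanism mentioned above, emulating them with the legacy f_event.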
----------------------------
no objections mvs@
----------------------------
The timecounter struct is large and I think it may change in the
future. Changing it later will be easier if we use C99-style
initialization for all timecounter structs. It also makes reading the
code a bit easier.
For reasons I cannot explain, switching to C99-style initialization
sometimes changes the hash of the resulting object file, even though
the resulting struct should be the same. So there is a binary change
here, but only sometimes. No behavior should change in either case.
I can't compile-test this everywhere but I have been staring at the
diff for days now and I'm relatively confident this will not break
compilation. Fingers crossed.
ok gnezdo@
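For example (member names as in sys/sys/timetc.h, written from memory, so
treat the details as assumptions):

static struct timecounter example_timecounter = {
    .tc_get_timecount = example_get_timecount,  /* hypothetical */
    .tc_counter_mask = ~0u,
    .tc_frequency = 1000000,
    .tc_name = "example",
    .tc_quality = 0,
};

Members left out are zero-initialized, so adding a member to the struct
later does not require touching every initializer.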
----------------------------
which requires kernel lock to be held.
ok mpi@
----------------------------
Passing a local copy of the socket to sbrelease() is too complicated just
to free the receive buffer. Now we don't allocate a large object on the
stack, and we don't pass an unlocked socket to soassertlocked() within
sbdrop(). The latter was not triggered because we lock the whole layer
with one lock.
Also sorflush() is now private to kern/uipc_socket.c, so its definition
was adjusted accordingly.
ok claudio@ mpi@
----------------------------
Use the SCHED_LOCK() to ensure `ps_thread' isn't being modified by a sibling
when entering tsleep(9) w/o KERNEL_LOCK().
ok visa@
----------------------------
ok bluhm@
----------------------------
functions which need the lock (falloc, fdinsert, fdremove). In most cases
it is not correct to hold the lock while calling VFS functions or e.g.
closef() since those acquire or release long-lived VFS locks.
OK visa@ mvs@
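The resulting shape of such code, sketched with assumed signatures and a
hypothetical do_vfs_setup() helper:

fdplock(fdp);
error = falloc(p, &fp, &fd);    /* allocate the descriptor under the lock */
fdpunlock(fdp);
if (error != 0)
    return (error);

error = do_vfs_setup(fp);       /* VFS work, fdplock not held */

fdplock(fdp);
if (error != 0)
    fdremove(fdp, fd);
else
    fdinsert(fdp, fd, 0, fp);
fdpunlock(fdp);

if (error != 0)
    closef(fp, p);              /* may take VFS locks, so outside fdplock */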
----------------------------
used as solock()'s backend to protect the whole layer.
With feedback from mpi@.
ok bluhm@ claudio@
----------------------------
We did not reach a consensus about using SMR to unlock single_thread_set()
so there's no point in keeping this change.
----------------------------
the initial falloc() calls and then regrab it for the fdinsert() or
fdremove() calls respectively. Also move closef() outside of the lock.
This replaces the lock-order change that was previously reverted.
OK mvs@ visa@
----------------------------
the SCHED_LOCK().
Putting a thread on a sleep queue is reduced to the following:
sleep_setup();
/* check condition or release lock */
sleep_finish();
Previous version ok cheloha@, jmatthew@, ok claudio@
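A hedged usage sketch (the argument lists are assumptions; the message
only fixes the call order):

struct sleep_state sls;

while (cond == 0) {
    sleep_setup(&sls, &cond, PWAIT, "examp");
    /* The thread is queued now, so dropping the lock cannot lose a
     * wakeup between the check below and going to sleep. */
    mtx_leave(&lock);
    sleep_finish(&sls, cond == 0);  /* sleep only if still needed */
    mtx_enter(&lock);
}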
----------------------------
cannot simply be vrele()'d on error. The code currently depends on
closef() to do the cleanup.
Reported-by: syzbot+b0e18235e96adf81883d@syzkaller.appspotmail.com
----------------------------
the file descriptors early do it late. This way the fdplock is not held
during the VFS operations.
OK mvs@
----------------------------
to allow setting and removing IPv4 addresses.
Needed for future iked(8) improvements.
Discussed with sthen@ and florian@
ok bluhm@ deraadt@
----------------------------
of the v_un pointers).
OK jsg@ mvs@
----------------------------
This makes it clearer why lock order traces are sometimes not displayed.
Prompted by a question from, and OK anton@
----------------------------
When a kqueue file is closed, the kqueue can still have threads
scanning it. Consequently, kqueue_terminate() can see scan markers
in the event queue. These markers are removed when the scanning threads
leave the kqueue. Take this into account when checking the queue's
state, to avoid a panic when kqueue is closed from under a thread.
OK anton@
Reported-by: syzbot+757c60a2aa1125137cce@syzkaller.appspotmail.com
----------------------------
message "wroute" into dmesg. Since revision 1.263 pledge "wroute"
allows changing the routing table of a socket.
OK florian@ semarie@
----------------------------
ok dlg@
----------------------------
moved option control into a sysctl.
reminder that we can delete this, from Benjamin Baier
----------------------------
ok mpi@
----------------------------
This allows us to unlock getppid(2).
ok mpi@
----------------------------
Deliver file descriptor close notification for __EV_POLL knotes through
struct kevent that kqueue_scan() returns. This replaces the previous way
of returning EBADF from kqueue_scan(), making it easier to determine
what exactly has changed.
When a file descriptor is closed, its __EV_POLL knotes are turned into
one-shot events and queued for delivery. These knotes are "unregistered"
as they are reachable only through the queue of active events. This
reduces interference with the normal workings of kqueue. However, more
care is needed to avoid leaking knotes. In addition, the unregistering
removes a limit on the number of issued knotes. To prevent accumulation
of pending fd close notifications, kqpoll_init() flushes the active
queue at the start of a kqpoll scan.
OK mpi@