path: root/sys/kern
...
* Replace SB_KNOTE and sb_flagsintr with direct checking of klist. (visa, 2021-01-17, 1 file, -7/+1)
  OK mpi@ as part of a larger diff
* syncer_thread: sleep without lbolt (cheloha, 2021-01-14, 1 file, -6/+25)
  The syncer_thread() uses lbolt to perform periodic execution. We can do
  this without lbolt.

  - Add a local wakeup(9) channel (syncer_chan) and sleep on it.
  - Use a local copy of getnsecuptime() to get 1/hz resolution for time
    measurements. This is much better than using gettime(9), which is
    wholly unsuitable for this use case. Measure how long we spend in the
    loop and use this to calculate how long to sleep until the next
    execution. NB: getnsecuptime() is probably ready to be moved to
    kern_tc.c and documented.
  - Using the system uptime instead of the UTC time avoids issues with
    time jumps.

  ok mpi@
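  A minimal sketch of the pattern, with getnsecuptime() built on
  getnanouptime(9); the loop is abridged and names beyond syncer_chan
  are assumptions:

      uint64_t
      getnsecuptime(void)
      {
              struct timespec now;

              getnanouptime(&now);    /* low-res, ~1/hz granularity */
              return TIMESPEC_TO_NSEC(&now);
      }

      /* in the syncer loop: sleep out the remainder of a 1s period */
      uint64_t start, elapsed;

      start = getnsecuptime();
      /* ... flush dirty vnodes ... */
      elapsed = getnsecuptime() - start;
      if (elapsed < SEC_TO_NSEC(1))
              tsleep_nsec(&syncer_chan, PPAUSE, "syncer",
                  SEC_TO_NSEC(1) - elapsed);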
* kernel, sysctl(8): remove dead variable: tickadj (cheloha, 2021-01-13, 1 file, -6/+1)
  The global "tickadj" variable is a remnant of the old NTP adjustment
  code we used in the kernel before the current timecounter subsystem was
  imported from FreeBSD circa 2004 or 2005. Fifteen years hence it is
  completely vestigial and we can remove it. We probably should have
  removed it long ago but I guess it slipped through the cracks. FreeBSD
  removed it in 2002:

  https://cgit.freebsd.org/src/commit/?id=e1d970f1811e5e1e9c912c032acdcec6521b2a6d

  NetBSD and DragonflyBSD can probably remove it, too.

  We export tickadj via the kern.clockrate sysctl(2), so update sysctl.2
  and sysctl(8) accordingly. Hypothetically this change could break
  someone's sysctl(8) parsing script. I don't think that's very likely.

  ok mvs@
* Convert mbuf type KDASSERT() to a proper KASSERT() in m_get(9). (bluhm, 2021-01-13, 1 file, -3/+3)
  Should prevent use of an uninitialized value as a bogus counter index.
  OK mvs@ claudio@ anton@
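  For context, KDASSERT() compiles away unless the kernel is built with
  DEBUG, while KASSERT() is active under the more common DIAGNOSTIC
  option. The check is of this shape (exact expression assumed):

      KASSERT(type >= 0 && type < MT_NTYPES);  /* valid counter index */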
* New rw_obj_init() API providing reference-counted rwlock. (mpi, 2021-01-11, 2 files, -2/+124)
  Original port from NetBSD by guenther@, required for upcoming amap &
  anon locking.
  ok kettenis@
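  A hedged sketch of using a reference-counted lock object. The
  hold/free pairing follows the NetBSD origin of the code; the exact
  OpenBSD signatures here are an assumption:

      struct rwlock *lock = NULL;

      rw_obj_init(&lock, "amaplk");  /* allocate with one reference */
      rw_obj_hold(lock);             /* a second object shares the lock */
      /* ... */
      rw_obj_free(lock);             /* drop a reference; last one frees */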
* Simplify sleep signal handling a bit by introducing sleep_signal_check(). (claudio, 2021-01-11, 1 file, -16/+24)
  The common code is moved to sleep_signal_check() and instead of
  multiple state variables for sls_sig and sls_unwind only one
  sls_sigerr is set. This simplifies the checks in sleep_finish_signal()
  a great deal.
  Idea from and OK mpi@
* Split hierarchical calls into kern_sysctl_dirs (gnezdo, 2021-01-09, 1 file, -42/+46)
  Removed a rash of +/-1 and made both functions shorter and more
  focused.
  OK millert@
* Reduce case duplication in kern_sysctl (gnezdo, 2021-01-09, 1 file, -108/+85)
  This changes amd64 GENERIC.MP .text size of kern_sysctl.o from 6440 to
  6400. Surprisingly, RAMDISK grows from 1645 to 1678.
  OK millert@, mglocker@
* Enforce range with sysctl_int_bounded in sysctl_wdog (gnezdo, 2021-01-09, 1 file, -3/+5)
  OK millert@
* Enforce range with sysctl_int_bounded in witness_sysctl_watch (gnezdo, 2021-01-09, 1 file, -10/+8)
  Makes previously explicit checking less verbose.
  OK millert@
* Use sysctl_int_bounded in sysctl_hwsmt (gnezdo, 2021-01-09, 1 file, -6/+2)
  Error reporting is preferred to silent clipping.
  OK millert@
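  The pattern shared by these sysctl conversions, sketched for a
  hypothetical integer knob (the mib name and bounds are made up, and
  the argument order is assumed); out-of-range writes fail with EINVAL
  instead of being clipped:

      case KERN_SOMEKNOB:
              return sysctl_int_bounded(oldp, oldlenp, newp, newlen,
                  &someknob, 0, 100);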
* If the loop check in somove(9) goes to release without setting an error, a broadcast mbuf will stay in the socket buffer forever. (bluhm, 2021-01-09, 1 file, -3/+2)
  This is bad as multiple mbufs can use up all the space. Better report
  ELOOP, dissolve splicing, and let userland handle it.
  OK anton@
* Replace a custom linked list with SLIST. (visa, 2021-01-09, 1 file, -12/+10)
* Replace SIMPLEQ with SLIST because the code does not need a queue. (visa, 2021-01-09, 1 file, -26/+24)
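  What these two SLIST conversions boil down to, using the
  <sys/queue.h> macros (struct and field names are placeholders):

      struct item {
              SLIST_ENTRY(item) i_next;       /* one pointer per element */
      };
      SLIST_HEAD(, item) items = SLIST_HEAD_INITIALIZER(items);
      struct item *it;

      SLIST_INSERT_HEAD(&items, it, i_next);
      SLIST_FOREACH(it, &items, i_next)
              /* visit it */;
      SLIST_REMOVE(&items, it, item, i_next); /* O(n); fine for short lists */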
* Remove unnecessary relocking of w_mtx as panic() should not return. (visa, 2021-01-09, 1 file, -10/+2)
* Lock kernel before raising SPL in klist_lock() (visa, 2021-01-08, 1 file, -3/+3)
  This prevents unwanted spinning with interrupts disabled. At the
  moment, this code is only invoked through klist_invalidate() and the
  callers should already hold the kernel lock. Also, one could argue
  that in MP-unsafe contexts klist_lock() should only assert for the
  kernel lock.
* Fix boot-time crash on sparc64 (visa, 2021-01-08, 1 file, -4/+15)
  On sparc64, initmsgbuf() is invoked before curcpu() is usable on the
  boot processor. Consequently, it is unsafe to use mutexes during the
  message buffer initialization. Avoid such use by skipping log_mtx when
  appending a newline from initmsgbuf(). Use mbp instead of msgbufp as
  the buffer argument to the putchar routine for consistency.

  Bug reported and fix suggested by miod@
* Revert "Implement select(2) and pselect(2) on top of kqueue." (visa, 2021-01-08, 1 file, -148/+58)
  The use of kqueue as backend has introduced a significant regression
  in the performance of select(2), so go back to using the original
  code.

  Some additional management overhead is to be expected when using
  kqueue. However, the overhead of the current implementation is too
  high.

  Reported by bluhm@ on bugs@
* Adjust comment about klist_invalidate() (visa, 2021-01-07, 1 file, -5/+8)
* Add dt(4) TRACEPOINTs for pool_get() and pool_put(), similar to the ones added to malloc() and free(). (claudio, 2021-01-06, 1 file, -1/+6)
  Pass the struct pool pointer as argv1 since it is currently not
  possible to pass the pool name to btrace.
  OK mpi@
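  A hedged sketch of such a probe site; TRACEPOINT() comes from dt(4)'s
  <sys/tracepoint.h>, but the provider name and the argument ahead of
  the pool pointer are assumptions here:

      TRACEPOINT(uvm, pool_get, addr, pp);    /* pp is argv1, per above */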
* pool(9): remove ticks (cheloha, 2021-01-02, 1 file, -11/+26)
  Change the pool(9) timeouts to use the system uptime instead of ticks.

  - Change the timeouts from variables to macros so we can use
    SEC_TO_NSEC(). This means these timeouts are no longer patchable via
    ddb(4). dlg@ does not think this will be a problem, as the timeout
    intervals have not changed in years.
  - Use low-res time to keep things fast. Add a local copy of
    getnsecuptime() to subr_pool.c to keep the diff small. We will need
    to move getnsecuptime() into kern_tc.c and document it later if we
    ever have other users elsewhere in the kernel.
  - Rename ph_tick -> ph_timestamp and pr_cache_tick ->
    pr_cache_timestamp.

  Prompted by tedu@ some time ago, but the effort stalled (may have been
  my fault). Input from kettenis@ and dlg@.

  Special thanks to mpi@ for help with struct shuffling. This change
  does not increase the size of struct pool_page_header or struct pool.

  ok dlg@ mpi@
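  A sketch of the resulting check, assuming the macro-ized interval and
  the renamed field (the interval value is an assumption):

      #define POOL_WAIT_GC    SEC_TO_NSEC(8)

      if (getnsecuptime() - ph->ph_timestamp > POOL_WAIT_GC)
              /* page has been idle long enough to release */;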
* copyright++; (jsg, 2021-01-01, 1 file, -2/+2)
* Add trace points for malloc(9) and free(9). This makes them traceable via dt(4) and btrace(8). (claudio, 2020-12-31, 1 file, -1/+7)
  OK mpi@ millert@
* Set klist lock for pipes. (visa, 2020-12-30, 1 file, -5/+15)
  OK anton@, mpi@
* Analogous to the kern.audio.record sysctl parameter for audio(4) devices, introduce kern.video.record for video(4) devices. (mglocker, 2020-12-28, 1 file, -1/+29)
  By default kern.video.record will be set to zero, blanking all data
  delivered by device drivers which attach to video(4).

  The idea was initially proposed by
  Laurence Tratt <laurie AT tratt DOT net>.

  ok mpi@
* Use per-CPU counters for fault and stats counters reached in uvm_fault(). (mpi, 2020-12-28, 1 file, -1/+2)
  ok kettenis@, dlg@
* Simplify parameters of pselregister(). (visa, 2020-12-26, 1 file, -8/+5)
  OK mpi@
* Refactor klist insertion and removal (visa, 2020-12-25, 7 files, -28/+48)
  Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
  These functions assume that the caller has locked the klist. The
  current state of locking remains intact because the kernel lock is
  still used with all klists.

  Add new functions klist_insert() and klist_remove() that lock the
  klist internally. This allows some code simplification.

  OK mpi@
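  The resulting calling convention, sketched (sc_klist is a
  placeholder):

      /* caller already holds the klist's lock, e.g. in an f_event path */
      klist_insert_locked(&sc->sc_klist, kn);

      /* otherwise the klist takes and releases its own lock */
      klist_insert(&sc->sc_klist, kn);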
* Small smr_grace_wait() optimization (visa, 2020-12-25, 1 file, -6/+26)
  Make the SMR thread maintain an explicit system-wide grace period and
  make CPUs observe the current grace period when crossing a quiescent
  state. This lets the SMR thread avoid a forced context switch for CPUs
  that have already entered the latest grace period.

  This change provides a small improvement in smr_grace_wait()'s
  performance in terms of context switching.

  OK mpi@, anton@
* tsleep(9): add global "nowake" channel for threads avoiding wakeup(9) (cheloha, 2020-12-24, 1 file, -1/+9)
  It would be convenient if there were a channel a thread could sleep on
  to indicate they do not want any wakeup(9) broadcasts. The easiest way
  to do this is to add an "int nowake" to kern_synch.c and extern it in
  sys/systm.h. You use it like this:

      #include <sys/systm.h>

      tsleep_nsec(&nowake, ...);

  There is now no need to handroll a local dead channel, e.g.

      int chan;

      tsleep_nsec(&chan, ...);

  which expands the stack. Local dead channels will be replaced with
  &nowake in later patches.

  One possible problem with this "one global channel" approach is sleep
  queue congestion. If you have lots of threads sleeping on &nowake you
  might slow down a wakeup(9) on a different channel that hashes into
  the same queue. Unsure how much of a problem this actually is, if at
  all.

  NetBSD and FreeBSD have a "pause" interface in the kernel that chooses
  a suitable channel automatically. To keep things simple and avoid
  adding a new interface we will start with this global channel.

  Discussed with mpi@, claudio@, kettenis@, and deraadt@. Basically
  designed by kettenis@, who vetoed my other proposals.

  Bugs caught by deraadt@, tb@, and patrick@.
* sigsuspend(2): change wmesg from "pause" to "sigsusp" (cheloha, 2020-12-23, 1 file, -2/+2)
  Make it obvious where the thread is blocked. "pause" is ambiguous.

  Tweaked by kettenis@.

  Probably ok kettenis@.
* nanosleep(2): shorten wmesg from "nanosleep" to "nanoslp" (cheloha, 2020-12-23, 1 file, -2/+2)
  We only see 8 characters of wmesg in e.g. top(1), so shorten the
  string to fit.

  Indirectly prompted by kettenis@.
* Ensure that filt_dead() takes effect (visa, 2020-12-23, 1 file, -1/+2)
  Invoke dead_filtops' f_event callback in klist_invalidate() to ensure
  that filt_dead() modifies every invalidated knote. If a knote has
  EV_ONESHOT set in its event flags, kqueue_scan() will not call
  f_event.

  OK mpi@
* Clear error before each iteration in kqueue_scan() (visa, 2020-12-23, 1 file, -1/+3)
  This fixes a regression where kqueue_scan() may incorrectly return
  EWOULDBLOCK after a timeout.

  OK mpi@
* Implement select(2) and pselect(2) on top of kqueue. (mpi, 2020-12-22, 1 file, -55/+148)
  The given set of fds are converted to equivalent kevents using
  EV_SET(2) and passed to the scanning internals of kevent(2):
  kqueue_scan().

  ktrace(1) will now output the converted kevents on top of the usual
  set bits to be able to find possible errors in the conversion.

  This switch implies that select(2) and pselect(2) will now query the
  underlying kqfilters instead of the *_poll() routines.

  Based on similar work done on DragonFlyBSD with inputs from visa@,
  millert@, anton@, cheloha@, thanks!

  ok visa@
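  Roughly how a descriptor in a select(2) read set becomes a kevent
  before kqueue_scan() runs. This is a sketch, not the exact conversion
  code; __EV_POLL is the kernel-internal marker for such knotes (see
  the close-notification commit below):

      int fd;             /* a descriptor from the read set */
      struct kevent kev;

      EV_SET(&kev, fd, EVFILT_READ, EV_ADD | __EV_POLL, 0, 0, NULL);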
* Introduce klistops (visa, 2020-12-20, 1 file, -8/+156)
  This patch extends struct klist with a callback descriptor and an
  argument. The main purpose of this is to let the kqueue subsystem
  assert when a klist should be locked, and operate the klist lock in
  klist_invalidate().

  Access to a knote list of a kqueue-monitored object has to be
  serialized somehow. Because the object often has a lock for protecting
  its state, and because the object often acquires this lock at the
  latest in its f_event callback function, it makes sense to use this
  lock also for the knote lists. The existing uses of NOTE_SUBMIT
  already show a pattern that is likely to become more prevalent.

  There could be an embedded lock in klist. However, such a lock would
  be redundant in many cases. The code cannot rely on a single lock type
  (mutex, rwlock, something else) because the needs of monitored objects
  vary. In addition, an embedded lock would introduce new lock order
  constraints. Note that the patch does not rule out use of dedicated
  klist locks.

  The patch introduces a way to associate lock operations with a klist.
  The caller can provide a custom implementation, or use a ready-made
  interface with a mutex or rwlock. For compatibility with old code, the
  new code falls back to using the kernel lock if no specific klist
  initialization has been done. The existing code already relies on
  implicit initialization of klist.

  Sadly, this change increases the size of struct klist. dlg@ thinks
  this is not fatal, though.

  OK mpi@
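  A sketch of tying a klist to an existing object lock with the
  ready-made mutex interface mentioned above (struct and field names
  are placeholders):

      mtx_init(&sc->sc_mtx, IPL_MPFLOOR);
      klist_init_mutex(&sc->sc_klist, &sc->sc_mtx);
      /* the kqueue subsystem can now assert that sc_mtx is held */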
* Add fd close notification for kqueue-based poll() and select() (visa, 2020-12-18, 1 file, -7/+38)
  When the file descriptor of an __EV_POLL-flagged knote is closed, post
  EBADF through the kqueue instance to the caller of kqueue_scan(). This
  lets kqueue-based poll() and select() preserve their current behaviour
  of returning EBADF when a polled file descriptor is closed
  concurrently.

  OK mpi@
* Make knote_{activate,remove}() internal to kern_event.c. (visa, 2020-12-18, 1 file, -1/+3)
  OK mpi@
* Remove kqueue_free() and use KQRELE() in kqpoll_exit(). (visa, 2020-12-16, 1 file, -11/+6)
  Because kqpoll instances are now linked to the file descriptor table,
  the freeing of kqpoll and ordinary kqueues is similar.

  Suggested by mpi@
* Link kqpoll instances to fd_kqlist. (visa, 2020-12-16, 1 file, -10/+14)
  This lets the system remove kqpoll-related event registrations when a
  file descriptor is closed.

  OK mpi@
* Use nkev in place of count in kqueue_scan(). (visa, 2020-12-15, 1 file, -7/+4)
  OK cheloha@, mpi@, mvs@
* Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp. (jan, 2020-12-12, 3 files, -13/+13)
  OK dlg@, bluhm@
  No Opinion mpi@
  Not against it claudio@
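  The mechanical change at call sites, allocating a fresh cluster mbuf
  (the arguments shown are typical driver usage, not a specific caller):

      /* before: the ifp argument was already unused */
      m = MCLGETI(NULL, M_DONTWAIT, NULL, MCLBYTES);

      /* after */
      m = MCLGETL(NULL, M_DONTWAIT, MCLBYTES);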
* Simplify filt_pipedetach() (visa, 2020-12-11, 1 file, -18/+7)
  By storing the pipe pointer in kn_hook, filt_pipedetach() does not
  need extra logic to find the correct pipe instance. This also lets the
  kernel clear the knote lists fully.

  OK anton@, mpi@
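  The gist of the simplification, abridged: the attach side stores the
  pointer so the detach side needs no lookup.

      /* filt_pipeattach(): remember which pipe the knote watches */
      kn->kn_hook = cpipe;

      /* filt_pipedetach(): the instance is at hand */
      struct pipe *cpipe = kn->kn_hook;

      klist_remove(&cpipe->pipe_sel.si_note, kn);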
* Use sysctl_int_bounded for sysctl_hwsetperf (gnezdo, 2020-12-10, 1 file, -13/+7)
  Removed some trailing whitespace while there.
  ok gkoehler@
* Add kernel-only per-thread kqueue & helpers to initialize and free it. (mpi, 2020-12-09, 2 files, -3/+37)
  This will soon be used by select(2) and poll(2).
  ok anton@, visa@
* Convert the per-process thread list into a SMR_TAILQ. (mpi, 2020-12-07, 12 files, -39/+43)
  Currently all iterations are done under KERNEL_LOCK() and therefore
  use the *_LOCKED() variant.
  From and ok claudio@
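  Iteration under the kernel lock now looks like this; lock-free
  readers would instead use SMR_TAILQ_FOREACH() inside an SMR read-side
  critical section (field names assumed from the process structures):

      struct proc *p;

      SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link)
              /* visit thread p */;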
* Refactor kqueue_scan() so it can be used by other syscalls. (mpi, 2020-12-07, 1 file, -48/+48)
  Stop iterating in the function and instead copy the returned events to
  userland after every call.
  ok visa@
* srp_finalize(9): tsleep(9) -> tsleep_nsec(9) (cheloha, 2020-12-06, 1 file, -2/+2)
  srp_finalize(9) spins until the refcount hits zero. Blocking for at
  least 1ms each iteration instead of blocking for at most 1 tick is
  sufficient.

  Discussed with mpi@.

  ok claudio@ jmatthew@
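  The change in pattern, sketched with v and wmesg as in the
  srp_finalize(9) interface:

      /* before: block for at most one tick per pass */
      tsleep(v, PWAIT, wmesg, 1);

      /* after: block for at least 1ms; fine while polling a refcount */
      tsleep_nsec(v, PWAIT, wmesg, MSEC_TO_NSEC(1));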
* Convert sysctl_tc to sysctl_bounded_arr (gnezdo, 2020-12-05, 1 file, -7/+8)
  ok gkoehler@
* Prevent a TOCTOU race in single_thread_set() by extending the scope of the lock. (mpi, 2020-12-04, 2 files, -13/+27)
  Make sure `ps_single' is set only once by checking then updating it
  without releasing the lock.

  Analyzed by and ok claudio@