path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Release mbuf(9) chain with a simple m_freem(9) loop in sorflush().mvs2021-02-181-6/+7
| | | | | | | | | | | | Passing a local copy of the socket to sbrelease() is too complicated just to free the receive buffer. We don't allocate a large object on the stack. Also we don't pass an unlocked socket to soassertlocked() within sbdrop(). This was not triggered because we lock the whole layer with one lock. Also sorflush() is now private to kern/uipc_socket.c, so its definition was adjusted accordingly. ok claudio@ mpi@
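The loop described above can be sketched in userland C. This is only an illustration under stated assumptions: `struct buf`, `buf_freem()`, `queue_flush()` and `queue_build()` are hypothetical stand-ins, not the kernel's struct mbuf, m_freem(9) or sorflush(). The point is the shape of the loop: detach the queue, then free one packet chain at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel mbuf; not the real struct mbuf. */
struct buf {
	struct buf *b_next;	/* next buffer in the same packet */
	struct buf *b_nextpkt;	/* first buffer of the next packet */
};

/* Free every buffer of one packet, in the spirit of m_freem(9). */
static size_t
buf_freem(struct buf *b)
{
	size_t n = 0;

	while (b != NULL) {
		struct buf *next = b->b_next;

		free(b);
		b = next;
		n++;
	}
	return n;
}

/* Detach the whole queue first, then release it packet by packet. */
static size_t
queue_flush(struct buf **headp)
{
	struct buf *b = *headp;
	size_t n = 0;

	*headp = NULL;
	while (b != NULL) {
		struct buf *nextpkt = b->b_nextpkt;

		n += buf_freem(b);
		b = nextpkt;
	}
	return n;
}

/* Helper for experiments: build npkts packets of perpkt buffers each. */
static struct buf *
queue_build(size_t npkts, size_t perpkt)
{
	struct buf *head = NULL;

	for (size_t i = 0; i < npkts; i++) {
		struct buf *pkt = NULL;

		for (size_t j = 0; j < perpkt; j++) {
			struct buf *b = calloc(1, sizeof(*b));

			assert(b != NULL);
			b->b_next = pkt;
			pkt = b;
		}
		if (pkt != NULL) {
			pkt->b_nextpkt = head;
			head = pkt;
		}
	}
	return head;
}
```

Calling `queue_flush(&q)` on a queue built with `queue_build(3, 2)` releases six buffers and leaves `q` set to NULL, with no temporary copy of the owning object needed.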
* Move single_thread_set() out of KERNEL_LOCK().mpi2021-02-153-10/+8
| | | | | | | Use the SCHED_LOCK() to ensure `ps_thread' isn't being modified by a sibling when entering tsleep(9) w/o KERNEL_LOCK(). ok visa@
* sbdrop(): use NULL instead of 0 in pointer assignmentmvs2021-02-111-2/+2
| | | | ok bluhm@
* "proc: table is full" actually means thread table is full; ok mpi@ sthen@otto2021-02-111-2/+2
|
* In the various open functions reduce the fdplock() to only span over theclaudio2021-02-111-16/+27
| | | | | | | functions which need the lock (falloc, fdinsert, fdremove). In most cases it is not correct to hold the lock while calling VFS functions or e.g. closef since those acquire or release long-lived VFS locks. OK visa@ mvs@
* Move UNIX domain sockets out of kernel lock. The new `unp_lock' rwlock(9)mvs2021-02-102-49/+171
| | | | | | | | used as solock()'s backend to protect the whole layer. With feedback from mpi@. ok bluhm@ claudio@
* Revert the conversion of per-process thread into a SMR_TAILQ.mpi2021-02-0812-43/+39
| | | | | We did not reach a consensus about using SMR to unlock single_thread_set() so there's no point in keeping this change.
* Do not hold onto the fdplock longer than needed. Release the lock afterclaudio2021-02-081-6/+9
| | | | | | | the initial falloc() calls and then regrab it for the fdinsert() or fdremove() calls respectively. Also move closef() outside of the lock. This replaces the lock order change that was previously reverted. OK mvs@ visa@
* Simplify sleep_setup API to two operations in preparation for splittingmpi2021-02-086-135/+80
| | | | | | | | | | | | the SCHED_LOCK(). Putting a thread on a sleep queue is reduced to the following: sleep_setup(); /* check condition or release lock */ sleep_finish(); Previous version ok cheloha@, jmatthew@, ok claudio@
* Revert previous commit. The vnode returned by ptm_vn_open() is open andclaudio2021-02-041-33/+28
| | | | | | | can not simply be vrele()-ed on error. The code currently depends on closef() to do the cleanup. Reported-by: syzbot+b0e18235e96adf81883d@syzkaller.appspotmail.com
* Prevent a lock order issue by shuffling code around. Instead of allocatingclaudio2021-02-041-28/+33
| | | | | | the file descriptors early do it late. This way the fdplock is not held during the VFS operations. OK mvs@
* Add SIOCAIFADDR_IN and SIOCDIFADDR_IN to the wroute pledgetobhe2021-02-031-1/+3
| | | | | | | | to allow setting and removing IPv4 addresses. Needed for future iked(8) improvements. Discussed with sthen@ and florian@ ok bluhm@ deraadt@
* Use NULL instead of 0 to clear v_socket pointer (which actually clears allclaudio2021-01-291-2/+2
| | | | | of the v_un pointers). OK jsg@ mvs@
* Whitespace.rob2021-01-291-3/+2
|
* Show when witness(4) has run out of lock order data entries.visa2021-01-281-2/+14
| | | | | | This makes it clearer why lock order traces are sometimes not displayed. Prompted by a question from, and OK anton@
* kqueue: Fix termination assertvisa2021-01-271-2/+12
| | | | | | | | | | | | When a kqueue file is closed, the kqueue can still have threads scanning it. Consequently, kqueue_terminate() can see scan markers in the event queue. These markers are removed when the scanning threads leave the kqueue. Take this into account when checking the queue's state, to avoid a panic when kqueue is closed from under a thread. OK anton@ Reported-by: syzbot+757c60a2aa1125137cce@syzkaller.appspotmail.com
* If pledge "wroute" is missing for setsockopt SO_RTABLE, print failurebluhm2021-01-201-2/+2
| | | | | | message "wroute" into dmesg. Since revision 1.263 pledge "wroute" allows changing the routing table of a socket. OK florian@ semarie@
* kern/subr_disk.c: convert ifunit() to if_unit(9)mvs2021-01-191-3/+5
| | | | ok dlg@
* /etc/malloc.conf path-approval in pledge is no longer needed since 6.5deraadt2021-01-191-9/+1
| | | | | moved option control into a sysctl. reminder that we can delete this from benjamin baier
* regenmvs2021-01-182-5/+5
|
* Unlock getppid(2).mvs2021-01-181-2/+2
| | | | ok mpi@
* Cache parent's pid as `ps_ppid' and use it instead of `ps_pptr->ps_pid'.mvs2021-01-175-8/+10
| | | | | | This allows us to unlock getppid(2). ok mpi@
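A minimal sketch of the caching idea, assuming a simplified `struct process`. The field names `ps_pid`, `ps_pptr` and `ps_ppid` come from the commit message; `process_reparent()` and `process_getppid()` are hypothetical helpers for illustration, and the real kernel code has to deal with more than this (for example where the cache is updated and under which lock).

```c
#include <assert.h>
#include <stddef.h>

struct process {
	int ps_pid;
	struct process *ps_pptr;	/* parent pointer; following it needs a lock */
	int ps_ppid;			/* cached copy of ps_pptr->ps_pid */
};

/* Update the pointer and the cached pid together when the parent changes. */
static void
process_reparent(struct process *child, struct process *parent)
{
	child->ps_pptr = parent;
	child->ps_ppid = parent->ps_pid;
}

/* The reader touches only the child's own field; no parent dereference. */
static int
process_getppid(const struct process *pr)
{
	return pr->ps_ppid;
}
```

Because the reader never follows `ps_pptr`, it does not depend on the parent structure staying valid, which is what makes a lock-free getppid(2) plausible in this sketch.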
* kqueue: Revise fd close notificationvisa2021-01-171-30/+117
| | | | | | | | | | | | | | | | | | Deliver file descriptor close notification for __EV_POLL knotes through struct kevent that kqueue_scan() returns. This replaces the previous way of returning EBADF from kqueue_scan(), making it easier to determine what exactly has changed. When a file descriptor is closed, its __EV_POLL knotes are turned into one-shot events and queued for delivery. These knotes are "unregistered" as they are reachable only through the queue of active events. This reduces interference with the normal workings of kqueue. However, more care is needed to avoid leaking knotes. In addition, the unregistering removes a limit on the number of issued knotes. To prevent accumulation of pending fd close notifications, kqpoll_init() flushes the active queue at the start of a kqpoll scan. OK mpi@
* Replace SB_KNOTE and sb_flagsintr with direct checking of klist.visa2021-01-171-7/+1
| | | | OK mpi@ as part of a larger diff
* syncer_thread: sleep without lboltcheloha2021-01-141-6/+25
| | | | | | | | | | | | | | | | | | | | | The syncer_thread() uses lbolt to perform periodic execution. We can do this without lbolt. - Add a local wakeup(9) channel (syncer_chan) and sleep on it. - Use a local copy of getnsecuptime() to get 1/hz resolution for time measurements. This is much better than using gettime(9), which is wholly unsuitable for this use case. Measure how long we spend in the loop and use this to calculate how long to sleep until the next execution. NB: getnsecuptime() is probably ready to be moved to kern_tc.c and documented. - Use the system uptime instead of the UTC time to avoid issues with time jumps. ok mpi@
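The "measure the loop, then sleep for the remainder" step can be sketched as a pure function. This is an assumption-laden illustration, not the syncer code: `sleep_remaining()` and `NSEC_PER_SEC` are names chosen here, and the timestamps are taken to be monotonic uptime values in nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Given the uptime at which an iteration started and ended, return how
 * long to sleep so that iterations begin roughly once per period.
 * Returns 0 when the work overran the period.
 */
static uint64_t
sleep_remaining(uint64_t start, uint64_t end, uint64_t period)
{
	uint64_t elapsed = end - start;

	if (elapsed >= period)
		return 0;
	return period - elapsed;
}
```

In a userland experiment the two timestamps could come from clock_gettime(2) with CLOCK_MONOTONIC. Using a monotonic source rather than wall-clock time is what keeps the computed interval sane when the UTC time is stepped.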
* kernel, sysctl(8): remove dead variable: tickadjcheloha2021-01-131-6/+1
| | | | | | | | | | | | | | | | | | | | The global "tickadj" variable is a remnant of the old NTP adjustment code we used in the kernel before the current timecounter subsystem was imported from FreeBSD circa 2004 or 2005. Fifteen years hence it is completely vestigial and we can remove it. We probably should have removed it long ago but I guess it slipped through the cracks. FreeBSD removed it in 2002: https://cgit.freebsd.org/src/commit/?id=e1d970f1811e5e1e9c912c032acdcec6521b2a6d NetBSD and DragonflyBSD can probably remove it, too. We export tickadj via the kern.clockrate sysctl(2), so update sysctl.2 and sysctl(8) accordingly. Hypothetically this change could break someone's sysctl(8) parsing script. I don't think that's very likely. ok mvs@
* Convert mbuf type KDASSERT() to a proper KASSERT() in m_get(9).bluhm2021-01-131-3/+3
| | | | | Should prevent using an uninitialized value as a bogus counter index. OK mvs@ claudio@ anton@
* New rw_obj_init() API providing reference-counted rwlock.mpi2021-01-112-2/+124
| | | | | | | Original port from NetBSD by guenther@, required for upcoming amap & anon locking. ok kettenis@
* Simplify sleep signal handling a bit by introducing sleep_signal_check().claudio2021-01-111-16/+24
| | | | | | | The common code is moved to sleep_signal_check() and instead of multiple state variables for sls_sig and sls_unwind only one sls_sigerr is set. This simplifies the checks in sleep_finish_signal() a great bit. Idea from and OK mpi@
* Split hierarchical calls into kern_sysctl_dirsgnezdo2021-01-091-42/+46
| | | | | | Removed a rash of +/-1 and made both functions shorter and more focused. OK millert@
* Reduce case duplication in kern_sysctlgnezdo2021-01-091-108/+85
| | | | | | | This changes amd64 GENERIC.MP .text size of kern_sysctl.o from 6440 to 6400. Surprisingly, RAMDISK grows from 1645 to 1678. OK millert@, mglocker@
* Enforce range with sysctl_int_bounded in sysctl_wdoggnezdo2021-01-091-3/+5
| | | | OK millert@
* Enforce range with sysctl_int_bounded in witness_sysctl_watchgnezdo2021-01-091-10/+8
| | | | | | Makes previously explicit checking less verbose. OK millert@
* Use sysctl_int_bounded in sysctl_hwsmtgnezdo2021-01-091-6/+2
| | | | | | Prefer error reporting to silent clipping. OK millert@
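The difference between reporting an error and clipping can be shown with two small helpers. Both are hypothetical and written only for this illustration; neither is the kernel's sysctl_int_bounded(), whose real interface also handles the old/new sysctl buffers.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: store newval only if it lies within [min, max].
 * Otherwise leave *valp untouched and report EINVAL.
 */
static int
set_int_bounded(int *valp, int newval, int min, int max)
{
	if (newval < min || newval > max)
		return EINVAL;
	*valp = newval;
	return 0;
}

/* The alternative the commit moves away from: silently clamp the value. */
static void
set_int_clipped(int *valp, int newval, int min, int max)
{
	if (newval < min)
		newval = min;
	else if (newval > max)
		newval = max;
	*valp = newval;
}
```

With bounds [0, 1], writing 5 through `set_int_bounded()` fails with EINVAL and keeps the old value, while `set_int_clipped()` quietly stores 1, so the caller never learns the request was out of range.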
* If the loop check in somove(9) goes to release without setting anbluhm2021-01-091-3/+2
| | | | | | | error, a broadcast mbuf will stay in the socket buffer forever. This is bad as multiple mbufs can use up all the space. Better report ELOOP, dissolve splicing, and let userland handle it. OK anton@
* Replace a custom linked list with SLIST.visa2021-01-091-12/+10
|
* Replace SIMPLEQ with SLIST because the code does not need a queue.visa2021-01-091-26/+24
|
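For readers unfamiliar with the queue(3) macros, here is a small userland sketch of an SLIST used as a stack. `struct item` and the `item_list_*` helpers are made up for this example and are unrelated to the kernel code touched by these two commits.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int value;
	SLIST_ENTRY(item) entries;	/* single forward link */
};

SLIST_HEAD(item_list, item);

/* Insert at the head; an SLIST keeps no tail pointer. */
static void
item_list_push(struct item_list *head, struct item *it)
{
	SLIST_INSERT_HEAD(head, it, entries);
}

/* Remove and return the first element, or NULL if the list is empty. */
static struct item *
item_list_pop(struct item_list *head)
{
	struct item *it = SLIST_FIRST(head);

	if (it != NULL)
		SLIST_REMOVE_HEAD(head, entries);
	return it;
}

/* Walk the list from head to end and add up the values. */
static int
item_list_sum(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	SLIST_FOREACH(it, head, entries)
		sum += it->value;
	return sum;
}
```

An SLIST head is a single pointer, whereas a SIMPLEQ head also tracks the tail so elements can be appended. When code only ever inserts and removes at the front, the tail pointer is dead weight, which matches the reasoning "the code does not need a queue".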
* Remove unnecessary relocking of w_mtx as panic() should not return.visa2021-01-091-10/+2
|
* Lock kernel before raising SPL in klist_lock()visa2021-01-081-3/+3
| | | | | | | | | This prevents unwanted spinning with interrupts disabled. At the moment, this code is only invoked through klist_invalidate() and the callers should already hold the kernel lock. Also, one could argue that in MP-unsafe contexts klist_lock() should only assert for the kernel lock.
* Fix boot-time crash on sparc64visa2021-01-081-4/+15
| | | | | | | | | | | | On sparc64, initmsgbuf() is invoked before curcpu() is usable on the boot processor. Consequently, it is unsafe to use mutexes during the message buffer initialization. Avoid such use by skipping log_mtx when appending a newline from initmsgbuf(). Use mbp instead of msgbufp as the buffer argument to the putchar routine for consistency. Bug reported and fix suggested by miod@
* Revert "Implement select(2) and pselect(2) on top of kqueue."visa2021-01-081-148/+58
| | | | | | | | | | The use of kqueue as backend has introduced a significant regression in the performance of select(2), so go back to using the original code. Some additional management overhead is to be expected when using kqueue. However, the overhead of the current implementation is too high. Reported by bluhm@ on bugs@
* Adjust comment about klist_invalidate()visa2021-01-071-5/+8
|
* Add dt(4) TRACEPOINTs for pool_get() and pool_put(), this is similar to theclaudio2021-01-061-1/+6
| | | | | | ones added to malloc() and free(). Pass the struct pool pointer as argv1 since it is currently not possible to pass the pool name to btrace. OK mpi@
* pool(9): remove tickscheloha2021-01-021-11/+26
| | | | | | | | | | | | | | | | | | | | | | | | Change the pool(9) timeouts to use the system uptime instead of ticks. - Change the timeouts from variables to macros so we can use SEC_TO_NSEC(). This means these timeouts are no longer patchable via ddb(4). dlg@ does not think this will be a problem, as the timeout intervals have not changed in years. - Use low-res time to keep things fast. Add a local copy of getnsecuptime() to subr_pool.c to keep the diff small. We will need to move getnsecuptime() into kern_tc.c and document it later if we ever have other users elsewhere in the kernel. - Rename ph_tick -> ph_timestamp and pr_cache_tick -> pr_cache_timestamp. Prompted by tedu@ some time ago, but the effort stalled (may have been my fault). Input from kettenis@ and dlg@. Special thanks to mpi@ for help with struct shuffling. This change does not increase the size of struct pool_page_header or struct pool. ok dlg@ mpi@
* copyright++;jsg2021-01-011-2/+2
|
* Add trace points for malloc(9) and free(9). This makes them traceableclaudio2020-12-311-1/+7
| | | | | via dt(4) and btrace(8). OK mpi@ millert@
* Set klist lock for pipes.visa2020-12-301-5/+15
| | | | OK anton@, mpi@
* Analogous to the kern.audio.record sysctl parameter for audio(4)mglocker2020-12-281-1/+29
| | | | | | | | | | | devices, introduce kern.video.record for video(4) devices. By default kern.video.record will be set to zero, blanking all data delivered by device drivers which attach to video(4). The idea was initially proposed by Laurence Tratt <laurie AT tratt DOT net>. ok mpi@
* Use per-CPU counters for fault and stats counters reached in uvm_fault().mpi2020-12-281-1/+2
| | | | ok kettenis@, dlg@
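The general idea behind per-CPU counters can be sketched as follows. Everything here is an assumption made for illustration: the fixed `NCPUS`, the counter indices, and the `pc_inc()`/`pc_sum()` helpers are hypothetical and are not the kernel's counters API, which additionally has to make reads consistent while writers are running.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS		4	/* hypothetical fixed CPU count */
#define NCOUNTERS	2

enum { CNT_FAULTS, CNT_OTHER };	/* hypothetical counter indices */

struct percpu_counters {
	uint64_t c[NCPUS][NCOUNTERS];
};

/* Writer: each CPU bumps only its own slot, so no shared lock is taken. */
static void
pc_inc(struct percpu_counters *pc, int cpu, int which)
{
	pc->c[cpu][which]++;
}

/* Reader: add up the slots of all CPUs for one counter. */
static uint64_t
pc_sum(const struct percpu_counters *pc, int which)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		sum += pc->c[cpu][which];
	return sum;
}
```

The trade-off is that updates become cheap and contention-free on the hot path, while reads become more expensive because they must visit every CPU's slot.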
* Simplify parameters of pselregister().visa2020-12-261-8/+5
| | | | OK mpi@