path: root/sys/kern
Commit message   [author, date, files changed, lines -deleted/+added]
...
* setitimer(2): realitexpire(): call getnanouptime(9) once  [cheloha, 2020-10-13, 1 file, -16/+15]
  timespecadd(3) is fast. There is no need to call getnanouptime(9) repeatedly when searching for the next expiration point. Given that it_interval is at least 1/hz, we expect to run through the loop maybe hz times at most. Even at HZ=10000 that's pretty brief. While we're here, pull *all* of the other logic out of the loop. The only thing we need to do in the loop is timespecadd(3).
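  A minimal sketch of the reshaped loop, assuming the timespec macros from <sys/time.h>; it_value and it_interval stand in for the timer's fields, and this mirrors the idea, not the committed diff:

    struct timespec now, next = it_value;

    getnanouptime(&now);                    /* read the clock once */
    while (timespeccmp(&next, &now, <=))    /* still in the past? */
        timespecadd(&next, &it_interval, &next);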
* Refactor kqueue_scan() to use a context: a "kqueue_scan_state struct".  [mpi, 2020-10-11, 1 file, -27/+51]
  The struct keeps track of the end point of an event queue scan by persisting the end marker. This will be needed when kqueue_scan() is called repeatedly to complete a scan in a piecewise fashion. Extracted from a previous diff from visa@. ok visa@, anton@
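  A hypothetical shape for such a scan context; the field names here are illustrative, not necessarily the committed ones:

    struct kqueue_scan_state {
        struct kqueue   *kqs_kq;        /* kqueue being scanned */
        struct knote     kqs_start;     /* start marker for this pass */
        struct knote     kqs_end;       /* end marker, persists across
                                           piecewise scans */
        int              kqs_queued;    /* is the end marker queued? */
    };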
* sys_getitimer(), sys_setitimer(): style(9), misc. cleanup  [cheloha, 2020-10-07, 1 file, -21/+15]
  - Consolidate variable declarations.
  - Remove superfluous parentheses from return statements.
  - Prefer sizeof(variable) to sizeof(type) for copyin(9)/copyout(9) (see the sketch after this list).
  - Remove some intermediate pointers from sys_setitimer(). Using SCARG() directly here makes it more obvious to the reader what you're copying.
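  In the spirit of the sizeof() item above, assuming a sys_setitimer()-style argument struct:

    struct itimerval aitv;

    /* Prefer the object to its type as the sizeof operand: */
    error = copyin(SCARG(uap, itv), &aitv, sizeof(aitv));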
* getitimer(2), setitimer(2): ITIMER_REAL: call getnanouptime(9) once  [cheloha, 2020-10-07, 1 file, -6/+5]
  Now that the critical sections are merged we should call getnanouptime(9) once. This makes an ITIMER_REAL timer swap atomic with respect to the clock: the time remaining on the old timer is computed with the same timestamp used to schedule the new timer.
* getitimer(2), setitimer(2): merge critical sections  [cheloha, 2020-10-07, 1 file, -59/+69]
  Merge the common code from sys_getitimer() and sys_setitimer() into a new kernel subroutine, setitimer(). setitimer() performs all of the error-free work for both system calls within a single critical section. We need a single critical section to make the setitimer(2) timer swap operation atomic relative to realitexpire() and hardclock(9).
  The downside of the new atomicity is that the behavior of setitimer(2) must change. With a single critical section we can no longer copyout(9) the old timer before installing the new timer. So if SCARG(uap, oitv) points to invalid memory, setitimer(2) now fails with EFAULT but the new timer will be left running. You can see this in action with code like the following:

    struct itimerval itv, olditv;

    itv.it_value.tv_sec = 1;
    itv.it_value.tv_usec = 0;
    itv.it_interval = itv.it_value;

    /* This should EFAULT. 0x1 is probably an invalid address. */
    if (setitimer(ITIMER_REAL, &itv, (void *)0x1) == -1)
        warn("setitimer");

    /* The timer will be running anyway. */
    getitimer(ITIMER_REAL, &olditv);
    printf("time left: %lld.%06ld\n",
        olditv.it_value.tv_sec, olditv.it_value.tv_usec);

  There is no easy way to work around this. Both FreeBSD's and Linux's setitimer(2) implementations have a single critical section and they too fail with EFAULT in this case and leave the new timer running. I imagine their developers decided that fixing this error case was a waste of effort. Without permitting copyout(9) from within a mutex I'm not sure it is even possible to avoid it on OpenBSD without sacrificing atomicity during a setitimer(2) timer swap. Given the rarity of this error case I would rather have an atomic swap.
  Behavior change discussed with deraadt@.
* Document that `a_p' is always curproc by using a KASSERT().  [mpi, 2020-10-07, 1 file, -1/+12]
  One exception to this rule is VOP_CLOSE() where NULL is used instead of curproc when the garbage collector of unix sockets, which runs in a kernel thread, drops the last reference of a file. This will allow for future simplifications of the VFS interfaces. Previous version ok visa@, anton@. ok kn@
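  A sketch of the assertions this implies, with the per-VOP placement left out:

    KASSERT(a_p == curproc);

    /* VOP_CLOSE() alone may see NULL, from the unix socket
       garbage collector's kernel thread: */
    KASSERT(a_p == NULL || a_p == curproc);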
* Fix write hang-up on file system on vnd.  [asou, 2020-10-05, 1 file, -1/+6]
  ok beck@
* expose timeval/timespec from system calls into ktrace, before determining  [deraadt, 2020-10-02, 3 files, -9/+36]
  if they are out of range, making it easier to isolate reason for EINVAL. ok cheloha
* Move the solock() call outside of solisten(). The reason is that the  [claudio, 2020-09-29, 2 files, -9/+9]
  so_state and splice checks were done without the proper lock which is incorrect. This is similar to sobind(), soconnect() which also require the caller to hold the socket lock. Found by, with and OK mvs@, OK mpi@
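  The resulting calling pattern, sketched with simplified lock signatures (the real solock()/sounlock() prototypes vary by release):

    solock(so);
    error = solisten(so, backlog);
    sounlock(so);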
* Remove the PR_WAITOK flag from the ucred_pool. The pool items are small  [kettenis, 2020-09-26, 1 file, -2/+2]
  enough that this pool uses the single page allocator for which PR_WAITOK is a no-op. However its presence suggests that pool_put(9) may sleep. The single page allocator will never actually do that. This makes it obvious that refreshcreds() will not sleep. ok deraadt@, visa@
* setpriority(2): don't treat booleans as scalars  [cheloha, 2020-09-25, 1 file, -5/+5]
  The variable "found" in sys_setpriority() is used as a boolean. We should set it to 1 to indicate that we found the object we were looking for instead of incrementing it. deraadt@ notes that the current code is not buggy, because OpenBSD cannot support anywhere near 2^32 processes, but agrees that incrementing the variable signals the wrong thing to the reader. ok millert@ deraadt@
* timeout(9): timeout_run(): read to_process before leaving timeout_mutex  [cheloha, 2020-09-22, 1 file, -2/+4]
  to_process is assigned during timeout_add(9) within timeout_mutex. In timeout_run() we need to read to_process before leaving timeout_mutex to ensure that the process pointer given to kcov_remote_enter(9) is the same as the one we set from timeout_add(9) when the candidate timeout was originally scheduled to run.
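  A sketch of the ordering, not the committed code; "to" and its fields follow the timeout(9) naming:

    struct process *kcov_process;

    mtx_enter(&timeout_mutex);
    /* ... dequeue the timeout "to" ... */
    kcov_process = to->to_process;      /* copy before dropping the mutex */
    mtx_leave(&timeout_mutex);

    kcov_remote_enter(KCOV_REMOTE_COMMON, kcov_process);
    (*to->to_func)(to->to_arg);
    kcov_remote_leave(KCOV_REMOTE_COMMON, kcov_process);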
* Move duplicated code to send an uncatchable SIGABRT into a function.  [mpi, 2020-09-16, 3 files, -15/+22]
  ok claudio@
* put HW_PHYSMEM64 case under CTL_HW not CTL_KERN  [jsg, 2020-09-16, 1 file, -2/+2]
  Fixes previous. Problem spotted by kettenis@
* As discovered by kettenis, recent mesa wants sysctl hw.physmem64, and  [deraadt, 2020-09-16, 1 file, -4/+2]
  in pledged programs that is unfortunate. My snark levels are a bit drained, but I must say I'm always disappointed when programs operating on virtual resources enquire about total physical resource availability, the only reason to ask is so they can act unfair relative to others in the shared environment. SIGH.
* timecounting: provide a naptime variable for userspace via kvm_read(3)  [cheloha, 2020-09-16, 1 file, -5/+7]
  vmstat(8) uses kvm_read(3) to extract the naptime from the kernel. Problem is, I deleted `naptime' from the global namespace when I moved it into the timehands. This patch restores it. It gets updated from tc_windup(). Only userspace should use it, and only when the kernel is dead. We need to tweak a variable in tc_setclock() to avoid shadowing the (once again) global naptime.
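  How userspace might fish the global back out, in the style of vmstat(8); the symbol name and type here are assumptions:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <kvm.h>
    #include <limits.h>
    #include <nlist.h>
    #include <stdio.h>

    int
    main(void)
    {
        char errbuf[_POSIX2_LINE_MAX];
        struct nlist nl[] = { { "_naptime" }, { NULL } };
        time_t naptime;
        kvm_t *kd;

        kd = kvm_openfiles(NULL, NULL, NULL, O_RDONLY, errbuf);
        if (kd == NULL)
            return 1;
        if (kvm_nlist(kd, nl) == 0 &&
            kvm_read(kd, nl[0].n_value, &naptime,
            sizeof(naptime)) == sizeof(naptime))
            printf("naptime: %lld\n", (long long)naptime);
        kvm_close(kd);
        return 0;
    }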
* add three static probes for vfs: cleaner, bufcache_take and bufcache_rel.  [jasper, 2020-09-14, 1 file, -2/+12]
  while here, swap two lines in bufcache_release() to put a KASSERT() first, following the pattern in bufcache_take(). ok beck@ mpi@
* Unbreak tree. Instead of passing struct process to siginit() just pass the  [claudio, 2020-09-13, 1 file, -3/+2]
  struct sigacts since that is the only thing that is modified by siginit.
* Grab the KERNEL_LOCK in ktrpsig() before calling ktrwrite(). Another  [claudio, 2020-09-13, 1 file, -1/+3]
  little step towards moving signal delivery outside of KERNEL_LOCK. OK mpi@
* Initialize sigacts0 before making them visible by setting ps->ps_sigacts.  [claudio, 2020-09-13, 1 file, -2/+2]
  OK mpi@
* Add a NULL check in bufbackoff so we don't die when passed a NULL pmem range.  [beck, 2020-09-12, 1 file, -2/+2]
  Noticed by, and based on a diff from Mike Small <smallm@sdf.org>.
* Introduce a helper to check if a signal is ignored or masked by a thread.  [mpi, 2020-09-09, 3 files, -11/+23]
  ok claudio@, pirofti@
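  A sketch of such a helper; the exact name and field paths are assumptions. Ignoring is per-process state (sigacts), masking is per-thread:

    int
    sigismasked(struct proc *p, int signum)
    {
        struct sigacts *ps = p->p_p->ps_sigacts;

        return ((ps->ps_sigignore & sigmask(signum)) ||
            (p->p_sigmask & sigmask(signum)));
    }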
* Remove unused sysctl_int_arr(9)  [gnezdo, 2020-09-01, 1 file, -15/+1]
* Fix a race in single-thread mode switching  [visa, 2020-08-26, 1 file, -14/+15]
  Extend the scope of SCHED_LOCK() to better synchronize single_thread_set(), single_thread_clear() and single_thread_check(). This prevents threads from suspending before single_thread_set() has finished. If a thread suspended early, ps_singlecount might get decremented too much, which in turn could make single_thread_wait() get stuck.
  The race could be triggered for example by trying to stop a multithreaded process with a debugger. When triggered, the race prevents the debugger from finishing a wait4(2) call on the debuggee. This kind of gdb hang was reported by Julian Smith on misc@.
  Unfortunately, single-thread mode switching still has issues and hangs are still possible. OK mpi@
* Remove unused debug_syncprt, improve debug sysctl handling  [kn, 2020-08-23, 3 files, -12/+8]
  "syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008.
  Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c the only visible difference between used and stub ctldebug structs in the debugvars[] array is their extern keyword, indicating that it is defined elsewhere. sys/sysctl.h declares all debugN members as extern upfront, but these declarations are not needed.
  Remove the unused debug sysctl, rename the only remaining one to something meaningful and remove forward declarations from /sys/sysctl.h; this way, adding new debug sysctls is a matter of adding extern and coming up with a name, which is nicer to read on its own and better to grep for. OK mpi
* Move sysctl(2) CTL_DEBUG from DEBUG to new DEBUG_SYSCTL  [kn, 2020-08-22, 3 files, -9/+9]
  Adding "debug.my-knob" sysctls is really helpful to select different code paths and/or log on demand during runtime without recompile, but as this code is under DEBUG, lots of other noise comes with it which is often undesired, at least when looking at specific subsystems only. Adding globals to the kernel and breaking into DDB to change them helps, but that does not work over SSH, hence the need for debug sysctls.
  Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general option for all of sysctl(2). OK gnezdo
* Push KERNEL_LOCK/UNLOCK() dance inside trapsignal().  [mpi, 2020-08-19, 1 file, -1/+3]
  ok kettenis@, visa@
* Style fixups from hurried commits  [gnezdo, 2020-08-18, 1 file, -4/+4]
  Thanks kettenis@ for pointing out. ok kettenis@
* Fix kn_data returned by filt_logread().  [visa, 2020-08-18, 1 file, -16/+21]
  Take into account the circular nature of the message buffer when computing the number of available bytes. Move the computation into a separate function and use it with the kevent(2) and ioctl(2) interfaces. OK mpi@
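  The circular arithmetic in essence, as a standalone sketch (the function name and parameters are hypothetical, not the committed helper):

    long
    msgbuf_bytes_avail(long pos_w, long pos_r, long size)
    {
        long avail = pos_w - pos_r;

        if (avail < 0)
            avail += size;      /* the writer wrapped past the end */
        return avail;
    }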
* Remove an unnecessary field from struct msgbuf.  [visa, 2020-08-18, 1 file, -2/+1]
  OK mvs@
* Add sysctl_bounded_arr as a replacement for sysctl_int_arr  [gnezdo, 2020-08-18, 1 file, -2/+33]
  Design by deraadt@. ok deraadt@
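  A hedged usage sketch; the struct layout and call signature are my reading of the new interface and may not match the committed version exactly:

    /* Hypothetical bounded-int table for a subsystem sysctl handler. */
    const struct sysctl_bounded_args mysubsys_vars[] = {
        { MYCTL_KNOB, &my_knob, 0, 100 },       /* clamp to [0, 100] */
    };

    return (sysctl_bounded_arr(mysubsys_vars, nitems(mysubsys_vars),
        name, namelen, oldp, oldlenp, newp, newlen));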
* getitimer(2): delay TIMESPEC_TO_TIMEVAL(9) conversion until copyout(9)  [cheloha, 2020-08-12, 1 file, -10/+12]
  setitimer(2) works with timespecs in its critical section. It will be easier to merge the two critical sections if getitimer(2) also works with timespecs.
  In particular, we currently read the uptime clock *twice* during a setitimer(2) swap: we call getmicrouptime(9) in sys_getitimer() and then call getnanouptime(9) in sys_setitimer(). This means that swapping one timer in for another is not atomic with respect to the uptime clock. It also means the two operations are working with different time structures and resolutions, which is potentially confusing.
  If both critical sections work with timespecs we can combine the two getnanouptime(9) calls into a single call at the start of the combined critical section in a future patch, making the swap atomic with respect to the clock.
  So, in preparation, move the TIMESPEC_TO_TIMEVAL conversions in getitimer(2) after the ITIMER_REAL conversion from absolute to relative time, just before copyout(9). The ITIMER_REAL conversion must then be done with timespec macros and getnanouptime(9), just like in setitimer(2).
* setitimer(2): ITIMER_REAL: don't call timeout_del(9) before timeout_add(9)  [cheloha, 2020-08-12, 1 file, -3/+3]
  If we're replacing the current ITIMER_REAL timer with a new one we don't need to call timeout_del(9) before calling timeout_add(9). timeout_add(9) does the work of timeout_del(9) implicitly if the timeout in question is already pending. This saves us an extra trip through the timeout_mutex.
* Reduce stack usage of kqueue_scan()  [visa, 2020-08-12, 1 file, -8/+12]
  Reuse the kev[] array of sys_kevent() in kqueue_scan() to lower stack usage. The code has reset kevp, but not nkev, whenever the retry branch is taken. However, the resetting is unnecessary because retry should be taken only if no events have been collected. Make this clearer by adding KASSERTs. OK mpi@
* setitimer(2): write new timer value in one place  [cheloha, 2020-08-11, 1 file, -6/+9]
  Rearrange the critical section in setitimer(2) to match that of getitimer(2). This will make it easier to merge the two critical sections in a subsequent diff. In particular, we want to write the new timer value in *one* place in the code, regardless of which timer we're setting. ok millert@
* setitimer(2): consolidate copyin(9), input validation, input conversion  [cheloha, 2020-08-11, 1 file, -8/+10]
  For what are probably historical reasons, setitimer(2) does not validate its input (itv) immediately after copyin(9). Instead, it waits until after (possibly) performing a getitimer(2) to copy out the state of the timer.
  Consolidating copyin(9), input validation, and input conversion into a single block before the getitimer(2) operation makes setitimer(2) itself easier to read. It will also simplify merging the critical sections of setitimer(2) and getitimer(2) in a subsequent patch.
  This changes setitimer(2)'s behavior in the EINVAL case. Currently, if your input (itv) is invalid, we return EINVAL *after* modifying the output (olditv). With the patch we will now return EINVAL *before* modifying the output. However, any code dependent upon this behavior is broken: the contents of olditv are undefined in all setitimer(2) error cases.
  ok millert@
* getitimer(2): don't enter itimer_mtx to read ITIMER_REAL itimerspec  [cheloha, 2020-08-11, 1 file, -3/+6]
  The ITIMER_REAL per-process interval timer is protected by the kernel lock. The ITIMER_REAL timeout (ps_realit_to), setitimer(2), and getitimer(2) all run under the kernel lock. Entering itimer_mtx during getitimer(2) when reading the ITIMER_REAL ps_timer state is superfluous and misleading.
* hardclock(9): fix race with setitimer(2) for ITIMER_VIRTUAL, ITIMER_PROF  [cheloha, 2020-08-09, 1 file, -1/+15]
  The ITIMER_VIRTUAL and ITIMER_PROF per-process interval timers are updated from hardclock(9). If a timer for the parent process is enabled the hardclock(9) thread calls itimerdecr() to update and reload it as needed. However, in itimerdecr(), after entering itimer_mtx, the thread needs to double-check that the timer in question is still enabled. While the hardclock(9) thread is entering itimer_mtx a thread in setitimer(2) can take the mutex and disable the timer. If the timer is disabled, itimerdecr() should return 1 to indicate that the timer has not expired and that no action needs to be taken. ok kettenis@
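  The added double-check, sketched; "itp" follows the itimerval-based timer state the code used at the time:

    mtx_enter(&itimer_mtx);
    if (!timerisset(&itp->it_value)) {
        /* Disabled by setitimer(2) while we awaited the mutex. */
        mtx_leave(&itimer_mtx);
        return (1);             /* not expired, nothing to do */
    }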
* adjtime(2): simplify input validation for new adjustment  [cheloha, 2020-08-08, 1 file, -13/+9]
  The current input validation for overflow is more complex than it needs to be. We can flatten the conditional hierarchy into a string of checks just one level deep. The result is easier to read.
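  One flat level of checks, sketched under the assumption that the adjustment is accumulated as microseconds in an int64_t; variable names and the exact bounds are mine, not the committed diff:

    if (!timerisvalid(&atv))
        return (EINVAL);
    if (atv.tv_sec > INT64_MAX / 1000000 ||
        atv.tv_sec < INT64_MIN / 1000000)
        return (EINVAL);
    delta = atv.tv_sec * 1000000 + atv.tv_usec;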
* sosplice(9): fully validate idle timeout  [cheloha, 2020-08-07, 1 file, -2/+3]
  The socket splice idle timeout is a timeval, so we need to check that tv_usec is both non-negative and less than one million. Otherwise it isn't in canonical form. We can check for this with timerisvalid(3). benno@ says this shouldn't break anything in base. ok benno@, bluhm@
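  timerisvalid(3) reduces to roughly this canonical-form test:

    #define timerisvalid(tvp) \
        ((tvp)->tv_usec >= 0 && (tvp)->tv_usec < 1000000)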
* timeout(9): remove unused interfaces: timeout_add_ts(9), timeout_add_bt(9)  [cheloha, 2020-08-07, 1 file, -30/+1]
  These two interfaces have been entirely unused since introduction. Remove them and thin the "timeout" namespace a bit. Discussed with mpi@ and ratchov@ almost a year ago, though I blocked the change at that time. Also discussed with visa@. ok visa@, mpi@
* timeout(9): fix miscellaneous remote kcov(4) bugs  [cheloha, 2020-08-06, 1 file, -3/+6]
  Commit v1.77 introduced remote kcov support for timeouts. We need to tweak a few things to make our support more correct:
  - Set to_process for barrier timeouts to the calling thread's parent process. Currently it is uninitialized, so during timeout_run() we are passing stack garbage to kcov_remote_enter(9).
  - Set to_process to NULL during timeout_set_flags(9). If in the future we forget to properly initialize to_process before reaching timeout_run(), we'll pass NULL to kcov_remote_enter(9). anton@ says this is harmless. I assume it is also preferable to passing stack garbage.
  - Save a copy of to_process on the stack in timeout_run() before calling to_func to ensure that we pass the same process pointer to kcov_remote_leave(9) upon return. The timeout may be freely modified from to_func, so to_process may have changed when we return.
  Tested by anton@. ok anton@
* Move range check inside sysctl_int_arr  [gnezdo, 2020-08-01, 1 file, -3/+3]
  Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT. OK kn@
* Add support for remote coverage to kcov. Remote coverage is collected  [anton, 2020-08-01, 2 files, -2/+30]
  from threads other than the one currently having kcov enabled. A thread with kcov enabled occasionally delegates work to another thread, collecting coverage from such threads improves the ability of syzkaller to correlate side effects in the kernel caused by issuing a syscall.
  Remote coverage is divided into subsystems. The only supported subsystem right now collects coverage from scheduled tasks and timeouts on behalf of a kcov enabled thread. In order to make this work `struct task' and `struct timeout' must be extended with a new field keeping track of the process that scheduled the task/timeout. Both aforementioned structures have therefore increased with the size of a pointer on all architectures.
  The kernel API is documented in a new kcov_remote_register(9) manual.
  Remote coverage is also supported by kcov on NetBSD and Linux. ok mpi@
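  A hedged sketch of how the pieces pair up, per the new manual; the subsystem constant and the choice of identifier are illustrative, not prescribed here:

    /* Tie the remote subsystem to a delegating process pr: */
    kcov_remote_register(KCOV_REMOTE_COMMON, pr);

    /* In the worker thread, bracket the delegated work: */
    kcov_remote_enter(KCOV_REMOTE_COMMON, pr);
    /* ... run the task or timeout on behalf of pr ... */
    kcov_remote_leave(KCOV_REMOTE_COMMON, pr);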
* Reference unveil(2) in system accounting and daily.8.  [rob, 2020-07-26, 1 file, -2/+2]
  Reminder that unveil does not kill, from brynet and gsoares. Wording tweaks from jmc; feedback from deraadt. ok jmc@, millert@, solene@, "fine with me" deraadt@
* timeout(9): remove TIMEOUT_SCHEDULED flag  [cheloha, 2020-07-25, 1 file, -11/+16]
  The TIMEOUT_SCHEDULED flag was added a few months ago to differentiate between wheel timeouts and new timeouts during softclock(). The distinction is useful when incrementing the "rescheduled" stat and the "late" stat. Now that we have an intermediate queue for new timeouts, timeout_new, we don't need the flag. The distinction between wheel timeouts and new timeouts can be made computationally. Suggested by procter@ several months ago.
* timeout(9): delay processing of timeouts added during softclock()  [cheloha, 2020-07-24, 1 file, -6/+14]
  New timeouts are appended to the timeout_todo circq via timeout_add(9). If this is done during softclock(), i.e. a timeout function calls timeout_add(9) to reschedule itself, the newly added timeout will be processed later during the same softclock(). This works, but it is not optimal:
  1. If a timeout reschedules itself to run in zero ticks, i.e. timeout_add(..., 0); it will be run again during the current softclock(). This can cause an infinite loop, softlocking the primary CPU.
  2. Many timeouts are cancelled before they execute. Processing a timeout during the current softclock() is "eager": if we waited, the timeout might be cancelled and we could spare ourselves the effort. If the timeout is not cancelled before the next softclock() we can bucket it as we normally would with no change in behavior.
  3. Many timeouts are scheduled to run after 1 tick, i.e. timeout_add(..., 1); Processing these timeouts during the same softclock means bucketing them for no reason: they will be dumped into the timeout_todo queue during the next hardclock(9) anyway. Processing them is pointless.
  We can avoid these issues by using an intermediate queue, timeout_new. New timeouts are put onto this queue during timeout_add(9). The queue is concatenated to the end of the timeout_todo queue at the start of each softclock() and then softclock() proceeds, as sketched below. This means the amount of work done during a given softclock() is finite and we avoid doing extra work with eager processing. Any timeouts that *depend* upon being rerun during the current softclock() will need to be updated, though I doubt any such timeouts exist.
  Discussed with visa@ last year. No complaints after a month.
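  The splice at the top of softclock(), sketched with the CIRCQ macros kern_timeout.c already uses; the loop body is elided:

    mtx_enter(&timeout_mutex);
    /* Take everything added since the last pass, then work only
       on that finite batch: */
    CIRCQ_CONCAT(&timeout_todo, &timeout_new);
    while (!CIRCQ_EMPTY(&timeout_todo)) {
        /* ... bucket or run each timeout ... */
    }
    mtx_leave(&timeout_mutex);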
* Implement BOOT_QUIET option that suppresses kernel printf output to the  [kettenis, 2020-07-24, 1 file, -3/+13]
  console. When the kernel panics, printf output to the console is enabled again such that we see those messages. Use this option for the powerpc64 boot kernel. ok visa@, deraadt@
* Make timeout_add_sec(9) add a tick if given zero seconds  [kn, 2020-07-24, 1 file, -1/+3]
  All other timeout_add_*() functions do so before calling timeout_add(9) as described in the manual, this one did not. OK cheloha
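  The function in essence after the fix; the INT_MAX clamp is sketched alongside for context:

    int
    timeout_add_sec(struct timeout *to, int secs)
    {
        uint64_t to_ticks;

        to_ticks = (uint64_t)hz * secs;
        if (to_ticks > INT_MAX)
            to_ticks = INT_MAX;
        if (to_ticks == 0)      /* the fix: round up to one tick */
            to_ticks = 1;
        return timeout_add(to, (int)to_ticks);
    }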
* pstat -t was showing bogus column data on ttys, in modes where  [deraadt, 2020-07-22, 1 file, -2/+3]
  newline doesn't occur to rewind to column 0. If OPOST is inactive, simply return 0. ok millert