path: root/sys/kern/kern_synch.c
Commit message | Author | Date | Files | Lines
* Fix single thread behaviour in sleep_setup_signal().  (claudio, 2020-04-06, 1 file, -13/+20)
  If a thread needs to suspend (SINGLE_SUSPEND or SINGLE_PTRACE) it needs to do
  this in sleep_setup_signal(). This way the case where single_thread_clear() is
  called before the sleep gets its wakeup call can be correctly handled and the
  thread is put back to sleep in sleep_finish(). If the wakeup happens before
  unsuspend then p_wchan is 0 and the thread will not go to sleep again. In case
  of an unwind an error is returned, causing the thread to return immediately
  with that error.
  With and OK mpi@ kettenis@
* Move sleep_finish_all() down to where sleep_finish() and all other
  sleep_setup/finish related functions are.  (claudio, 2020-03-31, 1 file, -17/+17)
  OK kettenis@
* Revert Rev 1.164.  (claudio, 2020-03-26, 1 file, -2/+2)
  Setting sls_sig to 0 uncovered a bunch of issues when it comes to setting a
  process into single thread mode. It is still wrong, but first the interaction
  with single_thread_set() must be corrected.
* Prevent tsleep(9) with PCATCH from returning immediately without error
  when called during execve(2).  (visa, 2020-03-23, 1 file, -2/+2)
  This was caused by initializing sls_sig with value 0 in r1.164 of
  kern_synch.c. Previously, tsleep(9) returned immediately with EINTR in
  similar circumstances.

  The immediate return without error can cause a system hang. For example,
  vwaitforio() could end up spinning if called during execve(2) because the
  thread did not enter sleep and other threads were not able to finish the
  I/O.

  tsleep
  vwaitforio
  nfs_flush
  nfs_close
  VOP_CLOSE
  vn_closefile
  fdrop
  closef
  fdcloseexec
  sys_execve

  Fix the issue by checking (p->p_flag & P_SUSPSINGLE) instead of
  (p->p_p->ps_single != NULL) in sleep_setup_signal(). The former is more
  selective than the latter and allows the thread that invokes execve(2) to
  enter sleep normally.

  Bug report, change bisecting and testing help by Pavel Korovin
  OK claudio@ mpi@
* __thrsleep(2): ensure timeout is set when calling tsleep_nsec(9)  (cheloha, 2020-03-20, 1 file, -2/+2)
  tsleep_nsec(9) will not set a timeout if the nsecs parameter is equal to
  INFSLP (UINT64_MAX). We need to limit the duration to MAXTSLP
  (UINT64_MAX - 1) to ensure a timeout is set.
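  A minimal sketch of the clamping this change describes; MIN(), MAXTSLP and
  tsleep_nsec(9) are the pieces named above, while the helper name, wmesg and
  header comments are illustrative assumptions:

      #include <sys/param.h>    /* MIN(), PWAIT */
      #include <sys/systm.h>    /* tsleep_nsec(), INFSLP, MAXTSLP */

      /*
       * Illustrative only: clamp a caller-supplied duration so that
       * tsleep_nsec(9) always arms a timeout.  INFSLP (UINT64_MAX) would
       * disable the timeout entirely; MAXTSLP (UINT64_MAX - 1) is the
       * longest duration that still sets one.
       */
      int
      bounded_sleep(const volatile void *ident, uint64_t nsecs)
      {
          return tsleep_nsec(ident, PWAIT, "bounded",
              MIN(nsecs, MAXTSLP));
      }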
* __thrsleep(2): fix absolute timeout check  (cheloha, 2020-03-20, 1 file, -2/+2)
  An absolute timeout T elapses when the clock has reached time T, i.e. when T
  is less than or equal to the clock's current time. But the current code
  thinks T elapses only when the clock is strictly greater than T.

  For example, if my absolute timeout is 1.00000000, the current code will not
  return EWOULDBLOCK until the clock reaches 1.00000001. This is wrong: my
  absolute timeout elapses a nanosecond prior to that point.

  So the timespeccmp(3) here should be

      timespeccmp(tsp, &now, <=)

  and not

      timespeccmp(tsp, &now, <)

  as it is currently.
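  The fixed comparison as a tiny self-contained helper; only timespeccmp(3) is
  from the tree, the function name is made up:

      #include <sys/time.h>    /* struct timespec, timespeccmp() */

      /*
       * An absolute timeout has elapsed as soon as the clock reaches it,
       * so "<=" is the correct operator; "<" would delay the EWOULDBLOCK
       * return by one nanosecond.
       */
      static inline int
      abstimeout_elapsed(const struct timespec *tsp, const struct timespec *now)
      {
          return timespeccmp(tsp, now, <=);
      }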
* Initialize sls_sig to 0 and not 1.  (claudio, 2020-03-13, 1 file, -2/+2)
  sls_sig stores the signal number of a possible signal that was caught during
  sleep setup. It does not make sense to have a default of 1 (SIGHUP) for this.
  OK visa@ mpi@
* msleep() and rwsleep() allow releasing the lock when going to sleep.  (bluhm, 2020-03-02, 1 file, -3/+5)
  If sleep_setup_signal() detects that the process has been stopped, it calls
  mi_switch() instead of sleeping. Then the lock was not released and other
  processes got stuck. Move the mtx_leave() and rw_exit() before
  sleep_setup_signal() to prevent a stopped process from holding a short-term
  kernel lock.
  input kettenis@; OK visa@ tedu@
* Split `p_priority' into `p_runpri' and `p_slppri'.  (mpi, 2020-01-30, 1 file, -3/+3)
  Using different fields to remember in which runqueue or sleepqueue threads
  currently are will make it easier to split the SCHED_LOCK().

  With this change, the (potentially boosted) sleeping priority no longer
  overwrites the thread priority. This lets us get rid of the logic required
  to synchronize `p_priority' with `p_usrpri'.

  Tested by many, ok visa@
* *sleep_nsec(9): log process name and pid when nsecs == 0  (cheloha, 2020-01-24, 1 file, -7/+13)
  We included DIAGNOSTIC checks in *sleep_nsec(9) when they were first
  committed to help us sniff out division-to-zero bugs when converting
  *sleep(9) callers to the new interfaces.

  Recently we exposed the new interface to userland callers. This has yielded
  some warnings. This diff adds a process name and pid to the warnings to help
  determine the source of the zero-length sleeps.

  ok mpi@
* Import dt(4), a driver and framework for Dynamic Profiling.  (mpi, 2020-01-21, 1 file, -1/+6)
  The design is fairly simple: events, in the form of descriptors on a ring,
  are being produced in any kernel context and being consumed by a userland
  process reading /dev/dt.

  Code and hooks are all guarded under '#if NDT > 0' so this commit shouldn't
  introduce any change as long as dt(4) is disabled in GENERIC.

  ok kettenis@, visa@, jasper@, deraadt@
* Make __thrsleep(2) and __thrwakeup(2) MP-safe  (visa, 2020-01-21, 1 file, -31/+73)
  Threads in __thrsleep(2) are tracked using queues, one queue per process for
  synchronization between the threads of a process, and one system-wide queue
  for the special ident -1 handling. Each of these queues has an associated
  rwlock that serializes access.

  The queue lock is released when calling copyin() and copyout() in
  thrsleep(). This preserves the existing behaviour where a blocked copy
  operation does not prevent other threads from making progress.

  Tested by anton@, claudio@
  OK anton@, claudio@, tedu@, mpi@
* Introduce wakeup_proc(), a function to un-SSTOP/SSLEEP a thread.  (mpi, 2020-01-16, 1 file, -16/+24)
  This moves most of the SCHED_LOCK() related to protecting the sleepqueue and
  its states to kern/kern_sync.c.

  Name suggestion from jsg@, ok kettenis@, visa@
* Introduce TIMESPEC_TO_NSEC() and use it to convert userland-facing tsleep(9)
  to tsleep_nsec(9).  (mpi, 2020-01-14, 1 file, -8/+4)
  ok bluhm@
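  A hedged sketch of the conversion pattern: the time left until an absolute
  deadline is turned into nanoseconds with TIMESPEC_TO_NSEC() and handed to
  tsleep_nsec(9). The function name, wmesg and the choice of the uptime clock
  are assumptions for illustration only:

      #include <sys/param.h>    /* MIN(), PWAIT */
      #include <sys/systm.h>    /* tsleep_nsec(), MAXTSLP */
      #include <sys/time.h>     /* TIMESPEC_TO_NSEC(), timespecsub(), timespeccmp() */
      #include <sys/errno.h>

      int
      sleep_until(const volatile void *ident, const struct timespec *deadline)
      {
          struct timespec now, diff;

          getnanouptime(&now);
          if (timespeccmp(deadline, &now, <=))
              return EWOULDBLOCK;        /* deadline already passed */
          timespecsub(deadline, &now, &diff);
          return tsleep_nsec(ident, PWAIT, "until",
              MIN(TIMESPEC_TO_NSEC(&diff), MAXTSLP));
      }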
* *sleep_nsec(9): sleep *at least* the given number of nanoseconds  (cheloha, 2020-01-12, 1 file, -10/+27)
  The *sleep(9) interfaces are challenging to use when one needs to sleep for
  a given minimum duration: the programmer needs to account for both the
  current tick and any integer division when converting an interval to a count
  of ticks. This sort of input conversion is complicated and ugly at best and
  error-prone at worst.

  This patch consolidates this conversion logic into the *sleep_nsec(9)
  functions themselves. This will allow us to use the functions at the syscall
  layer and elsewhere in the kernel where guaranteeing a minimum sleep
  duration is of vital importance.

  With input from bluhm@, guenther@, ratchov@, tedu@, and kettenis@.
  Requested by mpi@ and kettenis@. Conversion algorithm from mpi@.

  ok mpi@, kettenis@, deraadt@
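  Roughly, the consolidated conversion works like the sketch below: round the
  requested duration up to whole ticks and add one tick for the partially
  elapsed current tick, so the sleep never comes up short. This is a
  paraphrase with an invented helper name, not the committed code:

      #include <sys/types.h>
      #include <sys/kernel.h>    /* hz */
      #include <sys/limits.h>    /* INT_MAX */

      static int
      nsec_to_ticks_atleast(uint64_t nsecs)
      {
          uint64_t nsec_per_tick = 1000000000ULL / hz;
          uint64_t to_ticks;

          /* Round up, then add a tick for the one already in progress. */
          to_ticks = (nsecs + nsec_per_tick - 1) / nsec_per_tick + 1;
          if (to_ticks > INT_MAX)
              to_ticks = INT_MAX;
          return (int)to_ticks;
      }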
* Move kernel locking inside the sleep machinery.  (visa, 2019-11-30, 1 file, -14/+37)
  This enables calling rwsleep(9) with PCATCH and rw_enter(9) with RW_INTR
  without the kernel lock. In addition, now tsleep(9) with PCATCH should be
  safe to use without the kernel lock if the sleep is purely time-based.

  Tested by anton@, cheloha@, chris@
  OK anton@, cheloha@
* Check sleep timeout state only if the sleep has a timeout.  (visa, 2019-11-12, 1 file, -10/+18)
  Otherwise, the timeout cancellation in sleep_finish_timeout() would acquire
  the kernel lock every time in the no-timeout case, as noticed by mpi@. This
  also reduces the contention of timeout_mutex.

  OK mpi@, feedback guenther@
* Reduce the number of places where `p_priority' and `p_stat' are set.  (mpi, 2019-10-15, 1 file, -5/+5)
  This refactoring will help future scheduler locking, in particular to shrink
  the SCHED_LOCK().

  No intended behavior change.

  ok visa@
* *sleep_nsec(9): add missing newlines to DIAGNOSTIC logs  (cheloha, 2019-10-01, 1 file, -4/+4)
* Stop sleeping at PUSER.  (mpi, 2019-07-10, 1 file, -2/+2)
  This makes it possible to enforce that sleeping priorities will now always
  be < PUSER.

  ok visa@, ratchov@
* Add tsleep_nsec(9), msleep_nsec(9), and rwsleep_nsec(9).  (cheloha, 2019-07-03, 1 file, -1/+71)
  Equivalent to their unsuffixed counterparts except that (a) they take a
  timeout in terms of nanoseconds, and (b) INFSLP, aka UINT64_MAX (not zero)
  indicates that a timeout should not be set.

  For now, zero nanoseconds is not a strictly valid invocation: we log a
  warning on DIAGNOSTIC kernels if we see such a call. We still sleep until
  the next tick in such a case, however. In the future this could become some
  sort of poll... TBD.

  To facilitate conversions to these interfaces: add inline conversion
  functions to sys/time.h for turning your timeout into nanoseconds. Also do a
  few easy conversions for warmup and to demonstrate how further conversions
  should be done.

  Lots of input from mpi@ and ratchov@. Additional input from tedu@, deraadt@,
  mortimer@, millert@, and claudio@.

  Partly inspired by FreeBSD r247787.

  positive feedback from deraadt@, ok mpi@
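  A usage sketch of the new interface; the softc structure, field, function
  names and wmesg are invented for the example, while SEC_TO_NSEC() is one of
  the sys/time.h conversion helpers mentioned above:

      #include <sys/param.h>    /* PWAIT */
      #include <sys/systm.h>    /* tsleep_nsec(), wakeup() */
      #include <sys/time.h>     /* SEC_TO_NSEC() */
      #include <sys/errno.h>

      struct example_softc {
          volatile int    sc_done;    /* hypothetical completion flag */
      };

      int
      example_wait(struct example_softc *sc)
      {
          int error = 0;

          /* Wait at most two seconds; EWOULDBLOCK reports the timeout. */
          while (!sc->sc_done && error == 0)
              error = tsleep_nsec(&sc->sc_done, PWAIT, "exwait",
                  SEC_TO_NSEC(2));
          return error;
      }

      void
      example_done(struct example_softc *sc)
      {
          sc->sc_done = 1;
          wakeup(&sc->sc_done);    /* rouse example_wait() */
      }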
* Ensure that timeout p_sleep_to is not left running when finishing sleep.  (visa, 2019-06-18, 1 file, -3/+6)
  This is necessary when invoking sleep_finish_timeout() without the kernel
  lock. If not cancelled properly, an already running endtsleep() might cause
  a spurious wakeup on the thread if the thread re-enters a sleep queue very
  quickly before the handler completes.

  The flag P_TIMEOUT should stay cleared across the timeout cancellation. Add
  an assertion for that.

  OK mpi@
* Remove file name and line number output from witness(4)  (visa, 2019-04-23, 1 file, -11/+2)
  Reduce code clutter by removing the file name and line number output from
  witness(4). Typically it is easy enough to locate offending locks using the
  stack traces that are shown in lock order conflict reports. Tricky cases can
  be tracked using sysctl kern.witness.locktrace=1 .

  This patch additionally removes the witness(4) wrapper for mutexes. Now each
  mutex implementation has to invoke the WITNESS_*() macros in order to
  utilize the checker.

  Discussed with and OK dlg@, OK mpi@
* Sprinkle a pinch of timerisvalid/timespecisvalid over the rest of sys/kern  (cheloha, 2019-01-23, 1 file, -2/+2)
* Add sleep_finish_all(), which provides the common combo of sleep_finish(),
  sleep_finish_timeout(), and sleep_finish_signal() with error preferencing,
  and then use it in five places.  (guenther, 2018-05-31, 1 file, -24/+19)
  ok mpi@
* rwsleep: generalize to support both read- and write-locks.  (cheloha, 2018-05-28, 1 file, -8/+9)
  Wanted for tentative clock_nanosleep(2) diff, but maybe useful elsewhere in
  the future.

  ok mpi@
* Validate timespec and return ECANCELED when interrupted with SA_RESTART.  (pirofti, 2018-04-24, 1 file, -4/+8)
  Discussing with mpi@ and guenther@, we decided to first fix the existing
  semaphore implementation with regards to SA_RESTART and POSIX-compliant
  returns in the case where we deal with restartable signals.

  Currently we return EINTR everywhere, which is mostly incorrect as the user
  cannot know if she needs to re-call the syscall or not. Return ECANCELED to
  signal that SA_RESTART was set and EINTR otherwise.

  Regression tests pass and so does the posix suite. Timespec validation bits
  are needed to pass the latter.

  OK mpi@, guenther@
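  A sketch of the behaviour described above, not the committed diff: the sleep
  layer reports a restartable interruption as ERESTART and others as EINTR,
  and the userland-facing wait maps ERESTART to ECANCELED after validating the
  caller's timespec. The function name, wmesg and timeout handling are
  illustrative assumptions:

      #include <sys/param.h>    /* PWAIT, PCATCH */
      #include <sys/systm.h>    /* tsleep() */
      #include <sys/time.h>     /* timespecisvalid() */
      #include <sys/errno.h>

      int
      example_sem_wait(const volatile void *ident, const struct timespec *ts,
          int to_ticks)
      {
          int error;

          if (ts != NULL && !timespecisvalid(ts))
              return EINVAL;
          error = tsleep(ident, PWAIT | PCATCH, "exsem", to_ticks);
          if (error == ERESTART)
              error = ECANCELED;    /* interrupted with SA_RESTART set */
          return error;
      }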
* add code to provide simple wait condition handling.  (dlg, 2017-12-14, 1 file, -1/+29)
  this will be used to replace the bare sleep_state handling in a bunch of
  places, starting with the barriers.
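  If memory serves, this is the cond(9)-style helper (struct cond with
  cond_init(), cond_wait() and cond_signal()); the sketch below shows the
  intended pattern and should be read as an assumption about the API rather
  than a quote of it:

      #include <sys/systm.h>
      #include <sys/cond.h>    /* assumed header for the wait-condition API */

      struct cond barrier_done;

      void
      barrier_waiter(void)
      {
          cond_init(&barrier_done);
          /* ... hand &barrier_done to the other side ... */
          cond_wait(&barrier_done, "exbar");    /* sleeps until signalled */
      }

      void
      barrier_other_side(void)
      {
          cond_signal(&barrier_done);           /* releases the waiter */
      }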
* Use _kernel_lock_held() instead of __mp_lock_held(&kernel_lock).  (mpi, 2017-12-04, 1 file, -4/+4)
  ok visa@
* Do not panic if we find ourselves on the sleep queue while being SONPROC.  (mpi, 2017-05-18, 1 file, -1/+10)
  If the rwlock passed to rwsleep(9) is contended, the CPU will call wakeup()
  between sleep_setup() and sleep_finish(). At this moment curproc is on the
  sleep queue but marked as SONPROC. Avoid panicking in this case.

  Problem reported by sthen@
  ok kettenis@, visa@
* Hook up mutex(9) to witness(4).  (visa, 2017-04-20, 1 file, -1/+5)
* Hook up rwlock(9) to witness(4).  (visa, 2017-04-20, 1 file, -2/+8)
  Loosely based on a diff from Christian Ludwig
* Remove the inifioctl hack.  (mpi, 2017-01-31, 1 file, -11/+1)
  Checking for an unheld NET_LOCK() in tsleep(9) & friends seems to only
  produce false positives and cannot be easily disabled.
* Introduce a hack to remove false positives when looking for memory
  allocation that can sleep while holding the NET_LOCK().  (mpi, 2017-01-25, 1 file, -1/+11)
  To be removed once we're confident the remaining code paths are safe.

  Discussed with deraadt@
* p_comm is the process's command and isn't per thread, so move it from
  struct proc to struct process.  (guenther, 2017-01-21, 1 file, -2/+2)
  ok deraadt@ kettenis@
* Introduce rwsleep(9), an equivalent to msleep(9) but for code protected by a
  write lock.  (mpi, 2016-09-13, 1 file, -1/+35)
  ok guenther@, vgross@
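  A usage sketch (lock name, variable, function names and wmesg invented):
  rwsleep(9) drops the write lock while sleeping and takes it again before
  returning, mirroring what msleep(9) does with a mutex:

      #include <sys/param.h>     /* PWAIT */
      #include <sys/systm.h>     /* rwsleep(), wakeup() */
      #include <sys/rwlock.h>

      struct rwlock ex_lock = RWLOCK_INITIALIZER("exlock");
      int ex_ready;

      void
      example_wait_ready(void)
      {
          rw_enter_write(&ex_lock);
          while (!ex_ready)
              rwsleep(&ex_ready, &ex_lock, PWAIT, "exready", 0);
          rw_exit_write(&ex_lock);
      }

      void
      example_mark_ready(void)
      {
          rw_enter_write(&ex_lock);
          ex_ready = 1;
          rw_exit_write(&ex_lock);
          wakeup(&ex_ready);
      }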
* Remove ticket lock support from thrsleep. It's unused.  (akfaew, 2016-09-03, 1 file, -21/+8)
  OK guenther@ mpi@ tedu@
* fix several places where calculating ticks could overflow.  (tedu, 2016-07-06, 1 file, -3/+3)
  it's not enough to assign to an unsigned type because if the arithmetic
  overflows the compiler may decide to do anything. so change all the long
  long casts to uint64_t so that we start with the right type.
  reported by Tim Newsham of NCC.
  ok deraadt
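  The overflow-safe pattern, as a paraphrase rather than the committed diff:
  do the arithmetic in uint64_t from the start and clamp before narrowing to
  the int tick count that tsleep(9) expects. hz and tick (microseconds per
  tick) are the usual kernel globals; the helper name is made up:

      #include <sys/types.h>
      #include <sys/time.h>      /* struct timeval */
      #include <sys/kernel.h>    /* hz, tick */
      #include <sys/limits.h>    /* INT_MAX */

      static int
      tv_to_ticks(const struct timeval *tv)
      {
          uint64_t to_ticks;

          to_ticks = (uint64_t)hz * tv->tv_sec + tv->tv_usec / tick;
          if (to_ticks > INT_MAX)
              to_ticks = INT_MAX;
          return (int)to_ticks;
      }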
* switch calculated thrsleep timeout to unsigned to prevent overflow into
  negative values, which later causes a panic.  (tedu, 2016-07-04, 1 file, -2/+2)
  reported by Tim Newsham at NCC.
  ok guenther
* add back $OpenBSD$  (jsg, 2016-03-29, 1 file, -1/+1)
* Make sure that a thread that calls sched_yield(2) ends up on the run queue
  behind all other threads in the process by temporarily lowering its
  priority.  (kettenis, 2016-03-28, 1 file, -2/+19)
  This isn't optimal but it is the easiest way to guarantee that we make
  progress when we're waiting on another thread to release a lock. This
  results in significant improvements for processes that suffer from lock
  contention, most notably firefox. Unfortunately this means that
  sched_yield(2) needs to grab the kernel lock again.

  All the hard work was done by mpi@, based on observations of the behaviour
  of the BFS scheduler diff by Michal Mazurek.

  ok deraadt@
* Correct some comments and definitions, from Michal Mazurek.  (mpi, 2016-03-09, 1 file, -3/+2)
* add a DIAGNOSTIC for refcnt_take overflow.  (dlg, 2016-02-01, 1 file, -1/+8)
  ok mpi@
* KASSERT on refcnt underflow.  (dlg, 2016-01-15, 1 file, -2/+7)
  ok mpi@ bluhm@
* Do not include <sys/atomic.h> inside <sys/refcnt.h>.  (mpi, 2015-11-23, 1 file, -1/+20)
  Prevent lazy developers, like David and me, from using atomic operations
  without including <sys/atomic.h>.

  ok dlg@
* satisfy RAMDISK by placing cold == 2 case inside #ifdef DDB  (deraadt, 2015-09-28, 1 file, -1/+3)
* In low-level suspend routines, set cold=2. In tsleep(), use this to spit out
  a ddb trace to console.  (deraadt, 2015-09-28, 1 file, -1/+4)
  This should allow us to find suspend or resume routines which break the
  rules. It depends on the console output function being non-sleeping... but
  that's another codepath which should try to be safe when cold is set.
  ok kettenis
* introduce a wrapper around reference counts called refcnt.  (dlg, 2015-09-11, 1 file, -1/+23)
  it's basically atomic inc/dec, but it includes magical sleep code in
  refcnt_finalise that is better written once than many times.
  refcnt_finalise sleeps until all references are released and does so with
  sleep_setup and sleep_finalize, which is fairly subtle.

  putting this in now so we can get on with work in the stack; a proper
  discussion about visibility and how available intrinsics should be in the
  kernel can happen after next week.

  with help from guenther@
  ok guenther@ deraadt@ mpi@
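  A usage sketch of the wrapper; the struct, malloc type and wmesg are
  invented, and the function names (refcnt_init(), refcnt_take(),
  refcnt_rele_wake(), refcnt_finalize()) are given as best remembered and
  should be checked against sys/refcnt.h:

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/malloc.h>
      #include <sys/refcnt.h>

      struct widget {
          struct refcnt    w_refs;
          /* ... */
      };

      void
      widget_init(struct widget *w)
      {
          refcnt_init(&w->w_refs);           /* starts with one reference */
      }

      void
      widget_use(struct widget *w)
      {
          refcnt_take(&w->w_refs);
          /* ... safe to use w here ... */
          refcnt_rele_wake(&w->w_refs);      /* drop ref, wake a finalizer */
      }

      void
      widget_destroy(struct widget *w)
      {
          /* Sleeps until every other reference has been released. */
          refcnt_finalize(&w->w_refs, "widgetrm");
          free(w, M_DEVBUF, sizeof(*w));
      }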
* Delete ktracing of context switches: it's unused, and not particularly
  useful, and doing VOP_WRITE() from inside tsleep/msleep makes the locking
  too complicated, making it harder to move forward on MP changes.  (guenther, 2015-09-07, 1 file, -11/+1)
  ok deraadt@ kettenis@
* Drop and reacquire the kernel lock in the vfs_shutdown and "cold" portions
  of msleep and tsleep to give interrupts a chance to run on other CPUs.  (mikeb, 2015-05-12, 1 file, -1/+19)
  Tweak and OK kettenis