path: root/sys/kern/sched_bsd.c
Commit log (each entry: commit message, author, date, files changed, lines -removed/+added)
* Revert to using the SCHED_LOCK() to protect time accounting.  (mpi, 2019-06-01, 1 file, -2/+4)
  It currently creates a lock ordering problem because SCHED_LOCK() is taken
  by hardclock().  That means the "priorities" of a thread should be moved
  out of the SCHED_LOCK() first in order to make progress.
  Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com
  via anton@ as well as by kettenis@
* Use a per-process mutex to protect time accounting instead of SCHED_LOCK().  (mpi, 2019-05-31, 1 file, -4/+2)
  Note that hardclock(9) still increments p_{u,s,i}ticks without holding a
  lock.
  ok visa@, cheloha@
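  A minimal mutex(9) sketch of the idea, not the actual diff: the struct and
  field names below (acct, acct_mtx, ...) are placeholders, and the real
  change hangs the mutex off struct process.

      #include <sys/types.h>
      #include <sys/mutex.h>

      struct acct {
              struct mutex    acct_mtx;       /* protects the tick counters */
              uint64_t        acct_uticks;    /* user ticks */
              uint64_t        acct_sticks;    /* system ticks */
      };

      void
      acct_init(struct acct *a)
      {
              /* Block interrupts up to IPL_SCHED while the mutex is held. */
              mtx_init(&a->acct_mtx, IPL_SCHED);
              a->acct_uticks = a->acct_sticks = 0;
      }

      void
      acct_add(struct acct *a, uint64_t uticks, uint64_t sticks)
      {
              /* A per-process mutex replaces the global SCHED_LOCK() here. */
              mtx_enter(&a->acct_mtx);
              a->acct_uticks += uticks;
              a->acct_sticks += sticks;
              mtx_leave(&a->acct_mtx);
      }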
* Do not account spinning time as running time when a thread crosses a
  tick boundary of schedlock().  (mpi, 2019-05-25, 1 file, -2/+2)
  This reduces the contention on the SCHED_LOCK() when the current thread
  is already spinning.
  Prompted by deraadt@, ok visa@
* Introduce safe memory reclamation, a mechanism for reclaiming shared
  objects that readers can access without locking.  (visa, 2019-02-26, 1 file, -1/+4)
  This provides a basis for read-copy-update operations.  Readers access
  SMR-protected shared objects inside an SMR read-side critical section
  where sleeping is not allowed.  To reclaim an SMR-protected object, the
  writer has to ensure mutual exclusion of other writers, remove the
  object's shared reference and wait until read-side references cannot
  exist any longer.  As an alternative to waiting, the writer can schedule
  a callback that gets invoked when reclamation is safe.  The mechanism
  relies on CPU quiescent states to determine when an SMR-protected object
  is ready for reclamation.
  The <sys/smr.h> header additionally provides an implementation of singly-
  and doubly-linked lists that can be used together with SMR.  These lists
  allow lockless read access with a concurrent writer.
  Discussed with many, OK mpi@ sashan@
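  A rough sketch of how the <sys/smr.h> interface described above is meant
  to be used, assuming the smr_read_enter()/smr_read_leave(), smr_init(),
  smr_call() and SMR_LIST_* names; the list, element and function names
  below are made up for illustration.

      #include <sys/param.h>
      #include <sys/malloc.h>
      #include <sys/smr.h>

      struct entry {
              SMR_LIST_ENTRY(entry)   e_list;
              struct smr_entry        e_smr;
              int                     e_value;
      };

      /* Initialized elsewhere with SMR_LIST_INIT(&entries). */
      SMR_LIST_HEAD(entry_head, entry) entries;

      /* Reader: no lock taken, but sleeping is forbidden in the section. */
      int
      entry_lookup(int value)
      {
              struct entry *e;
              int found = 0;

              smr_read_enter();
              SMR_LIST_FOREACH(e, &entries, e_list) {
                      if (e->e_value == value) {
                              found = 1;
                              break;
                      }
              }
              smr_read_leave();
              return (found);
      }

      static void
      entry_free(void *arg)
      {
              struct entry *e = arg;

              free(e, M_DEVBUF, sizeof(*e));
      }

      /* Writer: callers serialize among themselves, e.g. with a mutex. */
      void
      entry_remove(struct entry *e)
      {
              SMR_LIST_REMOVE_LOCKED(e, e_list);
              /* Defer the free until no reader can still see the element. */
              smr_init(&e->e_smr);
              smr_call(&e->e_smr, entry_free, e);
      }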
* Stop accounting/updating priorities for Idle threads.  (mpi, 2019-01-28, 1 file, -1/+13)
  Idle threads are never placed on the runqueue so their priority doesn't
  matter.  This fixes an accounting bug where top(1) would report a high
  CPU usage for Idle threads of secondary CPUs right after booting.  That's
  because schedcpu() would give 100% CPU time to the Idle thread until
  "real" threads get scheduled on the corresponding CPU.
  Issue reported by bluhm@, ok visa@, kettenis@
* Fix unsafe use of ptsignal() in mi_switch().  (visa, 2019-01-06, 1 file, -19/+1)
  ptsignal() has to be called with the kernel lock held.  As ensuring the
  locking in mi_switch() is not easy, and deferring the signaling using the
  task API is not possible because of lock order issues in mi_switch(),
  move the CPU time checking into a periodic timer where the kernel can be
  locked without issues.  With this change, each process has a dedicated
  resource check timer.  The timer gets activated only when a CPU time
  limit is set.  Because the checking is not done as frequently as before,
  some precision is lost.
  Use of timers adapted from FreeBSD.
  OK tedu@
  Reported-by: syzbot+2f5d62256e3280634623@syzkaller.appspotmail.com
* Use _kernel_lock_held() instead of __mp_lock_held(&kernel_lock).  (mpi, 2017-12-04, 1 file, -2/+2)
  ok visa@
* Convert most of the manual checks for CPU hogging to sched_pause().  (mpi, 2017-02-14, 1 file, -8/+2)
  The distinction between preempt() and yield() stays, as it is useful to
  know if a thread decided to yield by itself or if the kernel told it to
  go away.
  ok tedu@, guenther@
* Do not select a CPU to execute the current thread when being preempt()ed.  (mpi, 2017-02-09, 1 file, -2/+1)
  Calling sched_choosecpu() at this moment often results in moving the
  thread to a different CPU.  This does not help the scheduler and creates
  a domino effect, resulting in kernel threads moving to other CPUs.
  Tested by many without performance impact.  Simon Mages measured a small
  performance improvement and a smaller variance with an http proxy.
  Discussed with kettenis@, ok martijn@, beck@, visa@
* Enable the NET_LOCK(), take 2.  (mpi, 2017-01-25, 1 file, -1/+3)
  Recursions are currently known and marked as XXXSMP.
  Please report any assert to bugs@
* Correct some comments and definitions, from Michal Mazurek.  (mpi, 2016-03-09, 1 file, -11/+7)
* keep all the setperf timeout(9) handling in one place; ok tedu@  (naddy, 2015-11-08, 1 file, -2/+2)
* Remove some includes include-what-you-use claims don't have any direct
  symbols used.  (jsg, 2015-03-14, 1 file, -2/+1)
  Tested for indirect use by compiling amd64/i386/sparc64 kernels.
  ok tedu@ deraadt@
* yet more mallocarray() changes.  (doug, 2014-12-13, 1 file, -3/+3)
  ok tedu@ deraadt@
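  The typical shape of such a mallocarray(9) conversion; the variable names
  here are illustrative, not taken from this file.

      /* Before: the multiplication can overflow silently. */
      fqs = malloc(nfqs * sizeof(*fqs), M_TEMP, M_WAITOK);

      /* After: mallocarray(9) checks nmemb * size for overflow. */
      fqs = mallocarray(nfqs, sizeof(*fqs), M_TEMP, M_WAITOK);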
* take a few more ticks to actually throttle down. hopefully helps in
  situations where e.g. web browsing is cpu intense but intermittently
  idle.  (tedu, 2014-11-12, 1 file, -2/+5)
  subject to further refinement and tuning.
* pass size argument to free()  (deraadt, 2014-11-03, 1 file, -2/+3)
  ok doug tedu
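  The matching sized-free(9) conversion, again with illustrative names; the
  explicit size lets the allocator sanity-check the request.

      /* Before */
      free(fqs, M_TEMP);

      /* After: pass the size that was originally allocated. */
      free(fqs, M_TEMP, nfqs * sizeof(*fqs));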
* cpu_setperf and perflevel must remain exposed, otherwise a bunch of
  MD code needs excess #ifndef SMALL_KERNEL  (deraadt, 2014-10-17, 1 file, -7/+7)
* redo the performance throttling in the kernel.  (tedu, 2014-10-17, 1 file, -1/+151)
  introduce a new sysctl, hw.perfpolicy, that governs the policy.
  when set to anything other than manual, hw.setperf then becomes read only.
  phessler was heading in this direction, but this is slightly different. :)
* Track whether a process is a zombie or not yet fully built via flags
  PS_{ZOMBIE,EMBRYO} on the process instead of peeking into the process's
  thread data.  (guenther, 2014-07-04, 1 file, -2/+1)
  This eliminates the need for the thread-level SDEAD state.  Change
  kvm_getprocs() (both the sysctl() and kvm backends) to report the "most
  active" scheduler state for the process's threads.
  tweaks kettenis@ feedback and ok matthew@
* Move from struct proc to process the reference-count-holding pointers
  to the process's vmspace and filedescs.  (guenther, 2014-05-15, 1 file, -2/+1)
  struct proc continues to keep copies of the pointers, copying them on
  fork, clearing them on exit, and (for vmspace) refreshing on exec.
  Also, make uvm_swapout_threads() thread aware, eliminating p_swtime in
  kernel.
  particular testing by ajacoutot@ and sebastia@
* Convert some internal APIs to use timespecs instead of timevals.  (guenther, 2013-06-03, 1 file, -10/+10)
  ok matthew@ deraadt@
* Use long long and %lld for printing tv_sec values.  (guenther, 2013-06-02, 1 file, -3/+4)
  ok deraadt@
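  The general pattern, in an illustrative snippet: time_t can be wider than
  long, so the value is cast explicitly before handing it to printf.

      struct timeval tv;

      microuptime(&tv);
      printf("run time: %lld.%06ld sec\n", (long long)tv.tv_sec, tv.tv_usec);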
* do not include machine/cpu.h from a .c file; it is the responsibility of
  .h files to pull it in, if needed  (deraadt, 2013-03-28, 1 file, -2/+1)
  ok tedu
* Tedu old comment concerning cpu affinity which does not apply anymore.  (haesbaert, 2012-07-09, 1 file, -11/+2)
  ok blambert@ krw@ tedu@ miod@
* Make rusage totals, itimers, and profile settings per-process instead
  of per-rthread.  (guenther, 2012-03-23, 1 file, -6/+12)
  Handling of per-thread tick and runtime counters inspired by how FreeBSD
  does it.
  ok kettenis@
* First steps for making ptrace work with rthreads:  (guenther, 2012-02-20, 1 file, -2/+2)
  - move the P_TRACED and P_INEXEC flags, and p_oppid, p_ptmask, and
    p_ptstat member from struct proc to struct process
  - sort the PT_* requests into those that take a PID vs those that can
    also take a TID
  - stub in PT_GET_THREAD_FIRST and PT_GET_THREAD_NEXT
  ok kettenis@
* Functions used in files other than where they are defined should be
  declared in .h files, not in each .c.  (guenther, 2011-07-07, 1 file, -6/+1)
  Apply that rule to endtsleep(), scheduler_start(), updatepri(), and
  realitexpire()
  ok deraadt@ tedu@
* Stop using the P_BIGLOCK flag to figure out when we should release the
  biglock in mi_switch and just check if we're holding the biglock.  (art, 2011-07-06, 1 file, -3/+5)
  The idea is that the first entry point into the kernel uses
  KERNEL_PROC_LOCK and recursive calls use KERNEL_LOCK.  This assumption is
  violated in at least one place and has been causing confusion for lots of
  people.
  Initial bug report and analysis from Pedro.
  kettenis@ beck@ oga@ thib@ dlg@ ok
* The scheduling 'nice' value is per-process, not per-thread, so move it
  into struct process.  (guenther, 2011-03-07, 1 file, -2/+3)
  ok tedu@ deraadt@
* Add stricter asserts to DIAGNOSTIC kernels to help catch mutex and
  rwlock misuse.  (matthew, 2010-09-24, 1 file, -1/+2)
  In particular, this commit makes the following changes:
  1. i386 and amd64 now count the number of active mutexes so that
     assertwaitok(9) can detect attempts to sleep while holding a mutex.
  2. i386 and amd64 check that we actually hold mutexes when passed to
     mtx_leave().
  3. Calls to rw_exit*() now call rw_assert_{rd,wr}lock() as appropriate.
  ok krw@, oga@; "sounds good to me" deraadt@; assembly bits double checked
  by pirofti@
* This comment is unnecessarily confusing.  (art, 2010-06-30, 1 file, -2/+2)
* Use atomic operations to access the per-cpu scheduler flags.  (kettenis, 2010-01-03, 1 file, -7/+6)
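  In rough terms the change swaps locked read-modify-write sequences on the
  per-CPU flag word for atomic_setbits_int()/atomic_clearbits_int().  The
  flag word and bit below are stand-ins, not the real spc_schedflags
  definitions from <sys/sched.h>.

      #include <sys/atomic.h>         /* pulls in the MD atomic operations */

      volatile unsigned int   sched_flags;    /* stand-in per-CPU flag word */
      #define SF_SHOULDYIELD  0x0001          /* stand-in flag bit */

      /* Before: flags |= ... under a lock; after: lockless and atomic. */
      atomic_setbits_int(&sched_flags, SF_SHOULDYIELD);

      if (sched_flags & SF_SHOULDYIELD)
              atomic_clearbits_int(&sched_flags, SF_SHOULDYIELD);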
* Some tweaks to the cpu affinity code.  (art, 2009-04-14, 1 file, -1/+3)
  - Split up choosing of cpu between fork and "normal" cases.  Fork is very
    different and should be treated as such.
  - Instead of implicitly choosing a cpu in setrunqueue, do it outside
    where it actually makes sense.
  - Just because a cpu is marked as idle doesn't mean it will be soon.
    There could be a thundering herd effect if we call wakeup from an
    interrupt handler, so subtract cpus with queued processes when deciding
    which cpu is actually idle.
  - some simplifications allowed by the above.
  kettenis@ ok (except one bugfix that was not in the initial diff)
* Processor affinity for processes.  (art, 2009-03-23, 1 file, -4/+6)
  - Split up run queues so that every cpu has one.
  - Make setrunqueue choose the cpu where we want to make this process
    runnable (this should be refined and less brutal in the future).
  - When choosing the cpu where we want to run, make some kind of educated
    guess where it will be best to run (very naive right now).
  Other:
  - Set operations for sets of cpus.
  - load average calculations per cpu.
  - sched_is_idle() -> curcpu_is_idle()
  tested, debugged and prodded by many@
* Some paranoia and deconfusion.  (art, 2008-11-06, 1 file, -5/+3)
  - setrunnable should never be run on SIDL processes.  That's a bug and
    will cause all kinds of trouble.  Change the switch statement to panic
    if that happens.
  - p->p_stat == SRUN implies that p != curproc since curproc will always
    be SONPROC.  This is a leftover from before SONPROC.
  deraadt@ "commit"
* Convert timeout_add() calls using multiples of hz to timeout_add_sec().  (blambert, 2008-09-10, 1 file, -2/+2)
  Really just the low-hanging fruit of (hopefully) forthcoming timeout
  conversions.
  ok art@, krw@
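  The conversion is mechanical; a small sketch with made-up names follows.

      #include <sys/kernel.h>         /* for hz */
      #include <sys/timeout.h>

      void            perf_tick(void *);      /* made-up callback */
      struct timeout  perf_to;

      timeout_set(&perf_to, perf_tick, NULL);

      /* Before: the interval is expressed in ticks. */
      timeout_add(&perf_to, 10 * hz);

      /* After: the interval is expressed directly in seconds. */
      timeout_add_sec(&perf_to, 10);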
* Add a macro that clears the want_resched flag that need_resched sets.  (art, 2008-07-18, 1 file, -1/+3)
  Right now when mi_switch picks up the same proc, we didn't clear the flag
  which would mean that every time we service an AST we would attempt a
  context switch.  For some architectures, amd64 being probably the most
  extreme, that meant attempting to context switch for every trap and
  interrupt.
  Now we clear_resched explicitly after every context switch, even if it
  didn't do anything.  Which also allows us to remove some more code in
  cpu_switchto (not done yet).
  miod@ ok
* kill 2 bogus ARGSUSED and use the LIST_FOREACH() macro instead of
  handrolling...  (thib, 2008-05-22, 1 file, -4/+2)
  ok miod@
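  The LIST_FOREACH() part of the change has this general shape; the list and
  member names follow the era's allproc list but are illustrative here.

      #include <sys/queue.h>

      struct proc *p;

      /* Before: hand-rolled traversal. */
      for (p = LIST_FIRST(&allproc); p != NULL; p = LIST_NEXT(p, p_list))
              resetpriority(p);

      /* After: let the queue(3) macro do the walking. */
      LIST_FOREACH(p, &allproc, p_list)
              resetpriority(p);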
* Move the implementation of __mp_lock (biglock) into machine dependent
  code.  (art, 2007-11-26, 1 file, -2/+4)
  At this moment all architectures get the copy of the old code except
  i386 which gets a new shiny implementation that doesn't spin at splhigh
  (doh!) and doesn't try to grab the biglock when releasing the biglock
  (double doh!).
  Shaves 10% of system time during kernel compile and might solve a few
  bugs as a bonus.
  Other architectures coming shortly.
  miod@ deraadt@ ok
* sched_lock_idle and sched_unlock_idle are obsolete now.  (art, 2007-10-11, 1 file, -15/+1)
* Make context switching much more MI:  (art, 2007-10-10, 1 file, -39/+29)
  - Move the functionality of choosing a process from cpu_switch into a
    much simpler function: cpu_switchto.  Instead of having the locore code
    walk the run queues, let the MI code choose the process we want to run
    and only implement the context switching itself in MD code.
  - Let MD context switching run without worrying about spls or locks.
  - Instead of having the idle loop implemented with special contexts in MD
    code, implement one idle proc for each cpu.  make the idle loop MI with
    MD hooks.
  - Change the proc lists from the old style vax queues to TAILQs.
  - Change the sleep queue from vax queues to TAILQs.  This makes wakeup()
    go from O(n^2) to O(n)
  there will be some MD fallout, but it will be fixed shortly.
  There's also a few cleanups to be done after this.
  deraadt@, kettenis@ ok
* Widen the SCHED_LOCK in two cases to protect p_estcpu and p_priority.  (art, 2007-05-18, 1 file, -6/+4)
  kettenis@ ok
* The world of __HAVEs and __HAVE_NOTs is reducing.  All architectures
  have cpu_info now, so kill the option.  (art, 2007-05-16, 1 file, -78/+1)
  eyeballed by jsg@ and grange@
* Use atomic.h operations for manipulating p_siglist in struct proc.  (art, 2007-02-06, 1 file, -2/+2)
  Solves the problem with lost signals in MP kernels.
  miod@, kettenis@ ok
* Kernel stack can be swapped.  (miod, 2006-11-29, 1 file, -8/+3)
  This means that stuff that's on the stack should never be referenced
  outside the context of the process to which this stack belongs unless we
  do the PHOLD/PRELE dance.  Loads of code doesn't follow the rules here.
  Instead of trying to track down all offenders and fix this hairy
  situation, it makes much more sense to not swap kernel stacks.
  From art@, tested by many some time ago.
* typos; from bret lambert  (jmc, 2006-11-15, 1 file, -2/+2)
* tbert sent me a diff to change some 0 to NULL  (tedu, 2006-10-21, 1 file, -6/+6)
  i got carried away and deleted a whole bunch of useless casts
  this is C, not C++.
  ok md5
* bret lambert sent a patch removing register.  i made it ansi.  (tedu, 2006-10-09, 1 file, -21/+15)
* A second approach at fixing the telnet localhost & problem (but I tend
  to call it ssh localhost & now when telnetd is history).  (niklas, 2005-06-17, 1 file, -14/+7)
  This is a more localized patch, but leaves us with a recursive lock for
  protecting scheduling and signal state.  Better care is taken to actually
  be symmetric over mi_switch.  Also, the dolock cruft in psignal can go
  with this solution.  Better test runs by more people for a longer time
  have been carried out compared to the c2k5 patch.
  Long term the current mess with interruptible sleep, the default action
  on stop signals and wakeup interactions need to be revisited.
  ok deraadt@, art@
* sched work by niklas and art backed out; causes panics  (deraadt, 2005-05-29, 1 file, -39/+42)