aboutsummaryrefslogtreecommitdiffstats
path: root/kernel (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2008-01-28kernel: add CLONE_IO to specifically request sharing of IO contextsJens Axboe1-4/+10
syslets (or other threads/processes that want io context sharing) can set this to enforce sharing of io context. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28io context sharing: preliminary supportJens Axboe1-1/+0
Detach task state from ioc, instead keep track of how many processes are accessing the ioc. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28ioprio: move io priority from task_struct to io_contextJens Axboe1-5/+27
This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-27printk: revert ktime_get() timestampsIngo Molnar1-3/+1
revert 19ef9309273d26cb005cb23e6a370353dca91099. Kevin Winchester reported a lockup during X startup an bisected it to this commit. Reported-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-26[S390] Remove appldata include from sysctl_check.cHeiko Carstens1-1/+0
Forgot to remove this when removing the appldata binary sysctls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-01-25sched: keep total / count stats in addition to the max forArjan van de Ven2-0/+7
Right now, the linux kernel (with scheduler statistics enabled) keeps track of the maximum time a process is waiting to be scheduled. While the maximum is a very useful metric, tracking average and total is equally useful (at least for latencytop) to figure out the accumulated effect of scheduler delays. The accumulated effect is important to judge the performance impact of scheduler tuning/behavior. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: fix: don't take a mutex from interrupt contextPeter Zijlstra1-2/+2
print_cfs_stats is callable from interrupt context (sysrq), hence it should not take mutexes. Change it to use RCU since the task group data is RCU freed anyway. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: print backtrace of running tasks tooNick Piggin1-2/+1
The attached patch is something really simple that can sometimes help in getting more info out of a hung system. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25printk: use ktime_get()Ingo Molnar1-1/+3
printk timestamps: use ktime_get(). Some platforms have a functioning clocksource function only after they are done with early bootup, so delay this until out of SYSTEM_BOOTING state. it's also inherently safe now, as any bugs in this area will be caught by the printk recursion checks. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25softlockup: fix signednessIngo Molnar2-13/+13
fix softlockup tunables signedness. mark tunables read-mostly. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: latencytop supportArjan van de Ven5-1/+258
LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks it system wide and per process. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: fix goto retry in pick_next_task_rt()Dmitry Adamushko1-7/+2
looking at it one more time: (1) it looks to me that there is no need to call sched_rt_ratio_exceeded() from pick_next_rt_entity() - [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are not within this 'tree-like hierarchy' (or whatever we should call it :-) - there is also no need to re-check 'rt_rq->rt_time > ratio' at this point as 'rt_rq->rt_time' couldn't have been increased since the last call to update_curr_rt() (which obviously calls sched_rt_ratio_esceeded()) well, it might be that 'ratio' for this rt_rq has been re-configured (and the period over which this rt_rq was active has not yet been finished)... but I don't think we should really take this into account. (2) now pick_next_rt_entity() must never return NULL, so let's change pick_next_task_rt() accordingly. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: monitor clock underflows in /proc/sched_debugGuillaume Chazarain2-2/+5
We monitor clock overflows, let's also monitor clock underflows. Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: fix rq->clock warps on frequency changesGuillaume Chazarain1-1/+1
sched: fix rq->clock warps on frequency changes Fix 2bacec8c318ca0418c0ee9ac662ee44207765dd4 (sched: touch softlockup watchdog after idling) that reintroduced warps on frequency changes. touch_softlockup_watchdog() calls __update_rq_clock that checks rq->clock for warps, so call it after adjusting rq->clock. Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: fix, always create kernel threads with normal priorityMichal Schmidt1-1/+11
Ensure that the kernel threads are created with the usual nice level and affinity even if kthreadd's properties were changed from the default by root. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25debug: clean up kernel/profile.cPaolo Ciarrocchi1-50/+49
Before: total: 25 errors, 13 warnings, 602 lines checked After: total: 0 errors, 2 warnings, 601 lines checked No code changed: kernel/profile.o: text data bss dec hex filename 3048 236 24 3308 cec profile.o.before 3048 236 24 3308 cec profile.o.after md5: 2501d64748a4d350dffb11203e2a5182 profile.o.before.asm 2501d64748a4d350dffb11203e2a5182 profile.o.after.asm Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: remove the !PREEMPT_BKL codeIngo Molnar2-20/+3
remove the !PREEMPT_BKL code. this removes 160 lines of legacy code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: make PREEMPT_BKL the defaultIngo Molnar1-8/+1
make PREEMPT_BKL the default. precursor to removal of the !PREEMPT_BKL code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25debug: track and print last unloaded module in the oops traceArjan van de Ven1-0/+6
Based on a suggestion from Andi: In various cases, the unload of a module may leave some bad state around that causes a kernel crash AFTER a module is unloaded; and it's then hard to find which module caused that. This patch tracks the last unloaded module, and prints this as part of the module list in the oops trace. Right now, only the last 1 module is tracked; I expect that this is enough for the vast majority of cases where this information matters; if it turns out that tracking more is important, we can always extend it to that. [ mingo@elte.hu: build fix ] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25debug: show being-loaded/being-unloaded indicator for modulesArjan van de Ven1-6/+15
It's rather common that an oops/WARN_ON/BUG happens during the load or unload of a module. Unfortunatly, it's not always easy to see directly which module is being loaded/unloaded from the oops itself. Worse, it's not even always possible to ask the bug reporter, since there are so many components (udev etc) that auto-load modules that there's a good chance that even the reporter doesn't know which module this is. This patch extends the existing "show if it's tainting" print code, which is used as part of printing the modules in the oops/BUG/WARN_ON to include a "+" for "being loaded" and a "-" for "being unloaded". As a result this extension, the "taint_flags()" function gets renamed to "module_flags()" (and takes a module struct as argument, not a taint flags int). Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: rt-watchdog: fix .rlim_max = RLIM_INFINITYPeter Zijlstra2-8/+3
Remove the curious logic to set it_sched_expires in the future. It useless because rt.timeout wouldn't be incremented anyway. Explicity check for RLIM_INFINITY as a test programm that had a 1s soft limit and a inf hard limit would SIGKILL at 1s. This is because RLIM_INFINITY+d-1 is d-2. Signed-off-by: Peter Zijlsta <a.p.zijlstra@chello.nl> CC: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: rt-group: reduce reschedulingPeter Zijlstra1-1/+4
Only reschedule if the new group has a higher prio task. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25hrtimer: unlock hrtimer_wakeupPeter Zijlstra1-1/+3
hrtimer_wakeup creates a base->lock rq->lock lock dependancy. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ which doesn't hold base->lock. This fully untangles hrtimer locks from the scheduler locks, and allows hrtimer usage in the scheduler proper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallbackPeter Zijlstra2-132/+141
Currently all highres=off timers are run from softirq context, but HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context. Fix this up by splitting it similar to the highres=on case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25hrtimer: clean up cpu->base locking tricksPeter Zijlstra2-9/+19
In order to more easily allow for the scheduler to use timers, clean up the locking a bit. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: rt throttling vs no_hzPeter Zijlstra3-15/+43
We need to teach no_hz about the rt throttling because its tick driven. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: pull_rt_task() cleanupMike Galbraith1-6/+4
"goto out" is an odd way to spell "skip". Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: rt group schedulingPeter Zijlstra3-203/+537
Extend group scheduling to also cover the realtime classes. It uses the time limiting introduced by the previous patch to allow multiple realtime groups. The hard time limit is required to keep behaviour deterministic. The algorithms used make the realtime scheduler O(tg), linear scaling wrt the number of task groups. This is the worst case behaviour I can't seem to get out of, the avg. case of the algorithms can be improved, I focused on correctness and worst case. [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: rt time limitPeter Zijlstra3-21/+120
Very simple time limit on the realtime scheduling classes. Allow the rq's realtime class to consume sched_rt_ratio of every sched_rt_period slice. If the class exceeds this quota the fair class will preempt the realtime class. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: high-res preemption tickPeter Zijlstra5-17/+268
Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice level are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, its just that the distribution within the sched_latency period is important. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: do not do cond_resched() when CONFIG_PREEMPTHerbert Xu1-2/+4
Why do we even have cond_resched when real preemption is on? It seems to be a waste of space and time. remove cond_resched with CONFIG_PREEMPT on. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: documentation, whitespace fixesIngo Molnar1-4/+4
whitespace fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: SCHED_FIFO/SCHED_RR watchdog timerPeter Zijlstra2-0/+59
Introduce a new rlimit that allows the user to set a runtime timeout on real-time tasks their slice. Once this limit is exceeded the task will receive SIGXCPU. So it measures runtime since the last sleep. Input and ideas by Thomas Gleixner and Lennart Poettering. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Lennart Poettering <mzxreary@0pointer.de> CC: Michael Kerrisk <mtk.manpages@googlemail.com> CC: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: sched_rt_entityPeter Zijlstra2-11/+11
Move the task_struct members specific to rt scheduling together. A future optimization could be to put sched_entity and sched_rt_entity into a union. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25uids: merge multiple error paths in alloc_uid() into onePavel Emelyanov1-27/+20
There are already 4 error paths in alloc_uid() that do incremental rollbacks. I think it's time to merge them. This costs us 8 lines of code :) Maybe it would be better to merge this patch with the previous one, but I remember that some time ago I sent a similar patch (fixing the error path and cleaning it), but I was told to make two patches in such cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: dynamically update the root-domain span/online mapsGregory Haskins1-12/+19
The baseline code statically builds the span maps when the domain is formed. Previous attempts at dynamically updating the maps caused a suspend-to-ram regression, which should now be fixed. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25Preempt-RCU: CPU Hotplug handlingPaul E. McKenney1-5/+142
This patch allows preemptible RCU to tolerate CPU-hotplug operations. It accomplishes this by maintaining a local copy of a map of online CPUs, which it accesses under its own lock. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25Preempt-RCU: implementationPaul E. McKenney6-2/+1166
This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25Preempt-RCU: fix rcu_barrier for preemptive environment.Paul E. McKenney2-1/+11
Fix rcu_barrier() to work properly in preemptive kernel environment. Also, the ordering of callback must be preserved while moving callbacks to another CPU during CPU hotplug. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25Preempt-RCU: reorganize RCU code into rcuclassic.c and rcupdate.cPaul E. McKenney3-551/+602
This patch re-organizes the RCU code to enable multiple implementations of RCU. Users of RCU continues to include rcupdate.h and the RCU interfaces remain the same. This is in preparation for subsequently merging the preemptible RCU implementation. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25Preempt-RCU: Use softirq instead of tasklets forDipankar Sarma1-8/+17
This patch makes RCU use softirq instead of tasklets. It also adds a memory barrier after raising the softirq inorder to ensure that the cpu sees the most recently updated value of rcu->cur while processing callbacks. The discussion of the related theoretical race pointed out by James Huang can be found here --> http://lkml.org/lkml/2007/11/20/603 Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: remove some old cpuset logicGregory Haskins1-33/+0
We had support for overlapping cpuset based rto logic in early prototypes that is no longer used, so remove it. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: RT-balance, only adjust overload state when changingGregory Haskins1-3/+5
The overload set/clears were originally idempotent when this logic was first implemented. But that is no longer true due to the addition of the atomic counter and this logic was never updated to work properly with that change. So only adjust the overload state if it is actually changing to avoid getting out of sync. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: RT-balance, add new methods to sched_classSteven Rostedt4-22/+179
Dmitry Adamushko found that the current implementation of the RT balancing code left out changes to the sched_setscheduler and rt_mutex_setprio. This patch addresses this issue by adding methods to the schedule classes to handle being switched out of (switched_from) and being switched into (switched_to) a sched_class. Also a method for changing of priorities is also added (prio_changed). This patch also removes some duplicate logic between rt_mutex_setprio and sched_setscheduler. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: RT-balance, replace hooks with pre/post schedule and wakeup methodsSteven Rostedt2-14/+23
To make the main sched.c code more agnostic to the schedule classes. Instead of having specific hooks in the schedule code for the RT class balancing. They are replaced with a pre_schedule, post_schedule and task_wake_up methods. These methods may be used by any of the classes but currently, only the sched_rt class implements them. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: remove do_div() from __sched_slice()Peter Zijlstra1-1/+1
Yanmin Zhang noticed a nice optimization: p = l * nr / nl, nl = l/g -> p = g * nr which eliminates a do_div() from __sched_period(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: get rid of 'new_cpu' in try_to_wake_up()Dmitry Adamushko1-8/+3
Clean-up try_to_wake_up(). Get rid of the 'new_cpu' variable in try_to_wake_up() [ that's, one #ifdef section less ]. Also remove a few redundant blank lines. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: no need for 'affine wakeup' balancingDmitry Adamushko1-0/+3
No need to do a check for 'affine wakeup and passive balancing possibilities' in select_task_rq_fair() when task_cpu(p) == this_cpu. I guess, this part got missed upon introduction of per-sched_class select_task_rq() in try_to_wake_up(). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: add credits for RT balancing improvementsIngo Molnar1-0/+2
add credits for RT balancing improvements. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: style cleanup, #2Ingo Molnar1-15/+17
style cleanup of various changes that were done recently. no code changed: text data bss dec hex filename 26399 2578 48 29025 7161 sched.o.before 26399 2578 48 29025 7161 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>