path: root/kernel
Age | Commit message | Author | Files | Lines
2011-04-14 | sched: Move the second half of ttwu() to the remote cpu | Peter Zijlstra | 2 | -0/+62

Now that we've removed the rq->lock requirement from the first part of ttwu() and can compute placement without holding any rq->lock, ensure we execute the second half of ttwu() on the actual cpu we want the task to run on. This avoids having to take rq->lock and doing the task enqueue remotely, saving a lot of cacheline transfers.

As measured using http://oss.oracle.com/~mason/sembench.c:

  $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
  $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
  $ ./sembench -t 2048 -w 1900 -o 0

  unpatched: run time 30 seconds 647278 worker burns per second
  patched:   run time 30 seconds 816715 worker burns per second

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.515897185@chello.nl
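In code, the hand-off has roughly this shape, as a minimal sketch (helper names follow this patch series; the CONFIG_SMP/sched_feat() guards and the per-cpu wake-list/IPI plumbing behind ttwu_queue_remote() are elided):

    static void ttwu_queue(struct task_struct *p, int cpu)
    {
            struct rq *rq = cpu_rq(cpu);

            if (cpu != smp_processor_id()) {
                    /* remote: park p on the target cpu's wake list and IPI it;
                     * the target finishes the wakeup under its own rq->lock */
                    ttwu_queue_remote(p, cpu);
                    return;
            }

            /* local: taking our own rq->lock is cheap */
            raw_spin_lock(&rq->lock);
            ttwu_do_activate(rq, p, 0);
            raw_spin_unlock(&rq->lock);
    }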
2011-04-14 | sched: Restructure ttwu() some more | Peter Zijlstra | 1 | -33/+58

Factor out helper functions to make the inner workings of try_to_wake_up() more obvious; this also allows for adding remote queues.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.475848012@chello.nl
2011-04-14 | sched: Rename ttwu_post_activation() to ttwu_do_wakeup() | Peter Zijlstra | 1 | -3/+6

The ttwu_post_activation() code does the core wakeup: it sets TASK_RUNNING and performs wakeup-preemption, so give it a more descriptive name.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.434609705@chello.nl
2011-04-14 | sched: Remove rq argument from ttwu_stat() | Peter Zijlstra | 1 | -3/+6

In order to call ttwu_stat() without holding rq->lock we must remove its rq argument. Since we need to change rq stats, account to the local rq instead of the task rq; this is safe because we have IRQs disabled.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.394638826@chello.nl
2011-04-14 | sched: Remove rq->lock from the first half of ttwu() | Peter Zijlstra | 1 | -28/+37

Currently ttwu() does two rq->lock acquisitions: once on the task's old rq, holding it over the p->state fiddling and load-balance pass, after which it drops the old rq->lock to acquire the new rq->lock. Now that ttwu(), p->sched_class and p->cpus_allowed are serialized by p->pi_lock, we can drop the whole first rq->lock acquisition.

The p->pi_lock serializing concurrent ttwu() calls protects p->state, which we will set to TASK_WAKING to bridge possible p->pi_lock to rq->lock gaps and serialize set_task_cpu() calls against task_rq_lock().

The p->pi_lock serialization of p->sched_class allows us to call scheduling class methods without holding the rq->lock, and the serialization of p->cpus_allowed allows us to do the load-balancing bits without races.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.354401150@chello.nl
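Condensed, the first half now looks like this sketch (the on_rq fast path, stats and memory barriers are omitted; not the verbatim kernel code):

    raw_spin_lock_irqsave(&p->pi_lock, flags);
    if (!(p->state & state))
            goto out;                       /* not a state we should wake from */

    p->state = TASK_WAKING;                 /* bridges the pi_lock -> rq->lock gap */

    cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
    if (task_cpu(p) != cpu)
            set_task_cpu(p, cpu);           /* serialized by p->pi_lock */

    ttwu_queue(p, cpu);                     /* second half, runs under rq->lock */
    out:
    raw_spin_unlock_irqrestore(&p->pi_lock, flags);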
2011-04-14 | sched: Drop rq->lock from sched_exec() | Peter Zijlstra | 1 | -10/+5

Since we can now call select_task_rq() and set_task_cpu() with only p->pi_lock held, and sched_exec() load-balancing has always been optimistic, drop all rq->lock usage.

Oleg also noted that need_migrate_task() will always be true for current, so don't bother calling that at all.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.314204889@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | sched: Drop rq->lock from first part of wake_up_new_task() | Peter Zijlstra | 1 | -14/+3

Since p->pi_lock now protects all things needed to call select_task_rq(), avoid the double remote rq->lock acquisition and rely on p->pi_lock.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.273362517@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | sched: Add p->pi_lock to task_rq_lock() | Peter Zijlstra | 1 | -56/+47

In order to be able to call set_task_cpu() while either holding p->pi_lock or task_rq(p)->lock we need to hold both locks in order to stabilize task_rq(). This makes task_rq_lock() acquire both locks, and have __task_rq_lock() validate that p->pi_lock is held.

This increases the locking overhead for most scheduler syscalls but allows reduction of rq->lock contention for some scheduler hot paths (ttwu).

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.232781355@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
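The resulting helper, sketched (close to the actual code; sparse lock annotations dropped):

    static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
    {
            struct rq *rq;

            for (;;) {
                    raw_spin_lock_irqsave(&p->pi_lock, *flags);
                    rq = task_rq(p);
                    raw_spin_lock(&rq->lock);
                    if (likely(rq == task_rq(p)))
                            return rq;      /* both locks held, task_rq() now stable */

                    /* the task migrated under us: drop both locks and retry */
                    raw_spin_unlock(&rq->lock);
                    raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
            }
    }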
2011-04-14 | sched: Also serialize ttwu_local() with p->pi_lock | Peter Zijlstra | 1 | -12/+19

Since we now serialize ttwu() using p->pi_lock, we also need to serialize ttwu_local() using it; otherwise, once we drop the rq->lock from ttwu(), it can race with ttwu_local().

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.192366907@chello.nl
2011-04-14 | sched: Delay task_contributes_to_load() | Peter Zijlstra | 1 | -12/+4

In preparation for calling task_contributes_to_load() without holding rq->lock, we need to store the result until we do hold it and can update the rq accounting accordingly.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152729.151523907@chello.nl
2011-04-14 | sched: Deal with non-atomic min_vruntime reads on 32bits | Peter Zijlstra | 2 | -2/+20

In order to avoid reading partially updated min_vruntime values on 32-bit, implement a seqcount-like solution.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.111378493@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
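The scheme, sketched (the patch pairs min_vruntime with a min_vruntime_copy shadow field; smp_wmb()/smp_rmb() order the paired stores and loads):

    /* writer, under rq->lock */
    cfs_rq->min_vruntime = vruntime;
    #ifndef CONFIG_64BIT
    smp_wmb();
    cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
    #endif

    /* lockless reader: retry until both halves agree */
    #ifndef CONFIG_64BIT
    do {
            min_vruntime_copy = cfs_rq->min_vruntime_copy;
            smp_rmb();
            min_vruntime = cfs_rq->min_vruntime;
    } while (min_vruntime != min_vruntime_copy);
    #else
    min_vruntime = cfs_rq->min_vruntime;
    #endif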
2011-04-14 | sched: Remove rq argument to sched_class::task_waking() | Peter Zijlstra | 2 | -2/+4

In preparation of calling this without rq->lock held, remove the dependency on the rq argument.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.071474242@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | sched: Drop the rq argument to sched_class::select_task_rq() | Peter Zijlstra | 5 | -25/+40

In preparation of calling select_task_rq() without rq->lock held, drop the dependency on the rq argument.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152729.031077745@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | sched: Serialize p->cpus_allowed and ttwu() using p->pi_lock | Peter Zijlstra | 1 | -21/+16

Currently p->pi_lock already serializes p->sched_class; also put p->cpus_allowed and try_to_wake_up() under it. With p->sched_class and p->cpus_allowed both serialized by p->pi_lock, this prepares the way to call select_task_rq() and to do the first part of ttwu() without holding rq->lock.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152728.990364093@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | sched: Provide p->on_rq | Peter Zijlstra | 4 | -28/+30

Provide a generic p->on_rq because the p->se.on_rq semantics are unfavourable for lockless wakeups but needed for sched_fair.

In particular, p->on_rq is only cleared when we actually dequeue the task in schedule() and not on any random dequeue as done by things like __migrate_task() and __sched_setscheduler().

This also allows us to remove p->se usage from !sched_fair code.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.949545047@chello.nl
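Sketch of the intended life cycle (simplified; placement per my reading of the patch):

    /* set when a wakeup (or fork) really puts the task on a runqueue */
    static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
    {
            activate_task(rq, p, en_flags);
            p->on_rq = 1;
    }

    /* cleared only in schedule(), when the task truly goes to sleep;
     * transient dequeues (__migrate_task() etc.) leave it set */
    deactivate_task(rq, prev, DEQUEUE_SLEEP);
    prev->on_rq = 0;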
2011-04-14 | sched: Clean up ttwu() stats | Peter Zijlstra | 1 | -35/+40

Collect all ttwu() stat code into a single function and ensure it's always called for an actual wakeup (changing p->state to TASK_RUNNING).

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.908177058@chello.nl
2011-04-14 | sched: Change the ttwu() success details | Peter Zijlstra | 1 | -9/+7

try_to_wake_up() would only return success when it actually had to place a task on a rq; change that to return success every time we change p->state to TASK_RUNNING, because that's the real measure of a wakeup. As a result, success is always true for the tracepoints.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.866866929@chello.nl
2011-04-14 | sched: Move wq_worker_waking to the correct site | Peter Zijlstra | 1 | -3/+4

wq_worker_waking_up() needs to match wq_worker_sleeping(); since the latter is only called on deactivate, move the former near activate.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/n/top-t3m7n70n9frmv4pv2n5fwmov@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | mutex: Use p->on_cpu for the adaptive spin | Peter Zijlstra | 5 | -54/+37

Since we now have p->on_cpu unconditionally available, use it to re-implement mutex_spin_on_owner().

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl
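The reworked spin loop, sketched (close in spirit to the patch; the RCU read section keeps the owner task_struct from being freed under us):

    static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
    {
            if (lock->owner != owner)
                    return false;           /* owner changed or lock released */

            barrier();                      /* load owner->on_cpu only after the check */
            return owner->on_cpu;
    }

    static int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
    {
            rcu_read_lock();
            while (owner_running(lock, owner)) {
                    if (need_resched())
                            break;
                    cpu_relax();
            }
            rcu_read_unlock();

            /* keep spinning only if the lock now looks free */
            return lock->owner == NULL;
    }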
2011-04-14 | sched: Always provide p->on_cpu | Peter Zijlstra | 1 | -17/+29

Always provide p->on_cpu so that we can determine whether a task is on a cpu without having to lock the rq.

Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110405152728.785452014@chello.nl
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-14 | Merge branch 'linus' into sched/locking | Ingo Molnar | 2 | -11/+15

Merge reason: pick up this upstream commit:

  6631e635c65d: block: don't flush plugged IO on forced preemption scheduling

as it modifies the scheduler and we'll queue up dependent patches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-13 | block: don't flush plugged IO on forced preemption scheduling | Linus Torvalds | 1 | -10/+10

We really only want to unplug the pending IO when the process actually goes to sleep. So move the test for flushing the plug up to the place where we actually deactivate the task - where we have properly checked for preemption and for the process really sleeping.

Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
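In schedule(), sketched (the PF_WQ_WORKER handling is omitted, and the exact block-layer flush helper used here is an assumption on my part):

    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
            if (unlikely(signal_pending_state(prev->state, prev))) {
                    prev->state = TASK_RUNNING;     /* not sleeping after all */
            } else {
                    /* only here do we know prev really goes to sleep, so
                     * flushing plugged IO now cannot deadlock */
                    if (blk_needs_flush_plug(prev)) {
                            raw_spin_unlock(&rq->lock);
                            blk_flush_plug(prev);
                            raw_spin_lock(&rq->lock);
                    }
                    deactivate_task(rq, prev, DEQUEUE_SLEEP);
            }
    }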
2011-04-11 | fix XEN_SAVE_RESTORE Kconfig dependencies | Shriram Rajagopalan | 1 | -1/+1

Make XEN_SAVE_RESTORE select HIBERNATE_CALLBACKS. Remove the XEN_SAVE_RESTORE dependency from PM_SLEEP.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-04-11 | PM / Hibernate: Introduce CONFIG_HIBERNATE_CALLBACKS | Rafael J. Wysocki | 1 | -1/+5

Xen save/restore is going to use hibernate device callbacks for quiescing devices and putting them back to normal operations, and it would need to select CONFIG_HIBERNATION for this purpose. However, that also would cause the hibernate interfaces for user space to be enabled, which might confuse user space, because the Xen kernels don't support hibernation. Moreover, it would be wasteful, as it would make the Xen kernels include a substantial amount of code that they would never use.

To address this issue introduce a new power management Kconfig option, CONFIG_HIBERNATE_CALLBACKS, such that it will only select the code that is necessary for the hibernate device callbacks to work, and make CONFIG_HIBERNATION select it. Then, Xen save/restore will be able to select CONFIG_HIBERNATE_CALLBACKS without dragging the entire hibernate code along with it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
2011-04-11 | sched: Eliminate dead code from wakeup_gran() | Shaohua Li | 1 | -4/+1

calc_delta_fair() already checks NICE_0_LOAD; delete the duplicate check.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/1302238389.3981.92.camel@sli10-conroe
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-11 | sched: Fix erroneous all_pinned logic | Ken Chen | 1 | -7/+4

The scheduler load balancer has specific code to deal with cases of unbalanced systems due to lots of unmovable tasks (for example because of hard CPU affinity). In those situations, it excludes the busiest CPU that has pinned tasks from load-balance consideration such that it can perform a second load-balance pass on the rest of the system.

This all works as designed if there is only one cgroup in the system. However, when we have multiple cgroups, this logic has false positives and triggers multiple load-balance passes even though there are actually no pinned tasks at all. The reason for the false positives is that the all-pinned logic sits in the lowest-level function, can_migrate_task(), and is too low level: load_balance_fair() iterates over each task group and calls balance_tasks() to migrate target load. Along the way, balance_tasks() will also set an all_pinned variable. Given that task groups are iterated, this all_pinned variable essentially reflects the status of the last group in the scanning process. A task group can have a number of reasons that no load is migrated, none of them due to cpu affinity. However, this status bit is propagated back up to the higher-level load_balance(), which incorrectly thinks that no tasks were moved. It kicks off the all-pinned logic and starts multiple passes attempting to move load onto the puller CPU.

To fix this, move the all_pinned aggregation up to the iterator level. This ensures that the status is aggregated over all task groups, not just the last one in the list.

Signed-off-by: Ken Chen <kenchen@google.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/BANLkTi=ernzNawaR5tJZEsV_QVnfxqXmsQ@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-11 | sched: Fix sched-domain avg_load calculation | Ken Chen | 1 | -1/+2

In the function find_busiest_group(), the sched-domain avg_load isn't calculated at all if there is a group imbalance within the domain. This will cause an erroneous imbalance calculation.

The reason is that calculate_imbalance() sees sds->avg_load = 0 and dumps the entire sds->max_load into the imbalance variable, which is used later on to migrate the entire load from the busiest CPU to the puller CPU. This has two really bad effects:

1. A stampede of task migration, with no way to break out of the bad state because of a positive feedback loop: large load delta -> heavier load migration -> larger imbalance, and the cycle goes on.

2. Severe imbalance in CPU queue depth. This causes really long scheduling latency blips which badly affect applications that have tight latency requirements.

The fix is to have the kernel calculate the domain avg_load in both cases. This ensures that the imbalance calculation is always sensible and the target is usually half way between the busiest and the puller CPU.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20110408002322.3A0D812217F@elm.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
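The essence of the fix in find_busiest_group(), sketched (field names per this era's balancer; exact placement may differ):

    /* always compute the domain average, even on the group-imbalance path */
    sds.avg_load = (SCHED_LOAD_SCALE * sds.total_load) / sds.total_pwr;

    /* a group imbalance still forces a balance pass, but
     * calculate_imbalance() now sees a sane avg_load instead of 0 */
    if (sds.group_imb)
            goto force_balance;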
2011-04-08 | signal.c: fix erroneous syscall kernel-doc | Randy Dunlap | 1 | -2/+2

Fix erroneous syscall kernel-doc comments in kernel/signal.c.

Reported-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-07 | Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 3 | -4/+16

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, fpu: Fix FPU exception handling on non-SSE systems
  x86, hibernate: Initialize mmu_cr4_features during boot
  x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
  x86: visws: Fixup irq overhaul fallout

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Clean up rebalance_domains() load-balance interval calculation

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
  rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix cpumask leak in __setup_irq()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Fix listing incorrect line number with inline function
  perf probe: Fix to find recursively inlined function
  perf probe: Fix multiple --vars options behavior
  perf probe: Fix to remove redundant close
  perf probe: Fix to ensure function declared file
2011-04-07 | Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 | Linus Torvalds | 40 | -55/+55

* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-05 | sched: Clean up rebalance_domains() load-balance interval calculation | Peter Zijlstra | 2 | -4/+15

Instead of the possible multiple evaluation of num_online_cpus() in rebalance_domains() that Linus reported, avoid it altogether in the normal case, since it's implemented with a Hamming weight function over a cpu bitmask which can be darn expensive for those with big iron.

This also makes the code cleaner, smaller and better documented.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1301991265.2225.12.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-04 | kernel/signal.c: add kernel-doc notation to syscalls | Randy Dunlap | 1 | -2/+63

Add kernel-doc to syscalls in signal.c.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 | kernel/signal.c: fix typos and coding style | Randy Dunlap | 1 | -42/+48

General coding style and comment fixes; no code changes:

- Use multi-line-comment coding style.
- Put some function signatures completely on one line.
- Hyphenate some words.
- Spell Posix as POSIX.
- Correct typos & spellos in some comments.
- Drop trailing whitespace.
- End sentences with periods.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 | Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 2 | -2/+14

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix rebalance interval calculation
  sched, doc: Beef up load balancing description
  sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks
2011-04-04 | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -2/+7

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix task_struct reference leak
  perf: Fix task context scheduling
  perf: mmap 512 kiB by default
  perf: Rebase max unprivileged mlock threshold on top of page size
  perf tools: Fix NO_NEWT=1 python build error
  perf symbols: Properly align symbol_conf.priv_size
  perf tools: Emit clearer message for sys_perf_event_open ENOENT return
  perf tools: Fixup exit path when not able to open events
  perf symbols: Fix vsyscall symbol lookup
  oprofile, x86: Allow setting EDGE/INV/CMASK for counter events
2011-04-04 | ntp: fix non privileged system time shifting | Richard Cochran | 1 | -0/+2

The ADJ_SETOFFSET bit added in commit 094aa188 ("ntp: Add ADJ_SETOFFSET mode bit") also introduced a way for any user to change the system time. Sneaky or buggy calls to adjtimex() could set ADJ_OFFSET_SS_READ | ADJ_SETOFFSET, which would result in a successful call to timekeeping_inject_offset(). This patch fixes the issue by adding the capability check.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
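The fix boils down to one check in do_adjtimex(), sketched (comment mine):

    if (txc->modes & ADJ_SETOFFSET) {
            /* injecting a time offset is a privileged operation */
            if (!capable(CAP_SYS_TIME))
                    return -EPERM;
    }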
2011-04-02 | genirq: Fix cpumask leak in __setup_irq() | Xiaotian Feng | 1 | -0/+1

The allocated cpumask should be freed in __setup_irq().

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1301744375-6812-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
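The shape of the fix, sketched: the mask allocated in __setup_irq() must be released on every exit path, so the error gotos are routed through a freeing label (label name per my reading of the patch):

    if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
            ret = -ENOMEM;
            goto out_thread;
    }

    /* affinity setup; failures now jump to out_mask, not past it */

    out_mask:
            free_cpumask_var(mask);         /* previously leaked on these paths */
    out_thread:
            /* existing thread teardown */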
2011-04-01 | kdump: Allow shrinking of kdump region to be overridden | Anton Blanchard | 1 | -2/+3

On ppc64 the crashkernel region almost always overlaps an area of firmware. This works fine except when using the sysfs interface to reduce the kdump region. If we free the firmware area we are guaranteed to crash.

Rename free_reserved_phys_range to crash_free_reserved_phys_range and make it a weak function so we can override it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
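The override mechanism, sketched (bodies elided; a strong architecture definition simply wins over the __weak generic one at link time):

    /* kernel/kexec.c: generic default */
    void __weak crash_free_reserved_phys_range(unsigned long begin,
                                               unsigned long end)
    {
            /* generic loop returning the reserved pages to the allocator */
    }

    /* arch/powerpc: strong definition overrides the weak one */
    void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
    {
            /* skip ranges overlapping firmware, then free the rest */
    }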
2011-03-31 | Fix common misspellings | Lucas De Marchi | 40 | -55/+55

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 | perf: Fix task_struct reference leak | Peter Zijlstra | 1 | -0/+5

sys_perf_event_open() had an imbalance in the number of task refs it took, causing memory leakage.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org # .37+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 | perf: Rebase max unprivileged mlock threshold on top of page size | Frederic Weisbecker | 1 | -2/+2

Ensure we allow 512 kiB + 1 page for user control without assuming a 4096-byte page size.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
LKML-Reference: <1301535209-9679-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 | sched: Fix rebalance interval calculation | Sisir Koppaka | 1 | -2/+3

The interval for checking scheduling domains if they are due to be balanced currently depends on boot state NR_CPUS, which may not accurately reflect the number of online CPUs at the time of check. Thus replace NR_CPUS with num_online_cpus().

(ed: Should only affect those who set NR_CPUS really high, such as 4096 or so :-)

Signed-off-by: Sisir Koppaka <sisir.koppaka@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <AANLkTikqHWid2Q93F5U5Qw5snJH8C5PXoa7J6=6hYO94@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
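The affected clamp in rebalance_domains(), sketched (the pre-existing logic with NR_CPUS swapped for num_online_cpus()):

    interval = sd->balance_interval;
    if (idle != CPU_IDLE)
            interval *= sd->busy_factor;

    /* scale ms to jiffies and clamp to a sane range */
    interval = msecs_to_jiffies(interval);
    if (interval > HZ * num_online_cpus() / 10)     /* was NR_CPUS */
            interval = HZ * num_online_cpus() / 10;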
2011-03-31 | sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks | Dario Faggioli | 1 | -0/+11

sched_setscheduler() (in sched.c) is called in order to change the scheduling policy and/or the real-time priority of a task. Thus, if we find out that neither of those is actually being modified, it is possible to return earlier and save the overhead of a full deactivate+activate cycle of the task in question.

Besides that, if we have more than one SCHED_FIFO task with the same priority on the same rq (which means they share the same priority queue), having one of them change its position in the priority queue because of a sched_setscheduler() (as happens by means of the deactivate+activate) that does not actually change the priority violates POSIX, which states, for SCHED_FIFO:

  "If a thread whose policy or priority has been modified by pthread_setschedprio() is a running thread or is runnable, the effect on its position in the thread list depends on the direction of the modification, as follows:
   a. <...>
   b. If the priority is unchanged, the thread does not change position in the thread list.
   c. <...>"

  http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html

(ed: And the POSIX specification here does, briefly and somewhat unexpectedly, match what common sense tells us as well.)

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1300971618.3960.82.camel@Palantir>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
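The early-out, sketched (lock-release sequence per my reading of this era's __sched_setscheduler()):

    /*
     * Nothing to change: skip the dequeue/enqueue cycle, which would
     * move a SCHED_FIFO task to the tail of its priority list and
     * violate POSIX rule (b) quoted above.
     */
    if (unlikely(policy == p->policy && (!rt_policy(policy) ||
                 param->sched_priority == p->rt_priority))) {
            __task_rq_unlock(rq);
            raw_spin_unlock_irqrestore(&p->pi_lock, flags);
            return 0;
    }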
2011-03-30 | genirq: Remove the now obsolete config options and select statements | Thomas Gleixner | 1 | -3/+0

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 | genirq: Fix misnamed label in handle_edge_eoi_irq | Thomas Gleixner | 1 | -1/+1

Reported-by: michael@ellerman.id.au
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
2011-03-29 | genirq: Remove move_*irq leftovers | Thomas Gleixner | 1 | -10/+0

All users converted to the new interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 | genirq: Remove compat code | Thomas Gleixner | 12 | -276/+24

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 | alpha: Use generic show_interrupts() | Thomas Gleixner | 1 | -2/+6

The only subtle difference is that alpha uses ACTUAL_NR_IRQS and prints the IRQF_DISABLED flag. Change the generic implementation to deal with ACTUAL_NR_IRQS if defined. The IRQF_DISABLED printing is pointless, as we nowadays run all interrupts with irqs disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 | genirq: Fix harmless typo | Thomas Gleixner | 1 | -1/+1

The late-night fixup failed to convert the data type from irq_desc to irq_data, which results in a harmless but annoying warning.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-28 | Merge branches 'irq-cleanup-for-linus' and 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 9 | -95/+205

* 'irq-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  vlynq: Convert irq functions

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix cleanup fallout
  genirq: Fix typo and remove unused variable
  genirq: Fix new kernel-doc warnings
  genirq: Add setter for AFFINITY_SET in irq_data state
  genirq: Provide setter inline for IRQD_IRQ_INPROGRESS
  genirq: Remove handle_IRQ_event
  arm: Ns9xxx: Remove private irq flow handler
  powerpc: cell: Use the core flow handler
  genirq: Provide edge_eoi flow handler
  genirq: Move INPROGRESS, MASKED and DISABLED state flags to irq_data
  genirq: Split irq_set_affinity() so it can be called with lock held.
  genirq: Add chip flag for restricting cpu_on/offline calls
  genirq: Add chip hooks for taking CPUs on/off line.
  genirq: Add irq disabled flag to irq_data state
  genirq: Reserve the irq when calling irq_set_chip()