aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/sched_fair.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2009-09-16sched: Add SD_PREFER_LOCALPeter Zijlstra1-2/+5
And turn it on for NUMA and MC domains. This improves locality in balancing decisions by keeping up to capacity amount of tasks local before looking for idle CPUs. (and twice the capacity if SD_POWERSAVINGS_BALANCE is set.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Add a few SYNC hint knobs to play withPeter Zijlstra1-3/+11
Currently we use overlap to weaken the SYNC hint, but allow it to set the hint as well. echo NO_SYNC_WAKEUP > /debug/sched_features echo SYNC_MORE > /debug/sched_features preserves pipe-test behaviour without using the WF_SYNC hint. Worth playing with on more workloads... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Add WF_FORKPeter Zijlstra1-1/+1
Avoid the cache buddies from biasing the time distribution away from fork()ers. Normally the next buddy will be the preferred scheduling target, but this makes fork()s prefer to run the new child, whereas we prefer to run the parent, since that will generate more work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Rename sync argumentsPeter Zijlstra1-2/+4
In order to extend the functions to have more than 1 flag (sync), rename the argument to flags, and explicitly define a WF_ space for individual flags. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Rename select_task_rq() argumentPeter Zijlstra1-7/+7
In order to be able to rename the sync argument, we need to rename the current flag argument. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Tweak wake_idxPeter Zijlstra1-3/+18
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Fix task affinity for select_task_rq_fairPeter Zijlstra1-3/+2
While merging select_task_rq_fair() and sched_balance_self() I made a mistake that leads to testing the wrong task affinty. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: for_each_domain() vs RCUPeter Zijlstra1-2/+7
for_each_domain() uses RCU to serialize the sched_domains, except it doesn't actually use rcu_read_lock() and instead relies on disabling preemption -> FAIL. XXX: audit other sched_domain code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Weaken SD_POWERSAVINGS_BALANCEPeter Zijlstra1-3/+18
One of the problems of power-saving balancing is that under certain scenarios it is too slow and allows tons of real work to pile up. Avoid this by ignoring the powersave stuff when there's real work to be done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Merge select_task_rq_fair() and sched_balance_self()Peter Zijlstra1-170/+63
The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Hook sched_balance_self() into sched_class::select_task_rq()Peter Zijlstra1-1/+6
Rather ugly patch to fully place the sched_balance_self() code inside the fair class. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Move sched_balance_self() into sched_fair.cPeter Zijlstra1-0/+145
Move the sched_balance_self() code into sched_fair.c This facilitates the merger of sched_balance_self() and sched_fair::select_task_rq(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Complete buddy switchesMike Galbraith1-1/+2
Add a NEXT_BUDDY feature flag to aid in debugging. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Split WAKEUP_OVERLAPPeter Zijlstra1-3/+4
It consists of two conditions, split them out in separate toggles so we can test them independently. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-13perf_counter, sched: Add sched_stat_runtime tracepointIngo Molnar1-0/+1
This allows more precise tracking of how the scheduler accounts (and acts upon) a task having spent N nanoseconds of CPU time. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-10sched: Fix sched::sched_stat_wait tracepoint fieldIngo Molnar1-2/+1
This weird perf trace output: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] Is caused by setting one component field of the delta to zero a bit too early. Move it to later. ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug, it's just a reporting bug in essence. ) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nikos Chantziaras <realnc@arcor.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09sched: Re-tune the scheduler latency defaults to decrease worst-case latenciesMike Galbraith1-6/+6
Reduce the latency target from 20 msecs to 5 msecs. Why? Larger latencies increase spread, which is good for scaling, but bad for worst case latency. We still have the ilog(nr_cpus) rule to scale up on bigger server boxes. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09sched: Turn off child_runs_firstMike Galbraith1-2/+2
Set child_runs_first default to off. It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs get preempted by child tasks, reducing parallelism. Note, this patch might make existing races in user applications more prominent than before - so breakages might be bisected to this commit. Child-runs-first is broken on SMP to begin with, and we already had it off briefly in v2.6.23 so most of the offenders ought to be fixed. Would be nice not to revert this commit but fix those apps finally ... Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case people want to work around broken apps. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-08sched: Ensure that a child can't gain time over it's parent after fork()Mike Galbraith1-3/+5
A fork/exec load is usually "pass the baton", so the child should never be placed behind the parent. With START_DEBIT we make room for the new task, but with child_runs_first, that room comes out of the _parent's_ hide. There's nothing to say that the parent wasn't ahead of min_vruntime at fork() time, which means that the "baton carrier", who is essentially the parent in drag, can gain time and increase scheduling latencies for waiters. With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first enabled, we essentially pass the sleeper fairness off to the child, which is fine, but if we don't base placement on the parent's updated vruntime, we can end up compounding latency woes if the child itself then does fork/exec. The debit incurred at fork doesn't hurt the parent who is then going to sleep and maybe exit, but the child who acquires the error harms all comers. This improves latencies of make -j<n> kernel build workloads. Reported-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07sched: Deal with low-load in wake_affine()Peter Zijlstra1-1/+11
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07sched: Remove short cut from select_task_rq_fair()Peter Zijlstra1-2/+0
select_task_rq_fair() incorrectly skips the wake_affine() logic, remove this. When prev_cpu == this_cpu, the code jumps straight to the wake_idle() logic, this doesn't give the wake_affine() logic the chance to pin the task to this cpu. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02sched: Add wait, sleep and iowait accounting tracepointsPeter Zijlstra1-1/+11
Add 3 schedstat tracepoints to help account for wait-time, sleep-time and iowait-time. They can also be used as a perf-counter source to profile tasks on these clocks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <new-submission> [ build fix for the !CONFIG_SCHEDSTATS case ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02sched: Provide iowait countersArjan van de Ven1-0/+5
For counting how long an application has been waiting for (disk) IO, there currently is only the HZ sample driven information available, while for all other counters in this class, a high resolution version is available via CONFIG_SCHEDSTATS. In order to make an improved bootchart tool possible, we also need a higher resolution version of the iowait time. This patch below adds this scheduler statistic to the kernel. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A64B813.1080506@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Add debug check to task_of()Peter Zijlstra1-6/+14
A frequent mistake appears to be to call task_of() on a scheduler entity that is not actually a task, which can result in a wild pointer. Add a check to catch these mistakes. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Fully integrate cpus_active_map and root-domain codeGregory Haskins1-3/+7
Reflect "active" cpus in the rq->rd->online field, instead of the online_map. The motivation is that things that use the root-domain code (such as cpupri) only care about cpus classified as "active" anyway. By synchronizing the root-domain state with the active map, we allow several optimizations. For instance, we can remove an extra cpumask_and from the scheduler hotpath by utilizing rq->rd->online (since it is now a cached version of cpu_active_map & rq->rd->span). Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Fix latencytop and sleep profiling vs group schedulingPeter Zijlstra1-13/+19
The latencytop and sleep accounting code assumes that any scheduler entity represents a task, this is not so. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18sched: Account for vruntime wrappingFabio Checconi1-2/+8
I spotted two sites that didn't take vruntime wrap-around into account. Fix these by creating a comparison helper that does do so. Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11sched: Fix bug in SCHED_IDLE interaction with group schedulingPaul Turner1-1/+2
One of the isolation modifications for SCHED_IDLE is the unitization of sleeper credit. However the check for this assumes that the sched_entity we're placing always belongs to a task. This is potentially not true with group scheduling and leaves us rummaging randomly when we try to pull the policy. Signed-off-by: Paul Turner <pjt@google.com> Cc: peterz@infradead.org LKML-Reference: <alpine.DEB.1.00.0907101649570.29914@kitami.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17sched: Fix out of scope variable access in sched_slice()Christian Engelmayer1-1/+2
Access to local variable lw is aliased by usage of pointer load. Access to pointer load in calc_delta_mine() happens when lw is already out of scope. [ Reported by static code analysis. ] Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com> LKML-Reference: <20090616103512.0c846e51@frequentis.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09sched: remove redundant hierarchy walk in check_preempt_wakeupPaul Turner1-10/+3
Impact: micro-optimization Under group scheduling we traverse up until we are at common siblings to make the wakeup comparison on. At this point however, they should have the same parent so continuing to check up the tree is redundant. Signed-off-by: Paul Turner <pjt@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <alpine.DEB.1.00.0904081520320.30317@kitami.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-15Merge branch 'sched/urgent'; commit 'v2.6.29-rc5' into sched/coreIngo Molnar1-2/+9
2009-02-11sched: revert recent sync wakeup changesPeter Zijlstra1-2/+9
Intel reported a 10% regression (mysql+sysbench) on a 16-way machine with these patches: 1596e29: sched: symmetric sync vs avg_overlap d942fb6: sched: fix sync wakeups Revert them. Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Bisected-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-08Merge branches 'sched/rt' and 'sched/urgent' into sched/coreIngo Molnar1-21/+48
2009-02-01sched: fix buddie group latencyPeter Zijlstra1-2/+8
Similar to the previous patch, by not clearing buddies we can select entities past their run quota, which can increase latency. This means we have to clear group buddies as well. Do not use the group clear for pick_next_task(), otherwise that'll get O(n^2). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-01sched: clear buddies more aggressivelyMike Galbraith1-1/+12
It was noticed that a task could get re-elected past its run quota due to buddy affinities. This could increase latency a little. Cure it by more aggresively clearing buddy state. We do so in two situations: - when we force preempt - when we select a buddy to run Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-01sched: fix sync wakeupsPeter Zijlstra1-9/+2
Pawel Dziekonski reported that the openssl benchmark and his quantum chemistry application both show slowdowns due to the scheduler under-parallelizing execution. The reason are pipe wakeups still doing 'sync' wakeups which overrides the normal buddy wakeup logic - even if waker and wakee are loosely coupled. Fix an inversion of logic in the buddy wakeup code. Reported-by: Pawel Dziekonski <dzieko@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15sched: sched_slice() fixletLin Ming1-1/+4
Mike's change: 0a582440f "sched: fix sched_slice())" broke group scheduling by forgetting to reload cfs_rq on each loop. This patch fixes aim7 regression and specjbb2005 regression becomes less than 1.5% on 8-core stokley. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Jayson King <dev@jaysonking.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15sched: fix update_min_vruntimePeter Zijlstra1-1/+1
Impact: fix SCHED_IDLE latency problems OK, so we have 1 running task A (which is obviously curr and the tree is equally obviously empty). 'A' nicely chugs along, doing its thing, carrying min_vruntime along as it goes. Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP balancing, which is very likely far right, in that case update_curr update_min_vruntime cfs_rq->rb_leftmost := true (the crazy task sitting in a tree) vruntime = se->vruntime and voila, min_vruntime is waaay right of where it ought to be. OK, so why did I write it like that to begin with... Aah, yes. Say we've just dequeued current schedule deactivate_task(prev) dequeue_entity update_min_vruntime Then we'll set vruntime = cfs_rq->min_vruntime; we find !cfs_rq->curr, but do find someone in the tree. Then we _must_ do vruntime = se->vruntime, because vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime) will not advance vruntime, and cause lags the other way around (which we fixed with that initial patch: 1af5f730fc1bf7c62ec9fb2d307206e18bf40a69 (sched: more accurate min_vruntime accounting). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Mike Galbraith <efault@gmx.de> Acked-by: Mike Galbraith <efault@gmx.de> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15sched: SCHED_OTHER vs SCHED_IDLE isolationPeter Zijlstra1-8/+22
Stronger SCHED_IDLE isolation: - no SCHED_IDLE buddies - never let SCHED_IDLE preempt on wakeup - always preempt SCHED_IDLE on wakeup - limit SLEEPER fairness for SCHED_IDLE. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15sched: prefer wakersPeter Zijlstra1-6/+53
Prefer tasks that wake other tasks to preempt quickly. This improves performance because more work is available sooner. The workload that prompted this patch was a kernel build over NFS4 (for some curious and not understood reason we had to revert commit: 18de9735300756e3ca9c361ef58409d8561dfe0d to make any progress at all) Without this patch a make -j8 bzImage (of x86-64 defconfig) would take 3m30-ish, with this patch we're down to 2m50-ish. psql-sysbench/mysql-sysbench show a slight improvement in peak performance as well, tbench and vmark seemed to not care. It is possible to improve upon the build time (to 2m20-ish) but that seriously destroys other benchmarks (just shows that there's more room for tinkering). Much thanks to Mike who put in a lot of effort to benchmark things and proved a worthy opponent with a competing patch. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-08generic swap(): sched: remove local swap() macroWu Fengguang1-2/+0
Use the new generic implementation. Signed-off-by: Wu Fengguang <wfg@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05Merge branch 'linus' into sched/urgentIngo Molnar1-8/+24
2009-01-02Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds1-8/+24
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02sched: fix sched_slice()Mike Galbraith1-18/+12
Impact: fix bad-interactivity buglet Fix sched_slice() to emit a sane result whether a task is currently enqueued or not. Signed-off-by: Mike Galbraith <efault@gmx.de> Tested-by: Jayson King <dev@jaysonking.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> kernel/sched_fair.c | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-)
2008-12-19sched: bias task wakeups to preferred semi-idle packagesVaidyanathan Srinivasan1-0/+18
Impact: tweak task wakeup to save power more agressively Preferred wakeup cpu (from a semi idle package) has been nominated in find_busiest_group() in the previous patch. Use this information in sched_mc_preferred_wakeup_cpu in function wake_idle() to bias task wakeups if the following conditions are satisfied: - The present cpu that is trying to wakeup the process is idle and waking the target process on this cpu will potentially wakeup a completely idle package - The previous cpu on which the target process ran is also idle and hence selecting the previous cpu may wakeup a semi idle cpu package - The task being woken up is allowed to run in the nominated cpu (cpu affinity and restrictions) Basically if both the current cpu and the previous cpu on which the task ran is idle, select the nominated cpu from semi idle cpu package for running the new task that is waking up. Cache hotness is considered since the actual biasing happens in wake_idle() only if the application is cache cold. This technique will effectively move short running bursty jobs in a mostly idle system. Wakeup biasing for power savings gets automatically disabled if system utilisation increases due to the fact that the probability of finding both this_cpu and prev_cpu idle decreases. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16sched: optimize update_curr()Peter Zijlstra1-0/+2
Impact: micro-optimization Skip the hard work when there is none. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16sched: fix wakeup preemption clockMike Galbraith1-4/+3
Impact: sharpen the wakeup-granularity to always be against current scheduler time It was possible to do the preemption check against an old time stamp. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: convert remaining old-style cpumask operatorsRusty Russell1-2/+2
Impact: Trivial API conversion NR_CPUS -> nr_cpu_ids cpumask_t -> struct cpumask sizeof(cpumask_t) -> cpumask_size() cpumask_a = cpumask_b -> cpumask_copy(&cpumask_a, &cpumask_b) cpu_set() -> cpumask_set_cpu() first_cpu() -> cpumask_first() cpumask_of_cpu() -> cpumask_of() cpus_* -> cpumask_* There are some FIXMEs where we all archs to complete infrastructure (patches have been sent): cpu_coregroup_map -> cpu_coregroup_mask node_to_cpumask* -> cpumask_of_node There is also one FIXME where we pass an array of cpumasks to partition_sched_domains(): this implies knowing the definition of 'struct cpumask' and the size of a cpumask. This will be fixed in a future patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: wrap sched_group and sched_domain cpumask accesses.Rusty Russell1-6/+4
Impact: trivial wrap of member accesses This eases the transition in the next patch. We also get rid of a temporary cpumask in find_idlest_cpu() thanks to for_each_cpu_and, and sched_balance_self() due to getting weight before setting sd to NULL. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11sched: release buddies on yieldPeter Zijlstra1-5/+12
Clear buddies on yield, so that the buddy rules don't schedule them despite them being placed right-most. This fixed a performance regression with yield-happy binary JVMs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Lin Ming <ming.m.lin@intel.com>