path: root/kernel
Age | Commit message | Author | Files | Lines
2015-02-18 | sched/autogroup: Fix failure to set cpu.rt_runtime_us | Peter Zijlstra | 2 | -5/+7
Because task_group() uses a cache of autogroup_task_group(), whose output depends on sched_class, switching classes can generate problems. In particular, when started as fair, the cache points to the autogroup, so when switching to RT the tg_rt_schedulable() test fails for every cpu.rt_{runtime,period}_us change because now the autogroup has tasks and no runtime. Furthermore, going back to the previous semantics of varying task_group() with sched_class has the down-side that the sched_debug output varies as well, even though the task really is in the autogroup. Therefore add an autogroup exception to tg_has_rt_tasks() -- such that both (all) task_group() usages in sched/core now have one. And remove all the remnants of the variable task_group() output. Reported-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Stefan Bader <stefan.bader@canonical.com> Fixes: 8323f26ce342 ("sched: Fix race in task_group()") Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 | sched/dl: Do update_rq_clock() in yield_task_dl() | Kirill Tkhai | 1 | -0/+1
update_curr_dl() needs actual rq clock. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423040972.18770.10.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 | sched: Prevent recursion in io_schedule() | NeilBrown | 1 | -19/+12
io_schedule() calls blk_flush_plug() which, depending on the contents of current->plug, can initiate arbitrary blk-io requests. Note that this contrasts with blk_schedule_flush_plug() which requires all non-trivial work to be handed off to a separate thread. This makes it possible for io_schedule() to recurse, and initiating block requests could possibly call mempool_alloc() which, in times of memory pressure, uses io_schedule(). Apart from any stack usage issues, io_schedule() will not behave correctly when called recursively as delayacct_blkio_start() does not allow for repeated calls. So: - use ->in_iowait to detect recursion. Set it earlier, and restore it to the old value. - move the call to "raw_rq" after the call to blk_flush_plug(). As this is some sort of per-cpu thing, we want some chance that we are on the right CPU. - when io_schedule() is called recursively, use blk_schedule_flush_plug() which cannot further recurse. - as this makes io_schedule() a lot more complex and as io_schedule() must match io_schedule_timeout(), put all the changes in io_schedule_timeout() and make io_schedule a simple wrapper for that. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Moved the now rudimentary io_schedule() into sched.h. ] Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tony Battersby <tonyb@cybernetics.com> Link: http://lkml.kernel.org/r/20150213162600.059fffb2@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
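The recursion guard described in this message can be modelled in userspace. The following is only a sketch, not the kernel code: `struct task`, the two flush helpers, and the `pending_nested` counter are hypothetical stand-ins for `current`, blk_flush_plug(), blk_schedule_flush_plug(), and an I/O path that re-enters io_schedule().

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of "current" this sketch needs. */
struct task {
    bool in_iowait;       /* true while the task is inside io_schedule */
    int direct_flushes;   /* flushes that may issue I/O and so recurse */
    int deferred_flushes; /* flushes handed off, which cannot recurse */
    int pending_nested;   /* nested calls the direct flush will trigger */
};

static long io_schedule_timeout_sketch(struct task *t, long timeout);

/* Models blk_flush_plug(): issuing I/O may re-enter io_schedule. */
static void flush_plug_direct(struct task *t)
{
    t->direct_flushes++;
    if (t->pending_nested > 0) {
        t->pending_nested--;
        io_schedule_timeout_sketch(t, 0);
    }
}

/* Models blk_schedule_flush_plug(): work is deferred, no recursion. */
static void flush_plug_deferred(struct task *t)
{
    t->deferred_flushes++;
}

static long io_schedule_timeout_sketch(struct task *t, long timeout)
{
    bool old_iowait = t->in_iowait; /* remember the caller's state */

    t->in_iowait = true;            /* set before flushing */
    if (old_iowait)
        flush_plug_deferred(t);     /* already inside: must not recurse */
    else
        flush_plug_direct(t);

    /* The real function would account the wait and sleep here. */

    t->in_iowait = old_iowait;      /* restore, do not blindly clear */
    return timeout;                 /* no sleep simulated: nothing elapsed */
}

/* io_schedule() becomes a thin wrapper around the timeout variant. */
static void io_schedule_sketch(struct task *t)
{
    io_schedule_timeout_sketch(t, LONG_MAX);
}
```

Calling `io_schedule_sketch()` on a task with `pending_nested = 1` performs one direct flush at the outer level and one deferred flush in the nested call, and leaves `in_iowait` false afterwards because each level restores the value it found.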
2015-02-18 | sched/completion: Serialize completion_done() with complete() | Oleg Nesterov | 1 | -2/+17
Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when reading completion state" was not correct: without lock/unlock, code like stop_machine_from_inactive_cpu() while (!completion_done()) cpu_relax(); can return before complete() finishes its spin_unlock(), which writes to this memory. Fix this with spin_unlock_wait(). While at it, change try_wait_for_completion() to use READ_ONCE(). Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Added a comment with the barrier. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: raghavendra.kt@linux.vnet.ibm.com Cc: waiman.long@hp.com Fixes: de30ec47302c ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state") Link: http://lkml.kernel.org/r/20150212195913.GA30430@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 | sched: Fix preempt_schedule_common() triggering tracing recursion | Frederic Weisbecker | 1 | -1/+1
Since the function graph tracer needs to disable preemption, it might call preempt_schedule() after reenabling it if something triggered the need for rescheduling in between. Therefore we can't trace preempt_schedule() itself because we would face a function tracing recursion otherwise as the tracer is always called before PREEMPT_ACTIVE gets set to prevent that recursion. This is why preempt_schedule() is tagged as "notrace". But the same issue applies to every function called by preempt_schedule() before PREEMPT_ACTIVE is actually set. And preempt_schedule_common() is one such example. Unfortunately we forgot to tag it as notrace as well and as a result we are encountering tracing recursion since it got introduced by: a18b5d0181923 ("sched: Fix missing preemption opportunity") Let's fix that by applying the appropriate function tag to preempt_schedule_common(). Reported-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1424110807-15057-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 | sched/dl: Prevent enqueue of a sleeping task in dl_task_timer() | Kirill Tkhai | 1 | -0/+20
A deadline task may be throttled and dequeued at the same time. This happens when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; Later the timer fires, but the task is still dequeued: dl_task_timer() enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */ Someone wakes it up: try_to_wake_up() enqueue_dl_entity() BUG_ON(on_dl_rq()) This patch fixes the problem by preventing !on_rq tasks from being queued on dl_rq. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [ Wrote comment. ] Cc: Juri Lelli <juri.lelli@arm.com> Fixes: 1019a359d3dc ("sched/deadline: Fix stale yield state") Link: http://lkml.kernel.org/r/1374601424090314@web4j.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
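The sequence in this message can be replayed with a small state model. This is a simplified sketch under stated assumptions, not the scheduler code: `struct dl_task_sketch` and every function below are hypothetical, and the BUG_ON() is modelled as a `false` return value so the double enqueue can be observed without aborting.

```c
#include <stdbool.h>

/* Hypothetical two-flag model of a deadline task. */
struct dl_task_sketch {
    bool on_rq;    /* runnable from the core scheduler's point of view */
    bool on_dl_rq; /* entity is queued on the deadline runqueue */
};

/* Returns false where the kernel would hit BUG_ON(on_dl_rq()). */
static bool enqueue_dl_entity_sketch(struct dl_task_sketch *p)
{
    if (p->on_dl_rq)
        return false;
    p->on_dl_rq = true;
    return true;
}

/* The task is throttled while going to sleep: it leaves both queues. */
static void sleep_while_throttled(struct dl_task_sketch *p)
{
    p->on_dl_rq = false; /* __dequeue_task_dl() */
    p->on_rq = false;    /* prev->on_rq = 0 */
}

/* Timer callback with the guard: a !on_rq task is left alone. */
static void dl_task_timer_sketch(struct dl_task_sketch *p)
{
    if (!p->on_rq)
        return;          /* sleeping: the later wakeup will enqueue it */
    enqueue_dl_entity_sketch(p);
}

/* Wakeup path: marks the task runnable and enqueues it. */
static bool wake_up_sketch(struct dl_task_sketch *p)
{
    p->on_rq = true;
    return enqueue_dl_entity_sketch(p);
}
```

With the guard in place, running `sleep_while_throttled()`, then `dl_task_timer_sketch()`, then `wake_up_sketch()` enqueues the task exactly once. Removing the `!p->on_rq` check would make the wakeup find the task already queued, which is the case the BUG_ON() reports.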
2015-02-18 | sched: Make dl_task_timer() use task_rq_lock() | Peter Zijlstra | 3 | -85/+79
Kirill reported that a dl task can be throttled and dequeued at the same time. This happens when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; This invalidates the assumption from commit 0f397f2c90ce ("sched/dl: Fix race in dl_task_timer()"): "The only reason we don't strictly need ->pi_lock now is because we're guaranteed to have p->state == TASK_RUNNING here and are thus free of ttwu races". And therefore we have to use the full task_rq_lock() here. This further amends the fact that we forgot to update the rq lock loop for TASK_ON_RQ_MIGRATING, from commit cca26e8009d1 ("sched: Teach scheduler to understand TASK_ON_RQ_MIGRATING state"). Reported-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Link: http://lkml.kernel.org/r/20150217123139.GN5029@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 | sched: Clarify ordering between task_rq_lock() and move_queued_task() | Peter Zijlstra | 1 | -0/+16
There was a wee bit of confusion around the exact ordering here; clarify things. Reported-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20150217121258.GM5029@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-09 | Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 3 | -68/+60
Pull timer updates from Ingo Molnar: "The main changes in this cycle were: - rework hrtimer expiry calculation in hrtimer_interrupt(): the previous code had a subtle bug where expiry caching would miss an expiry, resulting in occasional bogus (late) expiry of hrtimers. - continuing Y2038 fixes - ktime division optimization - misc smaller fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Make __hrtimer_get_next_event() static rtc: Convert rtc_set_ntp_time() to use timespec64 rtc: Remove redundant rtc_valid_tm() from rtc_hctosys() rtc: Modify rtc_hctosys() to address y2038 issues rtc: Update rtc-dev to use y2038-safe time interfaces rtc: Update interface.c to use y2038-safe time interfaces time: Expose get_monotonic_boottime64 for in-kernel use time: Expose getboottime64 for in-kernel uses ktime: Optimize ktime_divns for constant divisors hrtimer: Prevent stale expiry time in hrtimer_interrupt() ktime.h: Introduce ktime_ms_delta
2015-02-09 | Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 10 | -77/+171
Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - various sched/deadline fixes and enhancements - rescheduling latency fixes/cleanups - rework the rq->clock code to be more consistent and more robust. - minor micro-optimizations - ->avg.decay_count fixes - add a stack overflow check to might_sleep() - idle-poll handler fix, possibly resulting in power savings - misc smaller updates and fixes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/Documentation: Remove unneeded word sched/wait: Introduce wait_on_bit_timeout() sched: Pull resched loop to __schedule() callers sched/deadline: Remove cpu_active_mask from cpudl_find() sched: Fix hrtick_start() on UP sched/deadline: Avoid pointless __setscheduler() sched/deadline: Fix stale yield state sched/deadline: Fix hrtick for a non-leftmost task sched/deadline: Modify cpudl::free_cpus to reflect rd->online sched/idle: Add missing checks to the exit condition of cpu_idle_poll() sched: Fix missing preemption opportunity sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target sched/debug: Print rq->clock_task sched/core: Rework rq->clock update skips sched/core: Validate rq_clock*() serialization sched/core: Remove check of p->sched_class sched/fair: Fix sched_entity::avg::decay_count initialization sched/debug: Fix potential call to __ffs(0) in sched_show_task() sched/debug: Check for stack overflow in ___might_sleep() sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()
2015-02-09 | Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 7 | -128/+355
Pull perf updates from Ingo Molnar: "Kernel side changes: - AMD range breakpoints support: Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. The syntax is: perf record -e mem:addr/len:type For example, to set a write breakpoint from 0x1000 to 0x1200 (0x1000 + 512): perf record -e mem:0x1000/512:w - event throttling/rotating fixes - various event group handling fixes, cleanups and general paranoia code to be more robust against bugs in the future. - kernel stack overhead fixes User-visible tooling side changes: - Show precise number of samples at the end of a 'record' session, if processing build ids, since we will then traverse the whole perf.data file and see all the PERF_RECORD_SAMPLE records, otherwise stop showing the previous off-base heuristically counted number of "samples" (Namhyung Kim). - Support to read compressed module from build-id cache (Namhyung Kim) - Enable sampling loads and stores simultaneously in 'perf mem' (Stephane Eranian) - 'perf diff' output improvements (Namhyung Kim) - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho de Melo) Tooling side infrastructure changes: - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim) - Support parsing parameterized events (Cody P Schafer) - Add support for IP address formats in libtraceevent (David Ahern) Plus other misc fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) perf: Decouple unthrottling and rotating perf: Drop module reference on event init failure perf: Use POLLIN instead of POLL_IN for perf poll data in flag perf: Fix put_event() ctx lock perf: Fix move_group() order perf: Fix event->ctx locking perf: Add a bit of paranoia perf symbols: Convert lseek + read to pread perf tools: Use perf_data_file__fd() consistently perf symbols: Support to read compressed module from build-id cache perf evsel: Set attr.task bit for a tracking event perf header: Set header version correctly perf record: Show precise number of samples perf tools: Do not use __perf_session__process_events() directly perf callchain: Cache eh/debug frame offset for dwarf unwind perf tools: Provide stub for missing pthread_attr_setaffinity_np perf evsel: Don't rely on malloc working for sz 0 tools lib traceevent: Add support for IP address formats perf ui/tui: Show fatal error message only if exists perf tests: Fix typo in sample-parsing.c ...
2015-02-09 | Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 11 | -71/+63
Pull core locking updates from Ingo Molnar: "The main changes are: - mutex, completions and rtmutex micro-optimizations - lock debugging fix - various cleanups in the MCS and the futex code" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Optimize setting task running after being blocked locking/rwsem: Use task->state helpers sched/completion: Add lock-free checking of the blocking case sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state locking/mutex: Explicitly mark task as running after wakeup futex: Fix argument handling in futex_lock_pi() calls doc: Fix misnamed FUTEX_CMP_REQUEUE_PI op constants locking/Documentation: Update code path softirq/preempt: Add missing current->preempt_disable_ip update locking/osq: No need for load/acquire when acquire-polling locking/mcs: Better differentiate between MCS variants locking/mutex: Introduce ww_mutex_set_context_slowpath() locking/mutex: Move MCS related comments to proper location locking/mutex: Checking the stamp is WW only
2015-02-09 | Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 14 | -518/+440
Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) rcu: Initialize tiny RCU stall-warning timeouts at boot rcu: Fix RCU CPU stall detection in tiny implementation rcu: Add GP-kthread-starvation checks to CPU stall warnings rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors rcu: Optionally run grace-period kthreads at real-time priority ksoftirqd: Use new cond_resched_rcu_qs() function ksoftirqd: Enable IRQs and call cond_resched() before poking RCU rcutorture: Add more diagnostics in rcu_barrier() test failure case torture: Flag console.log file to prevent holdovers from earlier runs torture: Add "-enable-kvm -soundhw pcspk" to qemu command line rcutorture: Handle different mpstat versions rcutorture: Check from beginning to end of grace period rcu: Remove redundant rcu_batches_completed() declaration rcutorture: Drop rcu_torture_completed() and friends rcu: Provide rcu_batches_completed_sched() for TINY_RCU rcutorture: Use unsigned for Reader Batch computations rcutorture: Make build-output parsing correctly flag RCU's warnings rcu: Make _batches_completed() functions return unsigned long rcutorture: Issue warnings on close calls due to Reader Batch blows documentation: Fix smp typo in memory-barriers.txt ...
2015-02-06 | Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -1/+1
Pull timer and x86 fix from Ingo Molnar: "A CLOCK_TAI early expiry fix and an x86 microcode driver oops fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimer: Fix incorrect tai offset calculation for non high-res timer systems * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Return error from driver init code when loader is disabled
2015-02-06 | Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 3 | -7/+34
Pull scheduler fixes from Ingo Molnar: "Misc fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix deadline parameter modification handling sched/wait: Remove might_sleep() from wait_event_cmd() sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumask sched/fair: Avoid using uninitialized variable in preferred_group_nid()
2015-02-06 | Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -0/+2
Pull core kernel fixes from Ingo Molnar: "Two liblockdep fixes and a CPU hotplug race fix" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/liblockdep: don't include host headers tools/liblockdep: ignore generated .so file smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()
2015-02-05 | hrtimer: Fix incorrect tai offset calculation for non high-res timer systems | John Stultz | 1 | -1/+1
I noticed some CLOCK_TAI timer test failures on one of my less-frequently used configurations. After digging in, I found that in 76f4108892d9 (Cleanup hrtimer accessors to the timekeeping state), the hrtimer_get_softirq_time tai offset calculation was incorrectly rewritten, as the tai offset we return should be from CLOCK_MONOTONIC, and not CLOCK_REALTIME. This results in CLOCK_TAI timers expiring early on non-highres capable machines. This patch fixes the issue, calculating the tai time properly from the monotonic base. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable <stable@vger.kernel.org> # 3.17+ Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
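The arithmetic behind this fix can be illustrated with plain integers. This is a sketch under assumptions: the `ktime_ns` type, the struct, and both function names are hypothetical, and it assumes (as the message states) that the TAI offset is expressed relative to CLOCK_MONOTONIC. The values used in any example are made up.

```c
#include <stdint.h>

typedef int64_t ktime_ns; /* hypothetical: time as plain nanoseconds */

/* Both offsets are relative to the monotonic clock (assumption). */
struct clock_offsets_sketch {
    ktime_ns off_real; /* CLOCK_MONOTONIC -> CLOCK_REALTIME */
    ktime_ns off_tai;  /* CLOCK_MONOTONIC -> CLOCK_TAI */
};

/* Correct: apply the monotonic-relative TAI offset to monotonic time. */
static ktime_ns tai_from_mono(ktime_ns mono,
                              const struct clock_offsets_sketch *o)
{
    return mono + o->off_tai;
}

/* Buggy: applies the same offset to realtime, so off_real counts twice. */
static ktime_ns tai_from_realtime_buggy(ktime_ns mono,
                                        const struct clock_offsets_sketch *o)
{
    ktime_ns real = mono + o->off_real;
    return real + o->off_tai;
}
```

The buggy variant overshoots by exactly `off_real`. A time base that reads too far ahead makes pending timers look due already, which matches the early expiry of CLOCK_TAI timers described above.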
2015-02-04 | perf: Decouple unthrottling and rotating | Mark Rutland | 1 | -52/+29
Currently the adjustments made as part of perf_event_task_tick() use the percpu rotation lists to iterate over any active PMU contexts, but these are not used by the context rotation code, having been replaced by separate (per-context) hrtimer callbacks. However, some manipulation of the rotation lists (i.e. removal of contexts) has remained in perf_rotate_context(). This leads to the following issues: * Contexts are not always removed from the rotation lists. Removal of PMUs which have been placed in rotation lists, but have not been removed by a hrtimer callback can result in corruption of the rotation lists (when memory backing the context is freed). This has been observed to result in hangs when PMU drivers built as modules are inserted and removed around the creation of events for said PMUs. * Contexts which do not require rotation may be removed from the rotation lists as a result of a hrtimer, and will not be considered by the unthrottling code in perf_event_task_tick. This patch fixes the issue by updating the rotation list when events are scheduled in/out, ensuring that each rotation list stays in sync with the HW state. As each event holds a refcount on the module of its PMU, this ensures that when a PMU module is unloaded none of its CPU contexts can be in a rotation list. By maintaining a list of perf_event_contexts rather than perf_event_cpu_contexts, we don't need separate paths to handle the cpu and task contexts, which also makes the code a little simpler. As the rotation_list variables are not used for rotation, these are renamed to active_ctx_list, which better matches their current function. perf_pmu_rotate_{start,stop} are renamed to perf_pmu_ctx_{activate,deactivate}. 
Reported-by: Johannes Jensen <johannes.jensen@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134511.GR17721@leverpostej Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | perf: Drop module reference on event init failure | Mark Rutland | 1 | -12/+16
When initialising an event, perf_init_event will call try_module_get() to ensure that the PMU's module cannot be removed for the lifetime of the event, with __free_event() dropping the reference when the event is finally destroyed. If something fails after the event has been initialised, but before the event is installed, perf_event_alloc will drop the reference on the module. However, if we fail to initialise an event for some reason (e.g. we ask an uncore PMU to perform sampling, and it refuses to initialise the event), we do not drop the refcount. If we try to open such a bogus event without a precise IDR type, we will loop over each PMU in the pmus list, incrementing each of their refcounts without decrementing them. This patch adds a module_put when pmu->event_init(event) fails, ensuring that the refcounts are balanced in failure cases. As the innards of the precise and search based initialisation look very similar, this logic is hoisted out into a new helper function. While the early return for the failed try_module_get is removed from the search case, this is handled by the remaining return when ret is not -ENOENT. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420642611-22667-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
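The balanced get/put pattern described in this message can be sketched in userspace as follows. This is only an illustration, not the perf code: `struct pmu_sketch`, the counter standing in for the module refcount, and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PMU: a reference counter plus an init callback. */
struct pmu_sketch {
    int module_refs;        /* stands in for the module refcount */
    int (*event_init)(void);
};

static bool module_get_sketch(struct pmu_sketch *pmu)
{
    pmu->module_refs++;
    return true;
}

static void module_put_sketch(struct pmu_sketch *pmu)
{
    assert(pmu->module_refs > 0);
    pmu->module_refs--;
}

/* Shared helper: take a reference, and drop it again if init fails. */
static int try_init_event_sketch(struct pmu_sketch *pmu)
{
    int ret;

    if (!module_get_sketch(pmu))
        return -ENODEV;
    ret = pmu->event_init();
    if (ret)
        module_put_sketch(pmu); /* the reference that used to leak */
    return ret;
}

/* Search path: try each PMU in turn, stop on success or a hard error. */
static struct pmu_sketch *init_event_search(struct pmu_sketch *pmus,
                                            size_t n, int *err)
{
    for (size_t i = 0; i < n; i++) {
        int ret = try_init_event_sketch(&pmus[i]);

        if (!ret) {
            *err = 0;
            return &pmus[i];
        }
        if (ret != -ENOENT) {   /* anything but "not mine" ends the search */
            *err = ret;
            return NULL;
        }
    }
    *err = -ENOENT;
    return NULL;
}

/* Example callbacks: one PMU refuses the event, one accepts it. */
static int init_refuses(void) { return -ENOENT; }
static int init_accepts(void) { return 0; }
```

Searching a list in which the first PMUs refuse the event and a later one accepts it leaves a reference only on the accepting PMU. Without the put in the failure branch, every PMU visited during the search would keep an extra reference.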
2015-02-04 | perf: Use POLLIN instead of POLL_IN for perf poll data in flag | Jiri Olsa | 1 | -1/+2
Currently we flag available data (via poll syscall) on perf fd with the POLL_IN macro, which is normally used for the SIGIO interface. We've been lucky, because POLLIN (0x1) is a subset of POLL_IN (0x20001) and sys_poll (do_pollfd function) cuts the extra bit out (0x20000). Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422467678-22341-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
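The bit arithmetic in this message can be checked with a few lines of C. This is a sketch under assumptions: `KERNEL_POLL_IN_SKETCH` is a local constant carrying the 0x20001 value quoted above (userspace headers may define POLL_IN differently), and `filter_poll_mask()` is a hypothetical helper that assumes the poll path keeps only the requested events plus POLLERR and POLLHUP.

```c
#include <poll.h>

/*
 * Value quoted in the commit message for the kernel-internal POLL_IN.
 * Defined locally because userspace headers may use a different value.
 */
#define KERNEL_POLL_IN_SKETCH 0x20001u

/*
 * Hypothetical model of the filtering step: keep the bits the caller
 * asked for, plus the always-reported error and hangup bits.
 */
static unsigned int filter_poll_mask(unsigned int mask, short events)
{
    return mask & ((unsigned int)events | POLLERR | POLLHUP);
}
```

Passing `KERNEL_POLL_IN_SKETCH` with `events = POLLIN` yields the same result as passing `POLLIN` itself, because the 0x20000 bit is never among the requested events. That is why the wrong macro went unnoticed.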
2015-02-04 | perf: Fix put_event() ctx lock | Peter Zijlstra | 1 | -5/+12
So what I suspect; but I'm in zombie mode today it seems; is that while I initially thought that it was impossible for ctx to change when refcount dropped to 0, I now suspect it's possible. Note that until perf_remove_from_context() the event is still active and visible on the lists. So a concurrent sys_perf_event_open() from another task into this task can race. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: mark.rutland@arm.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150129134434.GB26304@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | perf: Fix move_group() order | Peter Zijlstra (Intel) | 1 | -9/+47
Jiri reported triggering the new WARN_ON_ONCE in event_sched_out over the weekend: event_sched_out.isra.79+0x2b9/0x2d0 group_sched_out+0x69/0xc0 ctx_sched_out+0x106/0x130 task_ctx_sched_out+0x37/0x70 __perf_install_in_context+0x70/0x1a0 remote_function+0x48/0x60 generic_exec_single+0x15b/0x1d0 smp_call_function_single+0x67/0xa0 task_function_call+0x53/0x80 perf_install_in_context+0x8b/0x110 I think the below should cure this; if we install a group leader it will iterate the (still intact) group list and find its siblings and try and install those too -- even though those still have the old event->ctx -- in the new ctx. Upon installing the first group sibling we'd try and schedule out the group and trigger the above warn. Fix this by installing the group leader last, installing siblings would have no effect, they're not reachable through the group lists and therefore we don't schedule them. Also delay resetting the state until we're absolutely sure the events are quiescent. Reported-by: Jiri Olsa <jolsa@redhat.com> Reported-by: vincent.weaver@maine.edu Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150126162639.GA21418@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | perf: Fix event->ctx locking | Peter Zijlstra | 1 | -37/+207
There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx, the RCU use makes it very hard. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | perf: Add a bit of paranoia | Peter Zijlstra | 1 | -1/+18
Add a few WARN()s to catch things that should never happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.150481799@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | Merge tag 'v3.19-rc7' into perf/core, to merge fixes before applying new changes | Ingo Molnar | 2 | -11/+19
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | locking/rtmutex: Optimize setting task running after being blocked | Davidlohr Bueso | 1 | -4/+3
We explicitly mark the task running after returning from a __rt_mutex_slowlock() call, which does the actual sleeping via wait-wake-trylocking. As such, this patch does two things: (1) refactors the code so that setting current to TASK_RUNNING is done by __rt_mutex_slowlock(), and not by the callers. The downside to this is that it becomes a bit unclear at what point we block. As such I've added a comment that the task blocks when calling __rt_mutex_slowlock() so readers can figure out when it is running again. (2) relaxes setting current's state through __set_current_state(), instead of its more expensive barrier alternative. There was no need for the implied barrier as we're obviously not planning on blocking. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422857784.18096.1.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | locking/rwsem: Use task->state helpers | Davidlohr Bueso | 2 | -3/+2
Call __set_task_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who last changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422257769-14083-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | sched/completion: Add lock-free checking of the blocking case | Nicholas Mc Guire | 1 | -0/+9
The "thread would block" case can be checked without grabbing ->wait.lock. [ If the check does not return early then grab the lock and recheck. A memory barrier is not needed as complete() and complete_all() imply a barrier. The ACCESS_ONCE() is needed for calls in a loop that, if inlined, could optimize out the re-fetching of x->done. ] Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422013307-13200-1-git-send-email-der.herr@hofr.at Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state | Nicholas Mc Guire | 1 | -8/+1
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421467534-22834-1-git-send-email-der.herr@hofr.at Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | locking/mutex: Explicitly mark task as running after wakeup | Davidlohr Bueso | 1 | -0/+2
By the time we wake up and get the lock after being asleep in the slowpath, we better be running. As good practice, be explicit about this and avoid any mischief. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421717961.4903.11.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | Merge tag 'v3.19-rc7' into locking/core, to refresh the branch before applying new changes | Ingo Molnar | 17 | -85/+235
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 | sched/Documentation: Remove unneeded word | Sharon Dvir | 1 | -1/+1
The second 'mutex' shouldn't be there: the sentence can't be about the mutex, as a mutex can't be freed, only unlocked; the memory where the mutex resides, however, can be freed. Signed-off-by: Sharon Dvir <sharon.dvir1@mail.huji.ac.il> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422827252-31363-1-git-send-email-sharon.dvir1@mail.huji.ac.il Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched: Pull resched loop to __schedule() callersFrederic Weisbecker1-4/+7
__schedule() disables preemption during its job and re-enables it afterward without doing a preemption check to avoid recursion. But if an event happens after the context switch which requires rescheduling, we need to check again if a task of a higher priority needs the CPU. A preempt irq can raise such a situation. To handle that, __schedule() loops on need_resched(). But preempt_schedule_*() functions, which call __schedule(), also loop on need_resched() to handle missed preempt irqs. Hence we end up with the same loop happening twice. Let's simplify that by attributing the need_resched() loop responsibility to all __schedule() callers. There is a risk that the outer loop now handles reschedules that used to be handled by the inner loop with the added overhead of caller details (inc/dec of PREEMPT_ACTIVE, irq save/restore) but assuming those inner rescheduling loops weren't too frequent, this shouldn't matter. Especially since the whole preemption path is now losing one loop in any case. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1422404652-29067-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
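The shape of this change, with the need_resched() loop owned by the caller rather than by the scheduling pass itself, might look roughly like the following userspace sketch. All names here (`sketch_pending`, `sketch_schedule_once()`, `sketch_schedule()`) are hypothetical stand-ins, and a simple counter models repeated reschedule requests.

```c
#include <assert.h>
#include <stdbool.h>

static int sketch_pending;	/* stand-in for repeated NEED_RESCHED events */
static int sketch_passes;	/* how many scheduling passes have run */

static bool sketch_need_resched(void)
{
	return sketch_pending > 0;
}

/* One scheduling pass; it no longer loops internally. */
static void sketch_schedule_once(void)
{
	sketch_passes++;
	if (sketch_pending > 0)
		sketch_pending--;	/* this pass serviced one request */
}

/* The caller owns the loop and repeats until no request is pending. */
static void sketch_schedule(void)
{
	do {
		sketch_schedule_once();
	} while (sketch_need_resched());
}
```

With the loop in one place, a preemption entry point that already loops on need_resched() does not end up running a second, nested loop.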
2015-02-04sched/deadline: Remove cpu_active_mask from cpudl_find()Xunlei Pang1-2/+1
cpu_active_mask is rarely changed (only on hotplug), so remove this operation to gain a little performance. If there is a change in cpu_active_mask, rq_online_dl() and rq_offline_dl() should take care of it normally, so cpudl::free_cpus carries enough information for us. For the rare case when a task is put onto a dying cpu (which rq_offline_dl() can't handle in a timely fashion), it will be handled through _cpu_down()->...->multi_cpu_stop()->migration_call() ->migrate_tasks(), preventing the task from hanging on the dead cpu. Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> [peterz: changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421642980-10045-2-git-send-email-pang.xunlei@linaro.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched: Fix hrtick_start() on UPWanpeng Li1-0/+5
The commit 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range") forgot to change the UP version of hrtick_start(), do so now. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range") [ Fixed the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-7-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Avoid pointless __setscheduler()Wanpeng Li1-1/+15
There is no need to dequeue/enqueue and push/pull if no scheduling parameters changed for the DL class. Both fair and RT classes already check if parameters changed for them to avoid unnecessary overhead. This patch adds the parameters-changed test for the DL class in order to reduce overhead. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [ Fixed up the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
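A parameters-changed test of this kind could be sketched as below. This is an assumption-laden illustration rather than the patch itself: `struct sketch_dl_params` and both functions are hypothetical, and the choice of runtime, deadline and period as the compared fields is an assumption about which parameters matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of deadline scheduling parameters. */
struct sketch_dl_params {
	uint64_t runtime;
	uint64_t deadline;
	uint64_t period;
};

/* True if the request differs from what the task already uses. */
static bool sketch_dl_param_changed(const struct sketch_dl_params *cur,
				    const struct sketch_dl_params *req)
{
	return cur->runtime != req->runtime ||
	       cur->deadline != req->deadline ||
	       cur->period != req->period;
}

/* Returns true only if the expensive update path was taken. */
static bool sketch_setscheduler(struct sketch_dl_params *cur,
				const struct sketch_dl_params *req)
{
	if (!sketch_dl_param_changed(cur, req))
		return false;	/* nothing changed: skip dequeue/enqueue */

	*cur = *req;		/* stands in for the full requeue work */
	return true;
}
```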
2015-02-04sched/deadline: Fix stale yield statePeter Zijlstra1-19/+19
When we fail to start the deadline timer in update_curr_dl(), we forget to clear ->dl_yielded, resulting in wrecked time keeping. Since the natural place to clear both ->dl_yielded and ->dl_throttled is in replenish_dl_entity(); both are after all waiting for that event; make it so. Luckily since 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") the task_on_rq_queued() condition in dl_task_timer() must be true, and can therefore call enqueue_task_dl() unconditionally. Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-4-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix hrtick for a non-leftmost taskWanpeng Li1-1/+7
After update_curr_dl() the current task might not be the leftmost task anymore. In that case do not start a new hrtick for it. In this case NEED_RESCHED will be set and the next schedule will start the hrtick for the new task if and when appropriate. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Acked-by: Juri Lelli <juri.lelli@arm.com> [ Rewrote the changelog and comment. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04Merge branch 'sched/urgent' into sched/core, to merge fixes before applying new patchesIngo Molnar2-6/+30
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/deadline: Fix deadline parameter modification handlingPeter Zijlstra2-6/+30
Commit 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") removed the hrtimer_try_cancel() function call out from init_dl_task_timer(), which gets called from __setparam_dl(). The result is that we can now re-init the timer while it's active -- this is bad and corrupts timer state. Furthermore, changing the parameters of an active deadline task is tricky in that you want to maintain guarantees, while an immediately effective change would allow one to circumvent the CBS guarantees -- this too is bad, as one (bad) task should not be able to affect the others. Rework things to avoid both problems. We only need to initialize the timer once, so move that to __sched_fork() for new tasks. Then make sure __setparam_dl() doesn't affect the current running state but only updates the parameters used to calculate the next scheduling period -- this guarantees the CBS functions as expected (albeit slightly pessimistic). This however means that __dl_clear_params() needs to reset the active state, otherwise new tasks (and tasks flipping between classes) will not properly (re)compute their first instance. Todo: close class flipping CBS hole. Todo: implement delayed BW release. Reported-by: Luca Abeni <luca.abeni@unitn.it> Acked-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Luca Abeni <luca.abeni@unitn.it> Fixes: 67dfa1b756f2 ("sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150128140803.GF23038@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-01sched: don't cause task state changes in nested sleep debuggingLinus Torvalds1-3/+2
Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop to not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep.) - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org> Tested-by: Rafael J Wysocki <rjw@rjwysocki.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
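The note about the warn-once return value can be demonstrated with a small userspace analogue. The helper below is hypothetical (`sketch_warn_once()` is not the kernel macro); it only mirrors the described behaviour, where the message is printed once but the returned value is the condition itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int sketch_messages;	/* how many warnings were actually printed */

/* Prints at most once per 'warned' flag, but returns the condition itself. */
static bool sketch_warn_once(bool cond, bool *warned, const char *msg)
{
	if (cond && !*warned) {
		*warned = true;
		sketch_messages++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Counts how often the guarded branch runs across n nested sleeps. */
static int sketch_count_branch_hits(int n)
{
	static bool warned;
	int hits = 0;

	for (int i = 0; i < n; i++)
		if (sketch_warn_once(true, &warned, "nested sleep"))
			hits++;	/* runs every time, not only on the first */
	return hits;
}
```

Any side effect placed inside such an `if` body therefore runs on every occurrence of the condition, which is why setting the task state there affected every nested sleep.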
2015-01-30Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+13
Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also an event groups fix, two PMU driver fixes and a CPU model variant addition" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Tighten (and fix) the grouping condition perf/x86/intel: Add model number for Airmont perf/rapl: Fix crash in rapl_scale() perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization perf probe: Fix probing kretprobes perf symbols: Introduce 'for' method to iterate over the symbols with a given name perf probe: Do not rely on map__load() filter to find symbols perf symbols: Introduce method to iterate symbols ordered by name perf symbols: Return the first entry with a given name in find_by_name method perf annotate: Fix memory leaks in LOCK handling perf annotate: Handle ins parsing failures perf scripting perl: Force to use stdbool perf evlist: Remove extraneous 'was' on error message
2015-01-30sched/deadline: Modify cpudl::free_cpus to reflect rd->onlineXunlei Pang3-7/+28
Currently, cpudl::free_cpus contains all CPUs during init, see cpudl_init(). When calling cpudl_find(), we have to add rd->span to avoid selecting the cpu outside the current root domain, because cpus_allowed cannot be depended on when performing clustered scheduling using the cpuset, see find_later_rq(). This patch adds cpudl_set_freecpu() and cpudl_clear_freecpu() for changing cpudl::free_cpus when doing rq_online_dl()/rq_offline_dl(), so we can avoid the rd->span operation when calling cpudl_find() in find_later_rq(). Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421642980-10045-1-git-send-email-pang.xunlei@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched/idle: Add missing checks to the exit condition of cpu_idle_poll()Preeti U Murthy1-1/+2
cpu_idle_poll() is entered when either cpu_idle_force_poll is set or tick_check_broadcast_expired() returns true. The exit condition from cpu_idle_poll() is tif_need_resched(). However this does not take into account scenarios where cpu_idle_force_poll changes or tick_check_broadcast_expired() returns false, without setting the resched flag. So a cpu will be caught in cpu_idle_poll() needlessly, thereby wasting power. Add an explicit check on cpu_idle_force_poll and tick_check_broadcast_expired() to the exit condition of cpu_idle_poll() to avoid this. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150121105655.15279.59626.stgit@preeti.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
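The extended exit condition can be expressed as a single predicate. The sketch below is a hypothetical userspace rendering: `sketch_keep_polling()` is an invented name, and its three boolean arguments stand in for tif_need_resched(), cpu_idle_force_poll and tick_check_broadcast_expired().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Keep polling only while no reschedule is requested AND at least one
 * of the reasons for entering the poll loop still holds.
 */
static bool sketch_keep_polling(bool need_resched, bool force_poll,
				bool broadcast_expired)
{
	return !need_resched && (force_poll || broadcast_expired);
}
```

Under the old condition, the second and third arguments were ignored once the loop had been entered, so a CPU could keep polling after both reasons had gone away.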
2015-01-30sched: Fix missing preemption opportunityFrederic Weisbecker1-21/+19
If an interrupt fires in cond_resched(), between the call to __schedule() and the PREEMPT_ACTIVE count decrement, and that interrupt sets TIF_NEED_RESCHED, the call to preempt_schedule_irq() will be ignored due to the PREEMPT_ACTIVE count. This kind of scenario, with irq preemption being delayed because it's interrupting a preempt-disabled area, is usually fixed up after preemption is re-enabled with an explicit call to preempt_schedule(). This is what preempt_enable() does, but a raw preempt count decrement as performed by __preempt_count_sub(PREEMPT_ACTIVE) doesn't handle the delayed preemption check. Therefore when such a race happens, the rescheduling is going to be delayed until the next scheduler or preemption entrypoint. This can be a problem for scheduler latency sensitive workloads. Let's fix that by consolidating cond_resched() with preempt_schedule() internals. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ingo Molnar <mingo@kernel.org> Original-patch-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1421946484-9298-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-30sched/rt: Reduce rq lock contention by eliminating locking of non-feasible targetTim Chen1-1/+16
This patch adds checks that prevent futile attempts to move rt tasks to a CPU with active tasks of equal or higher priority. This reduces run queue lock contention and improves the performance of a well known OLTP benchmark by 0.7%. Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Cc: Suruchi Kadu <suruchi.a.kadu@intel.com> Cc: Doug Nelson <doug.nelson@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
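The idea of rejecting a non-feasible target before taking its run queue lock might be sketched as follows. Everything here is hypothetical (`sketch_target_feasible()`, `sketch_try_push()`, the lock counter); the only convention borrowed from the kernel is that a lower numeric priority value means a higher priority.

```c
#include <assert.h>
#include <stdbool.h>

static int sketch_lock_attempts;	/* counts simulated rq lock acquisitions */

/*
 * Lower numeric value means higher priority. A target is only worth
 * trying if the task would outrank everything currently active there.
 */
static bool sketch_target_feasible(int task_prio, int target_highest_prio)
{
	return task_prio < target_highest_prio;
}

/* Returns true if the push was attempted (and the lock was taken). */
static bool sketch_try_push(int task_prio, int target_highest_prio)
{
	if (!sketch_target_feasible(task_prio, target_highest_prio))
		return false;	/* equal or higher priority there: no lock taken */

	sketch_lock_attempts++;	/* stands in for locking the target rq */
	return true;
}
```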
2015-01-30Merge branch 'sched/urgent' into sched/coreIngo Molnar27-251/+511
Merge all pending fixes and refresh the tree, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar15-74/+216
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28sched: Fix crash if cpuset_cpumask_can_shrink() is passed an empty cpumaskMike Galbraith1-0/+3
While creating an exclusive cpuset, we passed cpuset_cpumask_can_shrink() an empty cpumask (cur), and dl_bw_of(cpumask_any(cur)) made boom with it: CPU: 0 PID: 6942 Comm: shield.sh Not tainted 3.19.0-master #19 Hardware name: MEDIONPC MS-7502/MS-7502, BIOS 6.00 PG 12/26/2007 task: ffff880224552450 ti: ffff8800caab8000 task.ti: ffff8800caab8000 RIP: 0010:[<ffffffff81073846>] [<ffffffff81073846>] cpuset_cpumask_can_shrink+0x56/0xb0 [...] Call Trace: [<ffffffff810cb82a>] validate_change+0x18a/0x200 [<ffffffff810cc877>] cpuset_write_resmask+0x3b7/0x720 [<ffffffff810c4d58>] cgroup_file_write+0x38/0x100 [<ffffffff811d953a>] kernfs_fop_write+0x12a/0x180 [<ffffffff8116e1a3>] vfs_write+0xb3/0x1d0 [<ffffffff8116ed06>] SyS_write+0x46/0xb0 [<ffffffff8159ced6>] system_call_fastpath+0x16/0x1b Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Acked-by: Zefan Li <lizefan@huawei.com> Fixes: f82f80426f7a ("sched/deadline: Ensure that updates to exclusive cpusets don't break AC") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422417235.5716.5.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28perf: Tighten (and fix) the grouping conditionPeter Zijlstra1-2/+13
The fix from 9fc81d87420d ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87420d ("perf: Fix events installation during moving group") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
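One possible reading of the tightened rule is sketched below. It is an assumption, not the patch: `struct sketch_event` and `sketch_can_group()` are hypothetical, and the sketch interprets "the same task and/or the same cpu" as requiring both the task and the cpu of the new event to match those of the group leader, with -1 meaning "no task" (a per-cpu event) or "any cpu".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event descriptor: task -1 = per-cpu event, cpu -1 = any cpu. */
struct sketch_event {
	int task;
	int cpu;
};

/* Allow grouping only when both events can be scheduled in one context. */
static bool sketch_can_group(const struct sketch_event *leader,
			     const struct sketch_event *event)
{
	return leader->task == event->task && leader->cpu == event->cpu;
}
```

Under this reading, an event created for CPU Y can never join a group whose leader lives on CPU X, which is the case the changelog describes as semantically broken.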