aboutsummaryrefslogtreecommitdiffstats
path: root/include
diff options
context:
space:
mode:
authorPeter Zijlstra <peterz@infradead.org>2016-01-18 15:27:07 +0100
committerIngo Molnar <mingo@kernel.org>2016-02-29 09:53:05 +0100
commitff77e468535987b3d21b7bd4da15608ea3ce7d0b (patch)
treed35cdc7929da435ababea451f8f7e81e76fc76b0 /include
parentsched/core: Remove duplicated sched_group_set_shares() prototype (diff)
downloadlinux-dev-ff77e468535987b3d21b7bd4da15608ea3ce7d0b.tar.xz
linux-dev-ff77e468535987b3d21b7bd4da15608ea3ce7d0b.zip
sched/rt: Fix PI handling vs. sched_setscheduler()
Andrea Parri reported: > I found that the following scenario (with CONFIG_RT_GROUP_SCHED=y) is not > handled correctly: > > T1 (prio = 20) > lock(rtmutex); > > T2 (prio = 20) > blocks on rtmutex (rt_nr_boosted = 0 on T1's rq) > > T1 (prio = 20) > sys_set_scheduler(prio = 0) > [new_effective_prio == oldprio] > T1 prio = 20 (rt_nr_boosted = 0 on T1's rq) > > The last step is incorrect as T1 is now boosted (c.f., rt_se_boosted()); > in particular, if we continue with > > T1 (prio = 20) > unlock(rtmutex) > wakeup(T2) > adjust_prio(T1) > [prio != rt_mutex_getprio(T1)] > dequeue(T1) > rt_nr_boosted = (unsigned long)(-1) > ... > T1 prio = 0 > > then we end up leaving rt_nr_boosted in an "inconsistent" state. > > The simple program attached could reproduce the previous scenario; note > that, as a consequence of the presence of this state, the "assertion" > > WARN_ON(!rt_nr_running && rt_nr_boosted) > > from dec_rt_group() may trigger. So normally we dequeue/enqueue tasks in sched_setscheduler(), which would ensure the accounting stays correct. However in the early PI path we fail to do so. So this was introduced at around v3.14, by: c365c292d059 ("sched: Consider pi boosting in setscheduler()") which fixed another problem exactly because that dequeue/enqueue, joy. Fix this by teaching rt about DEQUEUE_SAVE/ENQUEUE_RESTORE and have it preserve runqueue location with that option. This requires decoupling the on_rt_rq() state from being on the list. In order to allow for explicit movement during the SAVE/RESTORE, introduce {DE,EN}QUEUE_MOVE. We still must use SAVE/RESTORE in these cases to preserve other invariants. Respecting the SAVE/RESTORE flags also has the (nice) side-effect that things like sys_nice()/sys_sched_setaffinity() also do not reorder FIFO tasks (whereas they used to before this patch). Reported-by: Andrea Parri <parri.andrea@gmail.com> Tested-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'include')
-rw-r--r--include/linux/sched.h2
1 files changed, 2 insertions, 0 deletions
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a292c4b7e94c..87e5f9886ac8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1293,6 +1293,8 @@ struct sched_rt_entity {
unsigned long timeout;
unsigned long watchdog_stamp;
unsigned int time_slice;
+ unsigned short on_rq;
+ unsigned short on_list;
struct sched_rt_entity *back;
#ifdef CONFIG_RT_GROUP_SCHED