author	Rik van Riel <riel@redhat.com>	2014-08-15 16:05:38 -0400
committer	Ingo Molnar <mingo@kernel.org>	2014-09-08 08:17:02 +0200
commit	eb1b4af0a64ac7bb0ee36f579c1c7cefcbc3ac2c (patch)
tree	b8a48561d5a51a5b5249d987f1ecfa97a0a44fbc /kernel/sched
parent	time, signal: Protect resource use statistics with seqlock (diff)
sched, time: Atomically increment stime & utime
The functions task_cputime_adjusted() and thread_group_cputime_adjusted()
can be called locklessly, as well as concurrently on many different CPUs.

This can occasionally lead to the utime and stime reported by times(), and
other syscalls like it, going backward. The cause for this appears to be
multiple threads racing in cputime_adjust(), both with values for utime or
stime that are larger than the original, but each with a different value.

Sometimes the larger value gets saved first, only to be immediately
overwritten with a smaller value by another thread.

Using atomic exchange prevents that problem, and ensures time
progresses monotonically.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: akpm@linux-foundation.org
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Link: http://lkml.kernel.org/r/1408133138-22048-4-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'kernel/sched')
-rw-r--r--	kernel/sched/cputime.c	7
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 49b7cfe98f7a..2b57031afc19 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -602,9 +602,12 @@ static void cputime_adjust(struct task_cputime *curr,
 	 * If the tick based count grows faster than the scheduler one,
 	 * the result of the scaling may go backward.
 	 * Let's enforce monotonicity.
+	 * Atomic exchange protects against concurrent cputime_adjust().
 	 */
-	prev->stime = max(prev->stime, stime);
-	prev->utime = max(prev->utime, utime);
+	while (stime > (rtime = ACCESS_ONCE(prev->stime)))
+		cmpxchg(&prev->stime, rtime, stime);
+	while (utime > (rtime = ACCESS_ONCE(prev->utime)))
+		cmpxchg(&prev->utime, rtime, utime);
 
 out:
 	*ut = prev->utime;
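
For reference, the same lockless monotonic-update pattern can be sketched
outside the kernel with plain C11 atomics. The sketch below is illustrative,
not the kernel code: it substitutes atomic_compare_exchange_weak() for the
kernel's cmpxchg() and atomic_load() for ACCESS_ONCE(), and the names
advance_monotonic, prev_stime, and worker are invented for the example.

/*
 * Userspace sketch of the patch's pattern: advance a shared counter
 * only if the new sample moves it forward, retrying on CAS failure.
 * Build with: cc -std=c11 -pthread sketch.c
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

static _Atomic uint64_t prev_stime;	/* stands in for prev->stime */

static void advance_monotonic(_Atomic uint64_t *prev, uint64_t val)
{
	uint64_t old = atomic_load(prev);

	/*
	 * Keep retrying while our sample is still ahead of the stored
	 * value. A racing thread that installs a *larger* value makes
	 * the loop condition false, so a larger value is never
	 * overwritten by a smaller one -- the bug the plain max()
	 * stores had. A failed CAS refreshes 'old' with the current
	 * value, so the loop re-checks against fresh state.
	 */
	while (val > old &&
	       !atomic_compare_exchange_weak(prev, &old, val))
		;
}

static void *worker(void *arg)
{
	/* Each thread races with a different "larger" sample. */
	advance_monotonic(&prev_stime, (uint64_t)(uintptr_t)arg);
	return NULL;
}

int main(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker,
			       (void *)(uintptr_t)(100 + i));
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	/* Always prints the largest sample (103), never a smaller one. */
	printf("prev_stime = %llu\n",
	       (unsigned long long)atomic_load(&prev_stime));
	return 0;
}

As in the patch, the loop condition re-reads the shared value on every
iteration, so once any thread has stored a larger value the loop simply
exits; the weak compare-and-swap may fail spuriously, which the retry
loop absorbs.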