locking/qspinlock: Prefetch the next node cacheline

A queue head CPU, after acquiring the lock, has to notify the next CPU in the wait queue that it has become the new queue head. This involves loading a new cacheline from the MCS node of the next CPU. That operation can be expensive and adds to the latency of the locking operation.

This patch adds code to optimistically prefetch the next MCS node cacheline if the next pointer is non-NULL and the CPU has been spinning for the MCS lock for a while. This reduces the locking latency and improves system throughput.

The performance change depends on whether the prefetch overhead can be hidden within the latency of the lock spin loop. With a really short critical section, there may be no performance gain at all. With a longer critical section, however, it was found to give a 5-10% performance boost over a range of different queue depths with a spinlock loop microbenchmark.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447114167-47185-3-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Waiman Long <Waiman.Long@hpe.com> 2015-11-09 19:09:22 -0500
committer: Ingo Molnar <mingo@kernel.org> 2015-11-23 10:01:59 +0100
commit: 81b5598665a24083dd889fbd8cb08b0d8de4b8ad
tree: 1b012514afb1a297a20cbfd68ca1be322700c390 /kernel/locking
parent: locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
1 file changed, 10 insertions, 0 deletions
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 7868418ea586..365b2033f55e 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -407,6 +407,16 @@ queue:
 
 		pv_wait_node(node);
 		arch_mcs_spin_lock_contended(&node->locked);
+
+		/*
+		 * While waiting for the MCS lock, the next pointer may have
+		 * been set by another lock waiter. We optimistically load
+		 * the next pointer & prefetch the cacheline for writing
+		 * to reduce latency in the upcoming MCS unlock operation.
+		 */
+		next = READ_ONCE(node->next);
+		if (next)
+			prefetchw(next);
 	}
 
 	/*
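For readers outside the kernel, the idea in the hunk above can be sketched as a plain MCS lock in userspace C11. This is a minimal sketch under stated assumptions, not the kernel's qspinlock: the names `mcs_node`, `mcs_lock`, `mcs_lock_init()`, `mcs_lock_acquire()` and `mcs_lock_release()` are hypothetical, the GCC/Clang builtin `__builtin_prefetch(addr, 1, 3)` (second argument 1 = prefetch for writing) stands in for `prefetchw()`, and a relaxed atomic load stands in for `READ_ONCE()`. As in the patch, the prefetch is issued only on the contended path, after the waiter has spun for its MCS lock, and only if a successor has already linked itself in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-waiter queue node, playing the role of the MCS node. */
struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor, set by the next waiter */
    atomic_bool locked;              /* set true by predecessor on handoff */
};

/* Hypothetical lock: just the queue tail. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
    assert(lock != NULL && node != NULL);
    atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&node->locked, false, memory_order_relaxed);

    /* Enqueue ourselves as the new tail. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
    if (prev == NULL)
        return; /* queue was empty: lock acquired without waiting */

    /* Link behind the predecessor, then spin on our own node only. */
    atomic_store_explicit(&prev->next, node, memory_order_release);
    while (!atomic_load_explicit(&node->locked, memory_order_acquire))
        ; /* a real implementation would add a cpu_relax()/pause hint */

    /*
     * Analogue of the patch: a successor may have linked in while we
     * waited. Prefetch its node for writing so the handoff store in
     * mcs_lock_release() is less likely to miss in the cache. A prefetch
     * is only a hint and never faults.
     */
    struct mcs_node *next =
        atomic_load_explicit(&node->next, memory_order_relaxed);
    if (next)
        __builtin_prefetch(next, 1, 3);
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *node)
{
    struct mcs_node *next =
        atomic_load_explicit(&node->next, memory_order_acquire);
    if (next == NULL) {
        /* No visible successor: try to reset the tail to empty. */
        struct mcs_node *expected = node;
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected,
                NULL, memory_order_release, memory_order_relaxed))
            return;
        /* A waiter swapped the tail but has not linked in yet. */
        while ((next = atomic_load_explicit(&node->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Handoff: this store to the successor's cacheline is what the
     * prefetch tries to make cheaper. */
    atomic_store_explicit(&next->locked, true, memory_order_release);
}
```

One difference from the kernel is worth noting: in qspinlock the new queue head still spins on the lock word before handing off the MCS lock, so the prefetch overlaps with that spin loop; in this simplified pure-MCS sketch it overlaps with the caller's critical section instead.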