diff options
Diffstat (limited to 'tools/memory-model')
-rw-r--r-- | tools/memory-model/Documentation/explanation.txt | 95 | ||||
-rw-r--r-- | tools/memory-model/Documentation/litmus-tests.txt | 37 | ||||
-rw-r--r-- | tools/memory-model/README | 15 | ||||
-rw-r--r-- | tools/memory-model/linux-kernel.cat | 6 | ||||
-rw-r--r-- | tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus | 35 | ||||
-rw-r--r-- | tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus | 33 | ||||
-rw-r--r-- | tools/memory-model/litmus-tests/README | 8 |
7 files changed, 196 insertions, 33 deletions
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt index 5d72f3112e56..ee819a402b69 100644 --- a/tools/memory-model/Documentation/explanation.txt +++ b/tools/memory-model/Documentation/explanation.txt @@ -485,6 +485,57 @@ have R ->po X. It wouldn't make sense for a computation to depend somehow on a value that doesn't get loaded from shared memory until later in the code! +Here's a trick question: When is a dependency not a dependency? Answer: +When it is purely syntactic rather than semantic. We say a dependency +between two accesses is purely syntactic if the second access doesn't +actually depend on the result of the first. Here is a trivial example: + + r1 = READ_ONCE(x); + WRITE_ONCE(y, r1 * 0); + +There appears to be a data dependency from the load of x to the store +of y, since the value to be stored is computed from the value that was +loaded. But in fact, the value stored does not really depend on +anything since it will always be 0. Thus the data dependency is only +syntactic (it appears to exist in the code) but not semantic (the +second access will always be the same, regardless of the value of the +first access). Given code like this, a compiler could simply discard +the value returned by the load from x, which would certainly destroy +any dependency. (The compiler is not permitted to eliminate entirely +the load generated for a READ_ONCE() -- that's one of the nice +properties of READ_ONCE() -- but it is allowed to ignore the load's +value.) + +It's natural to object that no one in their right mind would write +code like the above. However, macro expansions can easily give rise +to this sort of thing, in ways that often are not apparent to the +programmer. + +Another mechanism that can lead to purely syntactic dependencies is +related to the notion of "undefined behavior". Certain program +behaviors are called "undefined" in the C language specification, +which means that when they occur there are no guarantees at all about +the outcome. Consider the following example: + + int a[1]; + int i; + + r1 = READ_ONCE(i); + r2 = READ_ONCE(a[r1]); + +Access beyond the end or before the beginning of an array is one kind +of undefined behavior. Therefore the compiler doesn't have to worry +about what will happen if r1 is nonzero, and it can assume that r1 +will always be zero regardless of the value actually loaded from i. +(If the assumption turns out to be wrong the resulting behavior will +be undefined anyway, so the compiler doesn't care!) Thus the value +from the load can be discarded, breaking the address dependency. + +The LKMM is unaware that purely syntactic dependencies are different +from semantic dependencies and therefore mistakenly predicts that the +accesses in the two examples above will be ordered. This is another +example of how the compiler can undermine the memory model. Be warned. + THE READS-FROM RELATION: rf, rfi, and rfe ----------------------------------------- @@ -1813,15 +1864,16 @@ spin_trylock() -- we can call these things lock-releases and lock-acquires -- have two properties beyond those of ordinary releases and acquires. -First, when a lock-acquire reads from a lock-release, the LKMM -requires that every instruction po-before the lock-release must -execute before any instruction po-after the lock-acquire. This would -naturally hold if the release and acquire operations were on different -CPUs, but the LKMM says it holds even when they are on the same CPU. -For example: +First, when a lock-acquire reads from or is po-after a lock-release, +the LKMM requires that every instruction po-before the lock-release +must execute before any instruction po-after the lock-acquire. This +would naturally hold if the release and acquire operations were on +different CPUs and accessed the same lock variable, but the LKMM says +it also holds when they are on the same CPU, even if they access +different lock variables. For example: int x, y; - spinlock_t s; + spinlock_t s, t; P0() { @@ -1830,9 +1882,9 @@ For example: spin_lock(&s); r1 = READ_ONCE(x); spin_unlock(&s); - spin_lock(&s); + spin_lock(&t); r2 = READ_ONCE(y); - spin_unlock(&s); + spin_unlock(&t); } P1() @@ -1842,10 +1894,10 @@ For example: WRITE_ONCE(x, 1); } -Here the second spin_lock() reads from the first spin_unlock(), and -therefore the load of x must execute before the load of y. Thus we -cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the -MP pattern). +Here the second spin_lock() is po-after the first spin_unlock(), and +therefore the load of x must execute before the load of y, even though +the two locking operations use different locks. Thus we cannot have +r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern). This requirement does not apply to ordinary release and acquire fences, only to lock-related operations. For instance, suppose P0() @@ -1872,13 +1924,13 @@ instructions in the following order: and thus it could load y before x, obtaining r2 = 0 and r1 = 1. -Second, when a lock-acquire reads from a lock-release, and some other -stores W and W' occur po-before the lock-release and po-after the -lock-acquire respectively, the LKMM requires that W must propagate to -each CPU before W' does. For example, consider: +Second, when a lock-acquire reads from or is po-after a lock-release, +and some other stores W and W' occur po-before the lock-release and +po-after the lock-acquire respectively, the LKMM requires that W must +propagate to each CPU before W' does. For example, consider: int x, y; - spinlock_t x; + spinlock_t s; P0() { @@ -1908,7 +1960,12 @@ each CPU before W' does. For example, consider: If r1 = 1 at the end then the spin_lock() in P1 must have read from the spin_unlock() in P0. Hence the store to x must propagate to P2 -before the store to y does, so we cannot have r2 = 1 and r3 = 0. +before the store to y does, so we cannot have r2 = 1 and r3 = 0. But +if P1 had used a lock variable different from s, the writes could have +propagated in either order. (On the other hand, if the code in P0 and +P1 had all executed on a single CPU, as in the example before this +one, then the writes would have propagated in order even if the two +critical sections used different lock variables.) These two special requirements for lock-release and lock-acquire do not arise from the operational model. Nevertheless, kernel developers diff --git a/tools/memory-model/Documentation/litmus-tests.txt b/tools/memory-model/Documentation/litmus-tests.txt index 8a9d5d2787f9..26554b1c5575 100644 --- a/tools/memory-model/Documentation/litmus-tests.txt +++ b/tools/memory-model/Documentation/litmus-tests.txt @@ -946,22 +946,39 @@ Limitations of the Linux-kernel memory model (LKMM) include: carrying a dependency, then the compiler can break that dependency by substituting a constant of that value. - Conversely, LKMM sometimes doesn't recognize that a particular - optimization is not allowed, and as a result, thinks that a - dependency is not present (because the optimization would break it). - The memory model misses some pretty obvious control dependencies - because of this limitation. A simple example is: + Conversely, LKMM will sometimes overestimate the amount of + reordering compilers and CPUs can carry out, leading it to miss + some pretty obvious cases of ordering. A simple example is: r1 = READ_ONCE(x); if (r1 == 0) smp_mb(); WRITE_ONCE(y, 1); - There is a control dependency from the READ_ONCE to the WRITE_ONCE, - even when r1 is nonzero, but LKMM doesn't realize this and thinks - that the write may execute before the read if r1 != 0. (Yes, that - doesn't make sense if you think about it, but the memory model's - intelligence is limited.) + The WRITE_ONCE() does not depend on the READ_ONCE(), and as a + result, LKMM does not claim ordering. However, even though no + dependency is present, the WRITE_ONCE() will not be executed before + the READ_ONCE(). There are two reasons for this: + + The presence of the smp_mb() in one of the branches + prevents the compiler from moving the WRITE_ONCE() + up before the "if" statement, since the compiler has + to assume that r1 will sometimes be 0 (but see the + comment below); + + CPUs do not execute stores before po-earlier conditional + branches, even in cases where the store occurs after the + two arms of the branch have recombined. + + It is clear that it is not dangerous in the slightest for LKMM to + make weaker guarantees than architectures. In fact, it is + desirable, as it gives compilers room for making optimizations. + For instance, suppose that a 0 value in r1 would trigger undefined + behavior elsewhere. Then a clever compiler might deduce that r1 + can never be 0 in the if condition. As a result, said clever + compiler might deem it safe to optimize away the smp_mb(), + eliminating the branch and any ordering an architecture would + guarantee otherwise. 2. Multiple access sizes for a single variable are not supported, and neither are misaligned or partially overlapping accesses. diff --git a/tools/memory-model/README b/tools/memory-model/README index 9a84c45504ab..dab38904206a 100644 --- a/tools/memory-model/README +++ b/tools/memory-model/README @@ -54,7 +54,8 @@ klitmus7 Compatibility Table -- 4.14 7.48 -- 4.15 -- 4.19 7.49 -- 4.20 -- 5.5 7.54 -- - 5.6 -- 7.56 -- + 5.6 -- 5.16 7.56 -- + 5.17 -- 7.56.1 -- ============ ========== @@ -195,6 +196,18 @@ litmus-tests are listed in litmus-tests/README. A great deal more litmus tests are available at https://github.com/paulmckrcu/litmus. + By "representative", it means the one in the litmus-tests + directory is: + + 1) simple, the number of threads should be relatively + small and each thread function should be relatively + simple. + 2) orthogonal, there should be no two litmus tests + describing the same aspect of the memory model. + 3) textbook, developers can easily copy-paste-modify + the litmus tests to use the patterns on their own + code. + lock.cat Provides a front-end analysis of lock acquisition and release, for example, associating a lock acquisition with the preceding diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat index 2a9b4fe4a84e..d70315fddef6 100644 --- a/tools/memory-model/linux-kernel.cat +++ b/tools/memory-model/linux-kernel.cat @@ -27,7 +27,7 @@ include "lock.cat" (* Release Acquire *) let acq-po = [Acquire] ; po ; [M] let po-rel = [M] ; po ; [Release] -let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po +let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po (* Fences *) let R4rmb = R \ Noreturn (* Reads for which rmb works *) @@ -70,12 +70,12 @@ let rwdep = (dep | ctrl) ; [W] let overwrite = co | fr let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb) let to-r = addr | (dep ; [Marked] ; rfi) -let ppo = to-r | to-w | fence | (po-unlock-rf-lock-po & int) +let ppo = to-r | to-w | fence | (po-unlock-lock-po & int) (* Propagation: Ordering from release operations and strong fences. *) let A-cumul(r) = (rfe ; [Marked])? ; r let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb | - po-unlock-rf-lock-po) ; [Marked] + po-unlock-lock-po) ; [Marked] let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ; [Marked] ; rfe? ; [Marked] diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus new file mode 100644 index 000000000000..eb34123a6ffe --- /dev/null +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus @@ -0,0 +1,35 @@ +C LB+unlocklockonceonce+poacquireonce + +(* + * Result: Never + * + * If two locked critical sections execute on the same CPU, all accesses + * in the first must execute before any accesses in the second, even if the + * critical sections are protected by different locks. Note: Even when a + * write executes before a read, their memory effects can be reordered from + * the viewpoint of another CPU (the kind of reordering allowed by TSO). + *) + +{} + +P0(spinlock_t *s, spinlock_t *t, int *x, int *y) +{ + int r1; + + spin_lock(s); + r1 = READ_ONCE(*x); + spin_unlock(s); + spin_lock(t); + WRITE_ONCE(*y, 1); + spin_unlock(t); +} + +P1(int *x, int *y) +{ + int r2; + + r2 = smp_load_acquire(y); + WRITE_ONCE(*x, 1); +} + +exists (0:r1=1 /\ 1:r2=1) diff --git a/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus b/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus new file mode 100644 index 000000000000..2feb1398be71 --- /dev/null +++ b/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus @@ -0,0 +1,33 @@ +C MP+unlocklockonceonce+fencermbonceonce + +(* + * Result: Never + * + * If two locked critical sections execute on the same CPU, stores in the + * first must propagate to each CPU before stores in the second do, even if + * the critical sections are protected by different locks. + *) + +{} + +P0(spinlock_t *s, spinlock_t *t, int *x, int *y) +{ + spin_lock(s); + WRITE_ONCE(*x, 1); + spin_unlock(s); + spin_lock(t); + WRITE_ONCE(*y, 1); + spin_unlock(t); +} + +P1(int *x, int *y) +{ + int r1; + int r2; + + r1 = READ_ONCE(*y); + smp_rmb(); + r2 = READ_ONCE(*x); +} + +exists (1:r1=1 /\ 1:r2=0) diff --git a/tools/memory-model/litmus-tests/README b/tools/memory-model/litmus-tests/README index 681f9067fa9e..d311a0ff1ae6 100644 --- a/tools/memory-model/litmus-tests/README +++ b/tools/memory-model/litmus-tests/README @@ -63,6 +63,10 @@ LB+poonceonces.litmus As above, but with store-release replaced with WRITE_ONCE() and load-acquire replaced with READ_ONCE(). +LB+unlocklockonceonce+poacquireonce.litmus + Does a unlock+lock pair provides ordering guarantee between a + load and a store? + MP+onceassign+derefonce.litmus As below, but with rcu_assign_pointer() and an rcu_dereference(). @@ -90,6 +94,10 @@ MP+porevlocks.litmus As below, but with the first access of the writer process and the second access of reader process protected by a lock. +MP+unlocklockonceonce+fencermbonceonce.litmus + Does a unlock+lock pair provides ordering guarantee between a + store and another store? + MP+fencewmbonceonce+fencermbonceonce.litmus Does a smp_wmb() (between the stores) and an smp_rmb() (between the loads) suffice for the message-passing litmus test, where one |