aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/locking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/locking')
-rw-r--r--Documentation/locking/futex-requeue-pi.rst2
-rw-r--r--Documentation/locking/hwspinlock.rst68
-rw-r--r--Documentation/locking/index.rst3
-rw-r--r--Documentation/locking/lockdep-design.rst281
-rw-r--r--Documentation/locking/locktorture.rst8
-rw-r--r--Documentation/locking/locktypes.rst42
-rw-r--r--Documentation/locking/mutex-design.rst22
-rw-r--r--Documentation/locking/percpu-rw-semaphore.rst4
-rw-r--r--Documentation/locking/seqlock.rst239
-rw-r--r--Documentation/locking/ww-mutex-design.rst6
10 files changed, 576 insertions, 99 deletions
diff --git a/Documentation/locking/futex-requeue-pi.rst b/Documentation/locking/futex-requeue-pi.rst
index 14ab5787b9a7..dd4ecf4528a4 100644
--- a/Documentation/locking/futex-requeue-pi.rst
+++ b/Documentation/locking/futex-requeue-pi.rst
@@ -5,7 +5,7 @@ Futex Requeue PI
Requeueing of tasks from a non-PI futex to a PI futex requires
special handling in order to ensure the underlying rt_mutex is never
left without an owner if it has waiters; doing so would break the PI
-boosting logic [see rt-mutex-desgin.txt] For the purposes of
+boosting logic [see rt-mutex-design.rst] For the purposes of
brevity, this action will be referred to as "requeue_pi" throughout
this document. Priority inheritance is abbreviated throughout as
"PI".
diff --git a/Documentation/locking/hwspinlock.rst b/Documentation/locking/hwspinlock.rst
index 6f03713b7003..a737c702a7d1 100644
--- a/Documentation/locking/hwspinlock.rst
+++ b/Documentation/locking/hwspinlock.rst
@@ -40,17 +40,6 @@ User API
::
- struct hwspinlock *hwspin_lock_request(void);
-
-Dynamically assign an hwspinlock and return its address, or NULL
-in case an unused hwspinlock isn't available. Users of this
-API will usually want to communicate the lock's id to the remote core
-before it can be used to achieve synchronization.
-
-Should be called from a process context (might sleep).
-
-::
-
struct hwspinlock *hwspin_lock_request_specific(unsigned int id);
Assign a specific hwspinlock id and return its address, or NULL
@@ -87,6 +76,17 @@ Should be called from a process context (might sleep).
::
+ int hwspin_lock_bust(struct hwspinlock *hwlock, unsigned int id);
+
+After verifying the owner of the hwspinlock, release a previously acquired
+hwspinlock; returns 0 on success, or an appropriate error code on failure
+(e.g. -EOPNOTSUPP if the bust operation is not defined for the specific
+hwspinlock).
+
+Should be called from a process context (might sleep).
+
+::
+
int hwspin_lock_timeout(struct hwspinlock *hwlock, unsigned int timeout);
Lock a previously-assigned hwspinlock with a timeout limit (specified in
@@ -301,17 +301,6 @@ The caller should **never** unlock an hwspinlock which is already unlocked.
Doing so is considered a bug (there is no protection against this).
This function will never sleep.
-::
-
- int hwspin_lock_get_id(struct hwspinlock *hwlock);
-
-Retrieve id number of a given hwspinlock. This is needed when an
-hwspinlock is dynamically assigned: before it can be used to achieve
-mutual exclusion with a remote cpu, the id number should be communicated
-to the remote task with which we want to synchronize.
-
-Returns the hwspinlock id number, or -EINVAL if hwlock is null.
-
Typical usage
=============
@@ -320,40 +309,7 @@ Typical usage
#include <linux/hwspinlock.h>
#include <linux/err.h>
- int hwspinlock_example1(void)
- {
- struct hwspinlock *hwlock;
- int ret;
-
- /* dynamically assign a hwspinlock */
- hwlock = hwspin_lock_request();
- if (!hwlock)
- ...
-
- id = hwspin_lock_get_id(hwlock);
- /* probably need to communicate id to a remote processor now */
-
- /* take the lock, spin for 1 sec if it's already taken */
- ret = hwspin_lock_timeout(hwlock, 1000);
- if (ret)
- ...
-
- /*
- * we took the lock, do our thing now, but do NOT sleep
- */
-
- /* release the lock */
- hwspin_unlock(hwlock);
-
- /* free the lock */
- ret = hwspin_lock_free(hwlock);
- if (ret)
- ...
-
- return ret;
- }
-
- int hwspinlock_example2(void)
+ int hwspinlock_example(void)
{
struct hwspinlock *hwlock;
int ret;
diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index d785878cad65..6a9ea96c8bcb 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: GPL-2.0
=======
-locking
+Locking
=======
.. toctree::
@@ -14,6 +14,7 @@ locking
mutex-design
rt-mutex-design
rt-mutex
+ seqlock
spinlocks
ww-mutex-design
preempt-locking
diff --git a/Documentation/locking/lockdep-design.rst b/Documentation/locking/lockdep-design.rst
index 23fcbc4d3fc0..56b90eea2731 100644
--- a/Documentation/locking/lockdep-design.rst
+++ b/Documentation/locking/lockdep-design.rst
@@ -29,7 +29,7 @@ the validator will shoot a splat if incorrect.
A lock-class's behavior is constructed by its instances collectively:
when the first instance of a lock-class is used after bootup the class
gets registered, then all (subsequent) instances will be mapped to the
-class and hence their usages and dependecies will contribute to those of
+class and hence their usages and dependencies will contribute to those of
the class. A lock-class does not go away when a lock instance does, but
it can be removed if the memory space of the lock class (static or
dynamic) is reclaimed, this happens for example when a module is
@@ -42,6 +42,7 @@ The validator tracks lock-class usage history and divides the usage into
(4 usages * n STATEs + 1) categories:
where the 4 usages can be:
+
- 'ever held in STATE context'
- 'ever held as readlock in STATE context'
- 'ever held with STATE enabled'
@@ -49,10 +50,12 @@ where the 4 usages can be:
where the n STATEs are coded in kernel/locking/lockdep_states.h and as of
now they include:
+
- hardirq
- softirq
where the last 1 category is:
+
- 'ever used' [ == !unused ]
When locking rules are violated, these usage bits are presented in the
@@ -96,13 +99,13 @@ exact case is for the lock as of the reporting time.
+--------------+-------------+--------------+
| | irq enabled | irq disabled |
+--------------+-------------+--------------+
- | ever in irq | ? | - |
+ | ever in irq | '?' | '-' |
+--------------+-------------+--------------+
- | never in irq | + | . |
+ | never in irq | '+' | '.' |
+--------------+-------------+--------------+
The character '-' suggests irq is disabled because if otherwise the
-charactor '?' would have been shown instead. Similar deduction can be
+character '?' would have been shown instead. Similar deduction can be
applied for '+' too.
Unused locks (e.g., mutexes) cannot be part of the cause of an error.
@@ -216,7 +219,7 @@ looks like this::
BD_MUTEX_PARTITION
};
-mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
+ mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
In this case the locking is done on a bdev object that is known to be a
partition.
@@ -334,7 +337,7 @@ Troubleshooting:
----------------
The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
-Exceeding this number will trigger the following lockdep warning:
+Exceeding this number will trigger the following lockdep warning::
(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
@@ -392,3 +395,269 @@ Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers. This same output
can also help you find situations where runtime lock initialization has
been omitted.
+
+Recursive read locks:
+---------------------
+The whole of the rest document tries to prove a certain type of cycle is equivalent
+to deadlock possibility.
+
+There are three types of lockers: writers (i.e. exclusive lockers, like
+spin_lock() or write_lock()), non-recursive readers (i.e. shared lockers, like
+down_read()) and recursive readers (recursive shared lockers, like rcu_read_lock()).
+And we use the following notations of those lockers in the rest of the document:
+
+ W or E: stands for writers (exclusive lockers).
+ r: stands for non-recursive readers.
+ R: stands for recursive readers.
+ S: stands for all readers (non-recursive + recursive), as both are shared lockers.
+ N: stands for writers and non-recursive readers, as both are not recursive.
+
+Obviously, N is "r or W" and S is "r or R".
+
+Recursive readers, as their name indicates, are the lockers allowed to acquire
+even inside the critical section of another reader of the same lock instance,
+in other words, allowing nested read-side critical sections of one lock instance.
+
+While non-recursive readers will cause a self deadlock if trying to acquire inside
+the critical section of another reader of the same lock instance.
+
+The difference between recursive readers and non-recursive readers is because:
+recursive readers get blocked only by a write lock *holder*, while non-recursive
+readers could get blocked by a write lock *waiter*. Considering the follow
+example::
+
+ TASK A: TASK B:
+
+ read_lock(X);
+ write_lock(X);
+ read_lock_2(X);
+
+Task A gets the reader (no matter whether recursive or non-recursive) on X via
+read_lock() first. And when task B tries to acquire writer on X, it will block
+and become a waiter for writer on X. Now if read_lock_2() is recursive readers,
+task A will make progress, because writer waiters don't block recursive readers,
+and there is no deadlock. However, if read_lock_2() is non-recursive readers,
+it will get blocked by writer waiter B, and cause a self deadlock.
+
+Block conditions on readers/writers of the same lock instance:
+--------------------------------------------------------------
+There are simply four block conditions:
+
+1. Writers block other writers.
+2. Readers block writers.
+3. Writers block both recursive readers and non-recursive readers.
+4. And readers (recursive or not) don't block other recursive readers but
+ may block non-recursive readers (because of the potential co-existing
+ writer waiters)
+
+Block condition matrix, Y means the row blocks the column, and N means otherwise.
+
+ +---+---+---+---+
+ | | W | r | R |
+ +---+---+---+---+
+ | W | Y | Y | Y |
+ +---+---+---+---+
+ | r | Y | Y | N |
+ +---+---+---+---+
+ | R | Y | Y | N |
+ +---+---+---+---+
+
+ (W: writers, r: non-recursive readers, R: recursive readers)
+
+
+acquired recursively. Unlike non-recursive read locks, recursive read locks
+only get blocked by current write lock *holders* other than write lock
+*waiters*, for example::
+
+ TASK A: TASK B:
+
+ read_lock(X);
+
+ write_lock(X);
+
+ read_lock(X);
+
+is not a deadlock for recursive read locks, as while the task B is waiting for
+the lock X, the second read_lock() doesn't need to wait because it's a recursive
+read lock. However if the read_lock() is non-recursive read lock, then the above
+case is a deadlock, because even if the write_lock() in TASK B cannot get the
+lock, but it can block the second read_lock() in TASK A.
+
+Note that a lock can be a write lock (exclusive lock), a non-recursive read
+lock (non-recursive shared lock) or a recursive read lock (recursive shared
+lock), depending on the lock operations used to acquire it (more specifically,
+the value of the 'read' parameter for lock_acquire()). In other words, a single
+lock instance has three types of acquisition depending on the acquisition
+functions: exclusive, non-recursive read, and recursive read.
+
+To be concise, we call that write locks and non-recursive read locks as
+"non-recursive" locks and recursive read locks as "recursive" locks.
+
+Recursive locks don't block each other, while non-recursive locks do (this is
+even true for two non-recursive read locks). A non-recursive lock can block the
+corresponding recursive lock, and vice versa.
+
+A deadlock case with recursive locks involved is as follow::
+
+ TASK A: TASK B:
+
+ read_lock(X);
+ read_lock(Y);
+ write_lock(Y);
+ write_lock(X);
+
+Task A is waiting for task B to read_unlock() Y and task B is waiting for task
+A to read_unlock() X.
+
+Dependency types and strong dependency paths:
+---------------------------------------------
+Lock dependencies record the orders of the acquisitions of a pair of locks, and
+because there are 3 types for lockers, there are, in theory, 9 types of lock
+dependencies, but we can show that 4 types of lock dependencies are enough for
+deadlock detection.
+
+For each lock dependency::
+
+ L1 -> L2
+
+, which means lockdep has seen L1 held before L2 held in the same context at runtime.
+And in deadlock detection, we care whether we could get blocked on L2 with L1 held,
+IOW, whether there is a locker L3 that L1 blocks L3 and L2 gets blocked by L3. So
+we only care about 1) what L1 blocks and 2) what blocks L2. As a result, we can combine
+recursive readers and non-recursive readers for L1 (as they block the same types) and
+we can combine writers and non-recursive readers for L2 (as they get blocked by the
+same types).
+
+With the above combination for simplification, there are 4 types of dependency edges
+in the lockdep graph:
+
+1) -(ER)->:
+ exclusive writer to recursive reader dependency, "X -(ER)-> Y" means
+ X -> Y and X is a writer and Y is a recursive reader.
+
+2) -(EN)->:
+ exclusive writer to non-recursive locker dependency, "X -(EN)-> Y" means
+ X -> Y and X is a writer and Y is either a writer or non-recursive reader.
+
+3) -(SR)->:
+ shared reader to recursive reader dependency, "X -(SR)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is a recursive reader.
+
+4) -(SN)->:
+ shared reader to non-recursive locker dependency, "X -(SN)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is either a writer or
+ non-recursive reader.
+
+Note that given two locks, they may have multiple dependencies between them,
+for example::
+
+ TASK A:
+
+ read_lock(X);
+ write_lock(Y);
+ ...
+
+ TASK B:
+
+ write_lock(X);
+ write_lock(Y);
+
+, we have both X -(SN)-> Y and X -(EN)-> Y in the dependency graph.
+
+We use -(xN)-> to represent edges that are either -(EN)-> or -(SN)->, the
+similar for -(Ex)->, -(xR)-> and -(Sx)->
+
+A "path" is a series of conjunct dependency edges in the graph. And we define a
+"strong" path, which indicates the strong dependency throughout each dependency
+in the path, as the path that doesn't have two conjunct edges (dependencies) as
+-(xR)-> and -(Sx)->. In other words, a "strong" path is a path from a lock
+walking to another through the lock dependencies, and if X -> Y -> Z is in the
+path (where X, Y, Z are locks), and the walk from X to Y is through a -(SR)-> or
+-(ER)-> dependency, the walk from Y to Z must not be through a -(SN)-> or
+-(SR)-> dependency.
+
+We will see why the path is called "strong" in next section.
+
+Recursive Read Deadlock Detection:
+----------------------------------
+
+We now prove two things:
+
+Lemma 1:
+
+If there is a closed strong path (i.e. a strong circle), then there is a
+combination of locking sequences that causes deadlock. I.e. a strong circle is
+sufficient for deadlock detection.
+
+Lemma 2:
+
+If there is no closed strong path (i.e. strong circle), then there is no
+combination of locking sequences that could cause deadlock. I.e. strong
+circles are necessary for deadlock detection.
+
+With these two Lemmas, we can easily say a closed strong path is both sufficient
+and necessary for deadlocks, therefore a closed strong path is equivalent to
+deadlock possibility. As a closed strong path stands for a dependency chain that
+could cause deadlocks, so we call it "strong", considering there are dependency
+circles that won't cause deadlocks.
+
+Proof for sufficiency (Lemma 1):
+
+Let's say we have a strong circle::
+
+ L1 -> L2 ... -> Ln -> L1
+
+, which means we have dependencies::
+
+ L1 -> L2
+ L2 -> L3
+ ...
+ Ln-1 -> Ln
+ Ln -> L1
+
+We now can construct a combination of locking sequences that cause deadlock:
+
+Firstly let's make one CPU/task get the L1 in L1 -> L2, and then another get
+the L2 in L2 -> L3, and so on. After this, all of the Lx in Lx -> Lx+1 are
+held by different CPU/tasks.
+
+And then because we have L1 -> L2, so the holder of L1 is going to acquire L2
+in L1 -> L2, however since L2 is already held by another CPU/task, plus L1 ->
+L2 and L2 -> L3 are not -(xR)-> and -(Sx)-> (the definition of strong), which
+means either L2 in L1 -> L2 is a non-recursive locker (blocked by anyone) or
+the L2 in L2 -> L3, is writer (blocking anyone), therefore the holder of L1
+cannot get L2, it has to wait L2's holder to release.
+
+Moreover, we can have a similar conclusion for L2's holder: it has to wait L3's
+holder to release, and so on. We now can prove that Lx's holder has to wait for
+Lx+1's holder to release, and note that Ln+1 is L1, so we have a circular
+waiting scenario and nobody can get progress, therefore a deadlock.
+
+Proof for necessary (Lemma 2):
+
+Lemma 2 is equivalent to: If there is a deadlock scenario, then there must be a
+strong circle in the dependency graph.
+
+According to Wikipedia[1], if there is a deadlock, then there must be a circular
+waiting scenario, means there are N CPU/tasks, where CPU/task P1 is waiting for
+a lock held by P2, and P2 is waiting for a lock held by P3, ... and Pn is waiting
+for a lock held by P1. Let's name the lock Px is waiting as Lx, so since P1 is waiting
+for L1 and holding Ln, so we will have Ln -> L1 in the dependency graph. Similarly,
+we have L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln in the dependency graph, which means we
+have a circle::
+
+ Ln -> L1 -> L2 -> ... -> Ln
+
+, and now let's prove the circle is strong:
+
+For a lock Lx, Px contributes the dependency Lx-1 -> Lx and Px+1 contributes
+the dependency Lx -> Lx+1, and since Px is waiting for Px+1 to release Lx,
+so it's impossible that Lx on Px+1 is a reader and Lx on Px is a recursive
+reader, because readers (no matter recursive or not) don't block recursive
+readers, therefore Lx-1 -> Lx and Lx -> Lx+1 cannot be a -(xR)-> -(Sx)-> pair,
+and this is true for any lock in the circle, therefore, the circle is strong.
+
+References:
+-----------
+[1]: https://en.wikipedia.org/wiki/Deadlock
+[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill
diff --git a/Documentation/locking/locktorture.rst b/Documentation/locking/locktorture.rst
index 8012a74555e7..3e763f77a620 100644
--- a/Documentation/locking/locktorture.rst
+++ b/Documentation/locking/locktorture.rst
@@ -5,7 +5,7 @@ Kernel Lock Torture Test Operation
CONFIG_LOCK_TORTURE_TEST
========================
-The CONFIG LOCK_TORTURE_TEST config option provides a kernel module
+The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module
that runs torture tests on core kernel locking primitives. The kernel
module, 'locktorture', may be built after the fact on the running
kernel to be tested, if desired. The tests periodically output status
@@ -67,7 +67,7 @@ torture_type
- "rtmutex_lock":
rtmutex_lock() and rtmutex_unlock() pairs.
- Kernel must have CONFIG_RT_MUTEX=y.
+ Kernel must have CONFIG_RT_MUTEXES=y.
- "rwsem_lock":
read/write down() and up() semaphore pairs.
@@ -113,7 +113,7 @@ stutter
without pausing.
shuffle_interval
- The number of seconds to keep the test threads affinitied
+ The number of seconds to keep the test threads affinitized
to a particular subset of the CPUs, defaults to 3 seconds.
Used in conjunction with test_no_idle_hz.
@@ -166,4 +166,4 @@ checked for such errors. The "rmmod" command forces a "SUCCESS",
two are self-explanatory, while the last indicates that while there
were no locking failures, CPU-hotplug problems were detected.
-Also see: Documentation/RCU/torture.txt
+Also see: Documentation/RCU/torture.rst
diff --git a/Documentation/locking/locktypes.rst b/Documentation/locking/locktypes.rst
index 1b577a8bf982..80c914f6eae7 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -10,7 +10,7 @@ Introduction
============
The kernel provides a variety of locking primitives which can be divided
-into two categories:
+into three categories:
- Sleeping locks
- CPU local locks
@@ -164,14 +164,14 @@ by disabling preemption or interrupts.
On non-PREEMPT_RT kernels local_lock operations map to the preemption and
interrupt disabling and enabling primitives:
- =========================== ======================
- local_lock(&llock) preempt_disable()
- local_unlock(&llock) preempt_enable()
- local_lock_irq(&llock) local_irq_disable()
- local_unlock_irq(&llock) local_irq_enable()
- local_lock_save(&llock) local_irq_save()
- local_lock_restore(&llock) local_irq_save()
- =========================== ======================
+ =============================== ======================
+ local_lock(&llock) preempt_disable()
+ local_unlock(&llock) preempt_enable()
+ local_lock_irq(&llock) local_irq_disable()
+ local_unlock_irq(&llock) local_irq_enable()
+ local_lock_irqsave(&llock) local_irq_save()
+ local_unlock_irqrestore(&llock) local_irq_restore()
+ =============================== ======================
The named scope of local_lock has two advantages over the regular
primitives:
@@ -211,9 +211,6 @@ raw_spinlock_t and spinlock_t
raw_spinlock_t
--------------
-raw_spinlock_t is a strict spinning lock implementation regardless of the
-kernel configuration including PREEMPT_RT enabled kernels.
-
raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels. Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
@@ -247,7 +244,7 @@ based on rt_mutex which changes the semantics:
Non-PREEMPT_RT kernels disable preemption to get this effect.
PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
- preemption disabled. The lock disables softirq handlers and also
+ preemption enabled. The lock disables softirq handlers and also
prevents reentrancy due to task preemption.
PREEMPT_RT kernels preserve all other spinlock_t semantics:
@@ -353,14 +350,14 @@ protection scope. So the following substitution is wrong::
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock_1, flags);
func3();
- local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_1, flags);
+ local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
}
func2()
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock_2, flags);
func3();
- local_irq_restore(flags); -> local_lock_irqrestore(&local_lock_2, flags);
+ local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
}
func3()
@@ -379,14 +376,14 @@ PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags);
func3();
- local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
+ local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
}
func2()
{
local_irq_save(flags); -> local_lock_irqsave(&local_lock, flags);
func3();
- local_irq_restore(flags); -> local_lock_irqrestore(&local_lock, flags);
+ local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
}
func3()
@@ -439,11 +436,9 @@ preemption. The following substitution works on both kernels::
spin_lock(&p->lock);
p->count += this_cpu_read(var2);
-On a non-PREEMPT_RT kernel migrate_disable() maps to preempt_disable()
-which makes the above code fully equivalent. On a PREEMPT_RT kernel
migrate_disable() ensures that the task is pinned on the current CPU which
in turn guarantees that the per-CPU access to var1 and var2 are staying on
-the same CPU.
+the same CPU while the task remains preemptible.
The migrate_disable() substitution is not valid for the following
scenario::
@@ -456,9 +451,8 @@ scenario::
p = this_cpu_ptr(&var1);
p->val = func2();
-While correct on a non-PREEMPT_RT kernel, this breaks on PREEMPT_RT because
-here migrate_disable() does not protect against reentrancy from a
-preempting task. A correct substitution for this case is::
+This breaks because migrate_disable() does not protect against reentrancy from
+a preempting task. A correct substitution for this case is::
func()
{
@@ -506,7 +500,7 @@ caveats also apply to bit spinlocks.
Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site. In contrast,
usage-site changes are not needed for the spinlock_t substitution.
-Instead, conditionals in header files and the core locking implemementation
+Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.
diff --git a/Documentation/locking/mutex-design.rst b/Documentation/locking/mutex-design.rst
index 4d8236b81fa5..7c30b4aa5e28 100644
--- a/Documentation/locking/mutex-design.rst
+++ b/Documentation/locking/mutex-design.rst
@@ -18,7 +18,7 @@ as an alternative to these. This new data structure provided a number
of advantages, including simpler interfaces, and at that time smaller
code (see Disadvantages).
-[1] http://lwn.net/Articles/164802/
+[1] https://lwn.net/Articles/164802/
Implementation
--------------
@@ -28,7 +28,7 @@ and implemented in kernel/locking/mutex.c. These locks use an atomic variable
(->owner) to keep track of the lock state during its lifetime. Field owner
actually contains `struct task_struct *` to the current lock owner and it is
therefore NULL if not currently owned. Since task_struct pointers are aligned
-at at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
+to at least L1_CACHE_BYTES, low bits (3) are used to store extra state (e.g.,
if waiter list is non-empty). In its most basic form it also includes a
wait-queue and a spinlock that serializes access to it. Furthermore,
CONFIG_MUTEX_SPIN_ON_OWNER=y systems use a spinner MCS lock (->osq), described
@@ -101,6 +101,24 @@ features that make lock debugging easier and faster:
- Detects multi-task circular deadlocks and prints out all affected
locks and tasks (and only those tasks).
+Mutexes - and most other sleeping locks like rwsems - do not provide an
+implicit reference for the memory they occupy, which reference is released
+with mutex_unlock().
+
+[ This is in contrast with spin_unlock() [or completion_done()], which
+ APIs can be used to guarantee that the memory is not touched by the
+ lock implementation after spin_unlock()/completion_done() releases
+ the lock. ]
+
+mutex_unlock() may access the mutex structure even after it has internally
+released the lock already - so it's not safe for another context to
+acquire the mutex and assume that the mutex_unlock() context is not using
+the structure anymore.
+
+The mutex user must ensure that the mutex is not destroyed while a
+release operation is still in progress - in other words, callers of
+mutex_unlock() must ensure that the mutex stays alive until mutex_unlock()
+has returned.
Interfaces
----------
diff --git a/Documentation/locking/percpu-rw-semaphore.rst b/Documentation/locking/percpu-rw-semaphore.rst
index 247de6410855..a105bf2dd812 100644
--- a/Documentation/locking/percpu-rw-semaphore.rst
+++ b/Documentation/locking/percpu-rw-semaphore.rst
@@ -16,8 +16,8 @@ writing is very expensive, it calls synchronize_rcu() that can take
hundreds of milliseconds.
The lock is declared with "struct percpu_rw_semaphore" type.
-The lock is initialized percpu_init_rwsem, it returns 0 on success and
--ENOMEM on allocation failure.
+The lock is initialized with percpu_init_rwsem, it returns 0 on success
+and -ENOMEM on allocation failure.
The lock must be freed with percpu_free_rwsem to avoid memory leak.
The lock is locked for read with percpu_down_read, percpu_up_read and
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index 000000000000..ec6411d02ac8
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,239 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+
+.. _seqcount_t:
+
+Sequence counters (``seqcount_t``)
+==================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers. Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use
+:ref:`seqlock_t` instead.
+
+Initialization::
+
+ /* dynamic */
+ seqcount_t foo_seqcount;
+ seqcount_init(&foo_seqcount);
+
+ /* static */
+ static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+ /* C99 struct init */
+ struct {
+ .seq = SEQCNT_ZERO(foo.seq),
+ } foo;
+
+Write path::
+
+ /* Serialized context with disabled preemption */
+
+ write_seqcount_begin(&foo_seqcount);
+
+ /* ... [[write-side critical section]] ... */
+
+ write_seqcount_end(&foo_seqcount);
+
+Read path::
+
+ do {
+ seq = read_seqcount_begin(&foo_seqcount);
+
+ /* ... [[read-side critical section]] ... */
+
+ } while (read_seqcount_retry(&foo_seqcount, seq));
+
+
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (``seqcount_LOCKNAME_t``)
+-----------------------------------------------------------------
+
+As discussed at :ref:`seqcount_t`, sequence count write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associate the lock used for writer serialization at
+initialization time, which enables lockdep to validate that the write
+side critical sections are properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following sequence counters with associated locks are defined:
+
+ - ``seqcount_spinlock_t``
+ - ``seqcount_raw_spinlock_t``
+ - ``seqcount_rwlock_t``
+ - ``seqcount_mutex_t``
+ - ``seqcount_ww_mutex_t``
+
+The sequence counter read and write APIs can take either a plain
+seqcount_t or any of the seqcount_LOCKNAME_t variants above.
+
+Initialization (replace "LOCKNAME" with one of the supported locks)::
+
+ /* dynamic */
+ seqcount_LOCKNAME_t foo_seqcount;
+ seqcount_LOCKNAME_init(&foo_seqcount, &lock);
+
+ /* static */
+ static seqcount_LOCKNAME_t foo_seqcount =
+ SEQCNT_LOCKNAME_ZERO(foo_seqcount, &lock);
+
+ /* C99 struct init */
+ struct {
+ .seq = SEQCNT_LOCKNAME_ZERO(foo.seq, &lock),
+ } foo;
+
+Write path: same as in :ref:`seqcount_t`, while running from a context
+with the associated write serialization lock acquired.
+
+Read path: same as in :ref:`seqcount_t`.
+
+
+.. _seqcount_latch_t:
+
+Latch sequence counters (``seqcount_latch_t``)
+----------------------------------------------
+
+Latch sequence counters are a multiversion concurrency control mechanism
+where the embedded seqcount_t counter even/odd value is used to switch
+between two copies of protected data. This allows the sequence counter
+read path to safely interrupt its own write side critical section.
+
+Use seqcount_latch_t when the write side sections cannot be protected
+from interruption by readers. This is typically the case when the read
+side can be invoked from NMI handlers.
+
+Check `write_seqcount_latch()` for more information.
+
+
+.. _seqlock_t:
+
+Sequential locks (``seqlock_t``)
+================================
+
+This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
+embedded spinlock for writer serialization and non-preemptibility.
+
+If the read side section can be invoked from hardirq or softirq context,
+use the write side function variants which disable interrupts or bottom
+halves respectively.
+
+Initialization::
+
+ /* dynamic */
+ seqlock_t foo_seqlock;
+ seqlock_init(&foo_seqlock);
+
+ /* static */
+ static DEFINE_SEQLOCK(foo_seqlock);
+
+ /* C99 struct init */
+ struct {
+ .seql = __SEQLOCK_UNLOCKED(foo.seql)
+ } foo;
+
+Write path::
+
+ write_seqlock(&foo_seqlock);
+
+ /* ... [[write-side critical section]] ... */
+
+ write_sequnlock(&foo_seqlock);
+
+Read path, three categories:
+
+1. Normal Sequence readers which never block a writer but they must
+ retry if a writer is in progress by detecting change in the sequence
+ number. Writers do not wait for a sequence reader::
+
+ do {
+ seq = read_seqbegin(&foo_seqlock);
+
+ /* ... [[read-side critical section]] ... */
+
+ } while (read_seqretry(&foo_seqlock, seq));
+
+2. Locking readers which will wait if a writer or another locking reader
+ is in progress. A locking reader in progress will also block a writer
+ from entering its critical section. This read lock is
+ exclusive. Unlike rwlock_t, only one locking reader can acquire it::
+
+ read_seqlock_excl(&foo_seqlock);
+
+ /* ... [[read-side critical section]] ... */
+
+ read_sequnlock_excl(&foo_seqlock);
+
+3. Conditional lockless reader (as in 1), or locking reader (as in 2),
+ according to a passed marker. This is used to avoid lockless readers
+ starvation (too much retry loops) in case of a sharp spike in write
+ activity. First, a lockless read is tried (even marker passed). If
+ that trial fails (odd sequence counter is returned, which is used as
+ the next iteration marker), the lockless read is transformed to a
+ full locking read and no retry loop is necessary::
+
+ /* marker; even initialization */
+ int seq = 0;
+ do {
+ read_seqbegin_or_lock(&foo_seqlock, &seq);
+
+ /* ... [[read-side critical section]] ... */
+
+ } while (need_seqretry(&foo_seqlock, seq));
+ done_seqretry(&foo_seqlock, seq);
+
+
+API documentation
+=================
+
+.. kernel-doc:: include/linux/seqlock.h
diff --git a/Documentation/locking/ww-mutex-design.rst b/Documentation/locking/ww-mutex-design.rst
index 1846c199da23..6a8f8beb9ec4 100644
--- a/Documentation/locking/ww-mutex-design.rst
+++ b/Documentation/locking/ww-mutex-design.rst
@@ -2,7 +2,7 @@
Wound/Wait Deadlock-Proof Mutex Design
======================================
-Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
+Please read mutex-design.rst first, as it applies to wait/wound mutexes too.
Motivation for WW-Mutexes
-------------------------
@@ -49,7 +49,7 @@ However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
compared to Wait-Die, but is, on the other hand, associated with more work than
Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
algorithm in that transactions are wounded by other transactions, and that
-requires a reliable way to pick up up the wounded condition and preempt the
+requires a reliable way to pick up the wounded condition and preempt the
running transaction. Note that this is not the same as process preemption. A
Wound-Wait transaction is considered preempted when it dies (returning
-EDEADLK) following a wound.
@@ -60,7 +60,7 @@ Concepts
Compared to normal mutexes two additional concepts/objects show up in the lock
interface for w/w mutexes:
-Acquire context: To ensure eventual forward progress it is important the a task
+Acquire context: To ensure eventual forward progress it is important that a task
trying to acquire locks doesn't grab a new reservation id, but keeps the one it
acquired when starting the lock acquisition. This ticket is stored in the
acquire context. Furthermore the acquire context keeps track of debugging state