aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/RCU/whatisRCU.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/RCU/whatisRCU.rst')
-rw-r--r--Documentation/RCU/whatisRCU.rst256
1 files changed, 168 insertions, 88 deletions
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 77ea260efd12..872ac665223f 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -6,26 +6,33 @@ What is RCU? -- "Read, Copy, Update"
Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:
-| 1. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
-| 2. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
-| 3. RCU part 3: the RCU API http://lwn.net/Articles/264090/
-| 4. The RCU API, 2010 Edition http://lwn.net/Articles/418853/
-| 2010 Big API Table http://lwn.net/Articles/419086/
-| 5. The RCU API, 2014 Edition http://lwn.net/Articles/609904/
-| 2014 Big API Table http://lwn.net/Articles/609973/
+| 1. What is RCU, Fundamentally? https://lwn.net/Articles/262464/
+| 2. What is RCU? Part 2: Usage https://lwn.net/Articles/263130/
+| 3. RCU part 3: the RCU API https://lwn.net/Articles/264090/
+| 4. The RCU API, 2010 Edition https://lwn.net/Articles/418853/
+| 2010 Big API Table https://lwn.net/Articles/419086/
+| 5. The RCU API, 2014 Edition https://lwn.net/Articles/609904/
+| 2014 Big API Table https://lwn.net/Articles/609973/
+| 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
+| 2019 Big API Table https://lwn.net/Articles/777165/
+
+For those preferring video:
+
+| 1. Unraveling RCU Mysteries: Fundamentals https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
+| 2. Unraveling RCU Mysteries: Additional Use Cases https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases
What is RCU?
RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
-situations. Although RCU is actually quite simple once you understand it,
-getting there can sometimes be a challenge. Part of the problem is that
-most of the past descriptions of RCU have been written with the mistaken
-assumption that there is "one true way" to describe RCU. Instead,
-the experience has been that different people must take different paths
-to arrive at an understanding of RCU. This document provides several
-different paths, as follows:
+situations. Although RCU is actually quite simple, making effective use
+of it requires you to think differently about your code. Another part
+of the problem is the mistaken assumption that there is "one true way" to
+describe and to use RCU. Instead, the experience has been that different
+people must take different paths to arrive at an understanding of RCU,
+depending on their experiences and use cases. This document provides
+several different paths, as follows:
:ref:`1. RCU OVERVIEW <1_whatisRCU>`
@@ -52,8 +59,8 @@ experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
-focus on Section 6. Section 7 serves as an index to the docbook API
-documentation, and Section 8 is the traditional answer key.
+focus on Section 6 and 7. Section 8 serves as an index to the docbook
+API documentation, and Section 9 is the traditional answer key.
So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
@@ -155,34 +162,47 @@ rcu_read_lock()
^^^^^^^^^^^^^^^
void rcu_read_lock(void);
- Used by a reader to inform the reclaimer that the reader is
- entering an RCU read-side critical section. It is illegal
- to block while in an RCU read-side critical section, though
- kernels built with CONFIG_PREEMPT_RCU can preempt RCU
- read-side critical sections. Any RCU-protected data structure
- accessed during an RCU read-side critical section is guaranteed to
- remain unreclaimed for the full duration of that critical section.
- Reference counts may be used in conjunction with RCU to maintain
- longer-term references to data structures.
+ This temporal primitive is used by a reader to inform the
+ reclaimer that the reader is entering an RCU read-side critical
+ section. It is illegal to block while in an RCU read-side
+ critical section, though kernels built with CONFIG_PREEMPT_RCU
+ can preempt RCU read-side critical sections. Any RCU-protected
+ data structure accessed during an RCU read-side critical section
+ is guaranteed to remain unreclaimed for the full duration of that
+ critical section. Reference counts may be used in conjunction
+ with RCU to maintain longer-term references to data structures.
+
+ Note that anything that disables bottom halves, preemption,
+ or interrupts also enters an RCU read-side critical section.
+ Acquiring a spinlock also enters an RCU read-side critical
+ sections, even for spinlocks that do not disable preemption,
+ as is the case in kernels built with CONFIG_PREEMPT_RT=y.
+ Sleeplocks do *not* enter RCU read-side critical sections.
rcu_read_unlock()
^^^^^^^^^^^^^^^^^
void rcu_read_unlock(void);
- Used by a reader to inform the reclaimer that the reader is
- exiting an RCU read-side critical section. Note that RCU
- read-side critical sections may be nested and/or overlapping.
+ This temporal primitives is used by a reader to inform the
+ reclaimer that the reader is exiting an RCU read-side critical
+ section. Anything that enables bottom halves, preemption,
+ or interrupts also exits an RCU read-side critical section.
+ Releasing a spinlock also exits an RCU read-side critical section.
+
+ Note that RCU read-side critical sections may be nested and/or
+ overlapping.
synchronize_rcu()
^^^^^^^^^^^^^^^^^
void synchronize_rcu(void);
- Marks the end of updater code and the beginning of reclaimer
- code. It does this by blocking until all pre-existing RCU
- read-side critical sections on all CPUs have completed.
- Note that synchronize_rcu() will **not** necessarily wait for
- any subsequent RCU read-side critical sections to complete.
- For example, consider the following sequence of events::
+ This temporal primitive marks the end of updater code and the
+ beginning of reclaimer code. It does this by blocking until
+ all pre-existing RCU read-side critical sections on all CPUs
+ have completed. Note that synchronize_rcu() will **not**
+ necessarily wait for any subsequent RCU read-side critical
+ sections to complete. For example, consider the following
+ sequence of events::
CPU 0 CPU 1 CPU 2
----------------- ------------------------- ---------------
@@ -209,13 +229,13 @@ synchronize_rcu()
to be useful in all but the most read-intensive situations,
synchronize_rcu()'s overhead must also be quite small.
- The call_rcu() API is a callback form of synchronize_rcu(),
- and is described in more detail in a later section. Instead of
- blocking, it registers a function and argument which are invoked
- after all ongoing RCU read-side critical sections have completed.
- This callback variant is particularly useful in situations where
- it is illegal to block or where update-side performance is
- critically important.
+ The call_rcu() API is an asynchronous callback form of
+ synchronize_rcu(), and is described in more detail in a later
+ section. Instead of blocking, it registers a function and
+ argument which are invoked after all ongoing RCU read-side
+ critical sections have completed. This callback variant is
+ particularly useful in situations where it is illegal to block
+ or where update-side performance is critically important.
However, the call_rcu() API should not be used lightly, as use
of the synchronize_rcu() API generally results in simpler code.
@@ -234,11 +254,13 @@ rcu_assign_pointer()
would be cool to be able to declare a function in this manner.
(Compiler experts will no doubt disagree.)
- The updater uses this function to assign a new value to an
+ The updater uses this spatial macro to assign a new value to an
RCU-protected pointer, in order to safely communicate the change
- in value from the updater to the reader. This macro does not
- evaluate to an rvalue, but it does execute any memory-barrier
- instructions required for a given CPU architecture.
+ in value from the updater to the reader. This is a spatial (as
+ opposed to temporal) macro. It does not evaluate to an rvalue,
+ but it does execute any memory-barrier instructions required
+ for a given CPU architecture. Its ordering properties are that
+ of a store-release operation.
Perhaps just as important, it serves to document (1) which
pointers are protected by RCU and (2) the point at which a
@@ -253,14 +275,15 @@ rcu_dereference()
Like rcu_assign_pointer(), rcu_dereference() must be implemented
as a macro.
- The reader uses rcu_dereference() to fetch an RCU-protected
- pointer, which returns a value that may then be safely
- dereferenced. Note that rcu_dereference() does not actually
- dereference the pointer, instead, it protects the pointer for
- later dereferencing. It also executes any needed memory-barrier
- instructions for a given CPU architecture. Currently, only Alpha
- needs memory barriers within rcu_dereference() -- on other CPUs,
- it compiles to nothing, not even a compiler directive.
+ The reader uses the spatial rcu_dereference() macro to fetch
+ an RCU-protected pointer, which returns a value that may
+ then be safely dereferenced. Note that rcu_dereference()
+ does not actually dereference the pointer, instead, it
+ protects the pointer for later dereferencing. It also
+ executes any needed memory-barrier instructions for a given
+ CPU architecture. Currently, only Alpha needs memory barriers
+ within rcu_dereference() -- on other CPUs, it compiles to a
+ volatile load.
Common coding practice uses rcu_dereference() to copy an
RCU-protected pointer to a local variable, then dereferences
@@ -353,12 +376,15 @@ reader, updater, and reclaimer.
synchronize_rcu() & call_rcu()
-The RCU infrastructure observes the time sequence of rcu_read_lock(),
+The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked. Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
+The rcu_assign_pointer() and rcu_dereference() invocations communicate
+spatial changes via stores to and loads from the RCU-protected pointer in
+question.
There are at least three flavors of RCU usage in the Linux kernel. The diagram
above shows the most common one. On the updater side, the rcu_assign_pointer(),
@@ -390,7 +416,9 @@ b. RCU applied to networking data structures that may be subjected
c. RCU applied to scheduler and interrupt/NMI-handler tasks.
Again, most uses will be of (a). The (b) and (c) cases are important
-for specialized uses, but are relatively uncommon.
+for specialized uses, but are relatively uncommon. The SRCU, RCU-Tasks,
+RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
+their assorted primitives.
.. _3_whatisRCU:
@@ -466,7 +494,7 @@ So, to sum up:
- Within an RCU read-side critical section, use rcu_dereference()
to dereference RCU-protected pointers.
-- Use some solid scheme (such as locks or semaphores) to
+- Use some solid design (such as locks or semaphores) to
keep concurrent updates from interfering with each other.
- Use rcu_assign_pointer() to update an RCU-protected pointer.
@@ -577,6 +605,14 @@ to avoid having to write your own callback::
kfree_rcu(old_fp, rcu);
+If the occasional sleep is permitted, the single-argument form may
+be used, omitting the rcu_head structure from struct foo.
+
+ kfree_rcu_mightsleep(old_fp);
+
+This variant almost never blocks, but might do so by invoking
+synchronize_rcu() in response to memory-allocation failure.
+
Again, see checklist.rst for additional rules governing the use of RCU.
.. _5_whatisRCU:
@@ -594,7 +630,7 @@ lacking both functionality and performance. However, they are useful
in getting a feel for how RCU works. See kernel/rcu/update.c for a
production-quality implementation, and see:
- http://www.rdrop.com/users/paulmck/RCU
+ https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit
for papers describing the Linux kernel RCU implementation. The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
@@ -915,13 +951,20 @@ which an RCU reference is held include:
The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
-reference to an object from such a cache that has been concurrently
-freed and the memory reallocated to a completely different object,
-though of the same type. In this case RCU doesn't even protect the
-identity of the object from changing, only its type. So the object
-found may not be the one expected, but it will be one where it is safe
-to take a reference or spinlock and then confirm that the identity
-matches the expectations.
+reference to an object from such a cache that has been concurrently freed
+and the memory reallocated to a completely different object, though of
+the same type. In this case RCU doesn't even protect the identity of the
+object from changing, only its type. So the object found may not be the
+one expected, but it will be one where it is safe to take a reference
+(and then potentially acquiring a spinlock), allowing subsequent code
+to check whether the identity matches expectations. It is tempting
+to simply acquire the spinlock without first taking the reference, but
+unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
+initialized after each and every call to kmem_cache_alloc(), which renders
+reference-free spinlock acquisition completely unsafe. Therefore, when
+using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+(Those willing to initialize their locks in a kmem_cache constructor
+may also use locking, including cache-friendly sequence locking.)
With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
@@ -1040,6 +1083,30 @@ sched::
rcu_read_lock_sched_held
+RCU-Tasks::
+
+ Critical sections Grace period Barrier
+
+ N/A call_rcu_tasks rcu_barrier_tasks
+ synchronize_rcu_tasks
+
+
+RCU-Tasks-Rude::
+
+ Critical sections Grace period Barrier
+
+ N/A call_rcu_tasks_rude rcu_barrier_tasks_rude
+ synchronize_rcu_tasks_rude
+
+
+RCU-Tasks-Trace::
+
+ Critical sections Grace period Barrier
+
+ rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace
+ rcu_read_unlock_trace synchronize_rcu_tasks_trace
+
+
SRCU::
Critical sections Grace period Barrier
@@ -1057,13 +1124,18 @@ SRCU: Initialization/cleanup::
init_srcu_struct
cleanup_srcu_struct
-All: lockdep-checked RCU-protected pointer access::
+All: lockdep-checked RCU utility APIs::
- rcu_access_pointer
- rcu_dereference_raw
RCU_LOCKDEP_WARN
rcu_sleep_check
- RCU_NONIDLE
+
+All: Unchecked RCU-protected pointer access::
+
+ rcu_dereference_raw
+
+All: Unchecked RCU-protected pointer access with dereferencing prohibited::
+
+ rcu_access_pointer
See the comment headers in the source code (or the docbook generated
from them) for more information.
@@ -1074,35 +1146,43 @@ list can be helpful:
a. Will readers need to block? If so, you need SRCU.
-b. What about the -rt patchset? If readers would need to block
- in an non-rt kernel, you need SRCU. If readers would block
- in a -rt kernel, but not in a non-rt kernel, SRCU is not
- necessary. (The -rt patchset turns spinlocks into sleeplocks,
- hence this distinction.)
+b. Will readers need to block and are you doing tracing, for
+ example, ftrace or BPF? If so, you need RCU-tasks,
+ RCU-tasks-rude, and/or RCU-tasks-trace.
+
+c. What about the -rt patchset? If readers would need to block in
+ an non-rt kernel, you need SRCU. If readers would block when
+ acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
+ SRCU is not necessary. (The -rt patchset turns spinlocks into
+ sleeplocks, hence this distinction.)
-c. Do you need to treat NMI handlers, hardirq handlers,
+d. Do you need to treat NMI handlers, hardirq handlers,
and code segments with preemption disabled (whether
via preempt_disable(), local_irq_save(), local_bh_disable(),
or some other mechanism) as if they were explicit RCU readers?
- If so, RCU-sched is the only choice that will work for you.
-
-d. Do you need RCU grace periods to complete even in the face
- of softirq monopolization of one or more of the CPUs? For
- example, is your code subject to network-based denial-of-service
- attacks? If so, you should disable softirq across your readers,
- for example, by using rcu_read_lock_bh().
-
-e. Is your workload too update-intensive for normal use of
+ If so, RCU-sched readers are the only choice that will work
+ for you, but since about v4.20 you use can use the vanilla RCU
+ update primitives.
+
+e. Do you need RCU grace periods to complete even in the face of
+ softirq monopolization of one or more of the CPUs? For example,
+ is your code subject to network-based denial-of-service attacks?
+ If so, you should disable softirq across your readers, for
+ example, by using rcu_read_lock_bh(). Since about v4.20 you
+ use can use the vanilla RCU update primitives.
+
+f. Is your workload too update-intensive for normal use of
RCU, but inappropriate for other synchronization mechanisms?
If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
named SLAB_DESTROY_BY_RCU). But please be careful!
-f. Do you need read-side critical sections that are respected
- even though they are in the middle of the idle loop, during
- user-mode execution, or on an offlined CPU? If so, SRCU is the
- only choice that will work for you.
+g. Do you need read-side critical sections that are respected even
+ on CPUs that are deep in the idle loop, during entry to or exit
+ from user-mode execution, or on an offlined CPU? If so, SRCU
+ and RCU Tasks Trace are the only choices that will work for you,
+ with SRCU being strongly preferred in almost all cases.
-g. Otherwise, use RCU.
+h. Otherwise, use RCU.
Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.