1 files changed, 168 insertions, 88 deletions
diff --git a/Documentation/RCU/whatisRCU.rst b/Documentation/RCU/whatisRCU.rst
index 77ea260efd12..872ac665223f 100644
--- a/Documentation/RCU/whatisRCU.rst
+++ b/Documentation/RCU/whatisRCU.rst
@@ -6,26 +6,33 @@ What is RCU?  --  "Read, Copy, Update"
 Please note that the "What is RCU?" LWN series is an excellent place
 to start learning about RCU:
 
-| 1.	What is RCU, Fundamentally?  http://lwn.net/Articles/262464/
-| 2.	What is RCU? Part 2: Usage   http://lwn.net/Articles/263130/
-| 3.	RCU part 3: the RCU API      http://lwn.net/Articles/264090/
-| 4.	The RCU API, 2010 Edition    http://lwn.net/Articles/418853/
-| 	2010 Big API Table           http://lwn.net/Articles/419086/
-| 5.	The RCU API, 2014 Edition    http://lwn.net/Articles/609904/
-|	2014 Big API Table           http://lwn.net/Articles/609973/
+| 1.	What is RCU, Fundamentally?  https://lwn.net/Articles/262464/
+| 2.	What is RCU? Part 2: Usage   https://lwn.net/Articles/263130/
+| 3.	RCU part 3: the RCU API      https://lwn.net/Articles/264090/
+| 4.	The RCU API, 2010 Edition    https://lwn.net/Articles/418853/
+| 	2010 Big API Table           https://lwn.net/Articles/419086/
+| 5.	The RCU API, 2014 Edition    https://lwn.net/Articles/609904/
+|	2014 Big API Table           https://lwn.net/Articles/609973/
+| 6.	The RCU API, 2019 Edition    https://lwn.net/Articles/777036/
+|	2019 Big API Table           https://lwn.net/Articles/777165/
+
+For those preferring video:
+
+| 1.	Unraveling RCU Mysteries: Fundamentals          https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
+| 2.	Unraveling RCU Mysteries: Additional Use Cases  https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases
 
 
 What is RCU?
 
 RCU is a synchronization mechanism that was added to the Linux kernel
 during the 2.5 development effort that is optimized for read-mostly
-situations.  Although RCU is actually quite simple once you understand it,
-getting there can sometimes be a challenge.  Part of the problem is that
-most of the past descriptions of RCU have been written with the mistaken
-assumption that there is "one true way" to describe RCU.  Instead,
-the experience has been that different people must take different paths
-to arrive at an understanding of RCU.  This document provides several
-different paths, as follows:
+situations.  Although RCU is actually quite simple, making effective use
+of it requires you to think differently about your code.  Another part
+of the problem is the mistaken assumption that there is "one true way" to
+describe and to use RCU.  Instead, the experience has been that different
+people must take different paths to arrive at an understanding of RCU,
+depending on their experiences and use cases.  This document provides
+several different paths, as follows:
 
 :ref:`1.	RCU OVERVIEW <1_whatisRCU>`
 
@@ -52,8 +59,8 @@ experiment with should focus on Section 2.  People who prefer to start
 with example uses should focus on Sections 3 and 4.  People who need to
 understand the RCU implementation should focus on Section 5, then dive
 into the kernel source code.  People who reason best by analogy should
-focus on Section 6.  Section 7 serves as an index to the docbook API
-documentation, and Section 8 is the traditional answer key.
+focus on Section 6 and 7.  Section 8 serves as an index to the docbook
+API documentation, and Section 9 is the traditional answer key.
 
 So, start with the section that makes the most sense to you and your
 preferred method of learning.  If you need to know everything about
@@ -155,34 +162,47 @@ rcu_read_lock()
 ^^^^^^^^^^^^^^^
 	void rcu_read_lock(void);
 
-	Used by a reader to inform the reclaimer that the reader is
-	entering an RCU read-side critical section.  It is illegal
-	to block while in an RCU read-side critical section, though
-	kernels built with CONFIG_PREEMPT_RCU can preempt RCU
-	read-side critical sections.  Any RCU-protected data structure
-	accessed during an RCU read-side critical section is guaranteed to
-	remain unreclaimed for the full duration of that critical section.
-	Reference counts may be used in conjunction with RCU to maintain
-	longer-term references to data structures.
+	This temporal primitive is used by a reader to inform the
+	reclaimer that the reader is entering an RCU read-side critical
+	section.  It is illegal to block while in an RCU read-side
+	critical section, though kernels built with CONFIG_PREEMPT_RCU
+	can preempt RCU read-side critical sections.  Any RCU-protected
+	data structure accessed during an RCU read-side critical section
+	is guaranteed to remain unreclaimed for the full duration of that
+	critical section.  Reference counts may be used in conjunction
+	with RCU to maintain longer-term references to data structures.
+
+	Note that anything that disables bottom halves, preemption,
+	or interrupts also enters an RCU read-side critical section.
+	Acquiring a spinlock also enters an RCU read-side critical
+	sections, even for spinlocks that do not disable preemption,
+	as is the case in kernels built with CONFIG_PREEMPT_RT=y.
+	Sleeplocks do *not* enter RCU read-side critical sections.
 
 rcu_read_unlock()
 ^^^^^^^^^^^^^^^^^
 	void rcu_read_unlock(void);
 
-	Used by a reader to inform the reclaimer that the reader is
-	exiting an RCU read-side critical section.  Note that RCU
-	read-side critical sections may be nested and/or overlapping.
+	This temporal primitives is used by a reader to inform the
+	reclaimer that the reader is exiting an RCU read-side critical
+	section.  Anything that enables bottom halves, preemption,
+	or interrupts also exits an RCU read-side critical section.
+	Releasing a spinlock also exits an RCU read-side critical section.
+
+	Note that RCU read-side critical sections may be nested and/or
+	overlapping.
 
 synchronize_rcu()
 ^^^^^^^^^^^^^^^^^
 	void synchronize_rcu(void);
 
-	Marks the end of updater code and the beginning of reclaimer
-	code.  It does this by blocking until all pre-existing RCU
-	read-side critical sections on all CPUs have completed.
-	Note that synchronize_rcu() will **not** necessarily wait for
-	any subsequent RCU read-side critical sections to complete.
-	For example, consider the following sequence of events::
+	This temporal primitive marks the end of updater code and the
+	beginning of reclaimer code.  It does this by blocking until
+	all pre-existing RCU read-side critical sections on all CPUs
+	have completed.  Note that synchronize_rcu() will **not**
+	necessarily wait for any subsequent RCU read-side critical
+	sections to complete.  For example, consider the following
+	sequence of events::
 
 	         CPU 0                  CPU 1                 CPU 2
 	     ----------------- ------------------------- ---------------
@@ -209,13 +229,13 @@ synchronize_rcu()
 	to be useful in all but the most read-intensive situations,
 	synchronize_rcu()'s overhead must also be quite small.
 
-	The call_rcu() API is a callback form of synchronize_rcu(),
-	and is described in more detail in a later section.  Instead of
-	blocking, it registers a function and argument which are invoked
-	after all ongoing RCU read-side critical sections have completed.
-	This callback variant is particularly useful in situations where
-	it is illegal to block or where update-side performance is
-	critically important.
+	The call_rcu() API is an asynchronous callback form of
+	synchronize_rcu(), and is described in more detail in a later
+	section.  Instead of blocking, it registers a function and
+	argument which are invoked after all ongoing RCU read-side
+	critical sections have completed.  This callback variant is
+	particularly useful in situations where it is illegal to block
+	or where update-side performance is critically important.
 
 	However, the call_rcu() API should not be used lightly, as use
 	of the synchronize_rcu() API generally results in simpler code.
@@ -234,11 +254,13 @@ rcu_assign_pointer()
 	would be cool to be able to declare a function in this manner.
 	(Compiler experts will no doubt disagree.)
 
-	The updater uses this function to assign a new value to an
+	The updater uses this spatial macro to assign a new value to an
 	RCU-protected pointer, in order to safely communicate the change
-	in value from the updater to the reader.  This macro does not
-	evaluate to an rvalue, but it does execute any memory-barrier
-	instructions required for a given CPU architecture.
+	in value from the updater to the reader.  This is a spatial (as
+	opposed to temporal) macro.  It does not evaluate to an rvalue,
+	but it does execute any memory-barrier instructions required
+	for a given CPU architecture.  Its ordering properties are that
+	of a store-release operation.
 
 	Perhaps just as important, it serves to document (1) which
 	pointers are protected by RCU and (2) the point at which a
@@ -253,14 +275,15 @@ rcu_dereference()
 	Like rcu_assign_pointer(), rcu_dereference() must be implemented
 	as a macro.
 
-	The reader uses rcu_dereference() to fetch an RCU-protected
-	pointer, which returns a value that may then be safely
-	dereferenced.  Note that rcu_dereference() does not actually
-	dereference the pointer, instead, it protects the pointer for
-	later dereferencing.  It also executes any needed memory-barrier
-	instructions for a given CPU architecture.  Currently, only Alpha
-	needs memory barriers within rcu_dereference() -- on other CPUs,
-	it compiles to nothing, not even a compiler directive.
+	The reader uses the spatial rcu_dereference() macro to fetch
+	an RCU-protected pointer, which returns a value that may
+	then be safely dereferenced.  Note that rcu_dereference()
+	does not actually dereference the pointer, instead, it
+	protects the pointer for later dereferencing.  It also
+	executes any needed memory-barrier instructions for a given
+	CPU architecture.  Currently, only Alpha needs memory barriers
+	within rcu_dereference() -- on other CPUs, it compiles to a
+	volatile load.
 
 	Common coding practice uses rcu_dereference() to copy an
 	RCU-protected pointer to a local variable, then dereferences
@@ -353,12 +376,15 @@ reader, updater, and reclaimer.
 	      synchronize_rcu() & call_rcu()
 
 
-The RCU infrastructure observes the time sequence of rcu_read_lock(),
+The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
 rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
 order to determine when (1) synchronize_rcu() invocations may return
 to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
 implementations of the RCU infrastructure make heavy use of batching in
 order to amortize their overhead over many uses of the corresponding APIs.
+The rcu_assign_pointer() and rcu_dereference() invocations communicate
+spatial changes via stores to and loads from the RCU-protected pointer in
+question.
 
 There are at least three flavors of RCU usage in the Linux kernel. The diagram
 above shows the most common one. On the updater side, the rcu_assign_pointer(),
@@ -390,7 +416,9 @@ b.	RCU applied to networking data structures that may be subjected
 c.	RCU applied to scheduler and interrupt/NMI-handler tasks.
 
 Again, most uses will be of (a).  The (b) and (c) cases are important
-for specialized uses, but are relatively uncommon.
+for specialized uses, but are relatively uncommon.  The SRCU, RCU-Tasks,
+RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
+their assorted primitives.
 
 .. _3_whatisRCU:
 
@@ -466,7 +494,7 @@ So, to sum up:
 -	Within an RCU read-side critical section, use rcu_dereference()
 	to dereference RCU-protected pointers.
 
--	Use some solid scheme (such as locks or semaphores) to
+-	Use some solid design (such as locks or semaphores) to
 	keep concurrent updates from interfering with each other.
 
 -	Use rcu_assign_pointer() to update an RCU-protected pointer.
@@ -577,6 +605,14 @@ to avoid having to write your own callback::
 
 	kfree_rcu(old_fp, rcu);
 
+If the occasional sleep is permitted, the single-argument form may
+be used, omitting the rcu_head structure from struct foo.
+
+	kfree_rcu_mightsleep(old_fp);
+
+This variant almost never blocks, but might do so by invoking
+synchronize_rcu() in response to memory-allocation failure.
+
 Again, see checklist.rst for additional rules governing the use of RCU.
 
 .. _5_whatisRCU:
@@ -594,7 +630,7 @@ lacking both functionality and performance.  However, they are useful
 in getting a feel for how RCU works.  See kernel/rcu/update.c for a
 production-quality implementation, and see:
 
-	http://www.rdrop.com/users/paulmck/RCU
+	https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit
 
 for papers describing the Linux kernel RCU implementation.  The OLS'01
 and OLS'02 papers are a good introduction, and the dissertation provides
@@ -915,13 +951,20 @@ which an RCU reference is held include:
 The understanding that RCU provides a reference that only prevents a
 change of type is particularly visible with objects allocated from a
 slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
-reference to an object from such a cache that has been concurrently
-freed and the memory reallocated to a completely different object,
-though of the same type.  In this case RCU doesn't even protect the
-identity of the object from changing, only its type.  So the object
-found may not be the one expected, but it will be one where it is safe
-to take a reference or spinlock and then confirm that the identity
-matches the expectations.
+reference to an object from such a cache that has been concurrently freed
+and the memory reallocated to a completely different object, though of
+the same type.  In this case RCU doesn't even protect the identity of the
+object from changing, only its type.  So the object found may not be the
+one expected, but it will be one where it is safe to take a reference
+(and then potentially acquiring a spinlock), allowing subsequent code
+to check whether the identity matches expectations.  It is tempting
+to simply acquire the spinlock without first taking the reference, but
+unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
+initialized after each and every call to kmem_cache_alloc(), which renders
+reference-free spinlock acquisition completely unsafe.  Therefore, when
+using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+(Those willing to initialize their locks in a kmem_cache constructor
+may also use locking, including cache-friendly sequence locking.)
 
 With traditional reference counting -- such as that implemented by the
 kref library in Linux -- there is typically code that runs when the last
@@ -1040,6 +1083,30 @@ sched::
 	rcu_read_lock_sched_held
 
 
+RCU-Tasks::
+
+	Critical sections	Grace period		Barrier
+
+	N/A			call_rcu_tasks		rcu_barrier_tasks
+				synchronize_rcu_tasks
+
+
+RCU-Tasks-Rude::
+
+	Critical sections	Grace period		Barrier
+
+	N/A			call_rcu_tasks_rude	rcu_barrier_tasks_rude
+				synchronize_rcu_tasks_rude
+
+
+RCU-Tasks-Trace::
+
+	Critical sections	Grace period		Barrier
+
+	rcu_read_lock_trace	call_rcu_tasks_trace	rcu_barrier_tasks_trace
+	rcu_read_unlock_trace	synchronize_rcu_tasks_trace
+
+
 SRCU::
 
 	Critical sections	Grace period		Barrier
@@ -1057,13 +1124,18 @@ SRCU: Initialization/cleanup::
 	init_srcu_struct
 	cleanup_srcu_struct
 
-All: lockdep-checked RCU-protected pointer access::
+All: lockdep-checked RCU utility APIs::
 
-	rcu_access_pointer
-	rcu_dereference_raw
 	RCU_LOCKDEP_WARN
 	rcu_sleep_check
-	RCU_NONIDLE
+
+All: Unchecked RCU-protected pointer access::
+
+	rcu_dereference_raw
+
+All: Unchecked RCU-protected pointer access with dereferencing prohibited::
+
+	rcu_access_pointer
 
 See the comment headers in the source code (or the docbook generated
 from them) for more information.
@@ -1074,35 +1146,43 @@ list can be helpful:
 
 a.	Will readers need to block?  If so, you need SRCU.
 
-b.	What about the -rt patchset?  If readers would need to block
-	in an non-rt kernel, you need SRCU.  If readers would block
-	in a -rt kernel, but not in a non-rt kernel, SRCU is not
-	necessary.  (The -rt patchset turns spinlocks into sleeplocks,
-	hence this distinction.)
+b.	Will readers need to block and are you doing tracing, for
+	example, ftrace or BPF?  If so, you need RCU-tasks,
+	RCU-tasks-rude, and/or RCU-tasks-trace.
+
+c.	What about the -rt patchset?  If readers would need to block in
+	an non-rt kernel, you need SRCU.  If readers would block when
+	acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
+	SRCU is not necessary.	(The -rt patchset turns spinlocks into
+	sleeplocks, hence this distinction.)
 
-c.	Do you need to treat NMI handlers, hardirq handlers,
+d.	Do you need to treat NMI handlers, hardirq handlers,
 	and code segments with preemption disabled (whether
 	via preempt_disable(), local_irq_save(), local_bh_disable(),
 	or some other mechanism) as if they were explicit RCU readers?
-	If so, RCU-sched is the only choice that will work for you.
-
-d.	Do you need RCU grace periods to complete even in the face
-	of softirq monopolization of one or more of the CPUs?  For
-	example, is your code subject to network-based denial-of-service
-	attacks?  If so, you should disable softirq across your readers,
-	for example, by using rcu_read_lock_bh().
-
-e.	Is your workload too update-intensive for normal use of
+	If so, RCU-sched readers are the only choice that will work
+	for you, but since about v4.20 you use can use the vanilla RCU
+	update primitives.
+
+e.	Do you need RCU grace periods to complete even in the face of
+	softirq monopolization of one or more of the CPUs?  For example,
+	is your code subject to network-based denial-of-service attacks?
+	If so, you should disable softirq across your readers, for
+	example, by using rcu_read_lock_bh().  Since about v4.20 you
+	use can use the vanilla RCU update primitives.
+
+f.	Is your workload too update-intensive for normal use of
 	RCU, but inappropriate for other synchronization mechanisms?
 	If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
 	named SLAB_DESTROY_BY_RCU).  But please be careful!
 
-f.	Do you need read-side critical sections that are respected
-	even though they are in the middle of the idle loop, during
-	user-mode execution, or on an offlined CPU?  If so, SRCU is the
-	only choice that will work for you.
+g.	Do you need read-side critical sections that are respected even
+	on CPUs that are deep in the idle loop, during entry to or exit
+	from user-mode execution, or on an offlined CPU?  If so, SRCU
+	and RCU Tasks Trace are the only choices that will work for you,
+	with SRCU being strongly preferred in almost all cases.
 
-g.	Otherwise, use RCU.
+h.	Otherwise, use RCU.
 
 Of course, this all assumes that you have determined that RCU is in fact
 the right tool for your job.