aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/rcu/tree.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-05-27rcu: Create an immutable rcu_data_p pointer to default rcu_data structurePaul E. McKenney1-0/+1
This commit creates an immutable rcu_data_p pointer that references rcu_preempt_data for TREE_PREEMPT_RCU builds and that references rcu_sched_data for TREE_RCU builds. This rcu_data_p pointer will enable more code to move from #ifdef to IS_ENABLED(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Tell the compiler that rcu_state_p is immutablePaul E. McKenney1-1/+1
This commit adds a "const" tag to the declarations of rcu_state_p, which should allow the compiler to generate better code and also to catch erroneous assignments to this variable. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()Paul E. McKenney1-92/+92
This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE() and WRITE_ONCE() APIs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Updated to include kernel/torture.c as suggested by Jason Low. ]
2015-05-13tracing: Rename ftrace_event.h to trace_events.hSteven Rostedt (Red Hat)1-1/+1
The term "ftrace" is really the infrastructure of the function hooks, and not the trace events. Rename ftrace_event.h to trace_events.h to represent the trace_event infrastructure and decouple the term ftrace from it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-04-14rcu: Control grace-period delays directly from valuePaul E. McKenney1-7/+9
In a misguided attempt to avoid an #ifdef, the use of the gp_init_delay module parameter was conditioned on the corresponding RCU_TORTURE_TEST_SLOW_INIT Kconfig variable, using IS_ENABLED() at the point of use in the code. This meant that the compiler always saw the delay, which meant that RCU_TORTURE_TEST_SLOW_INIT_DELAY had to be unconditionally defined. This in turn caused "make oldconfig" to ask pointless questions about the value of RCU_TORTURE_TEST_SLOW_INIT_DELAY in cases where it was not even used. This commit avoids these pointless questions by defining gp_init_delay under #ifdef. In one branch, gp_init_delay is initialized to RCU_TORTURE_TEST_SLOW_INIT_DELAY and is also a module parameter (thus allowing boot-time modification), and in the other branch gp_init_delay is a const variable initialized by default to zero. This approach also simplifies the code at the delay point by eliminating the IS_DEFINED(). Because gp_init_delay is constant zero in the no-delay case intended for production use, the "gp_init_delay > 0" check causes the delay to become dead code, as desired in this case. In addition, this commit replaces magic constant "10" with the preprocessor variable PER_RCU_NODE_PERIOD, which controls the number of grace periods that are allowed to elapse at full speed before a delay is inserted. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-20Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', 'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEADPaul E. McKenney1-126/+311
doc.2015.02.26a: Documentation changes earlycb.2015.03.03a: Permit early-boot RCU callbacks fixes.2015.03.03a: Miscellaneous fixes gpexp.2015.02.26a: In-kernel expediting of normal grace periods hotplug.2015.03.20a: CPU hotplug fixes sysidle.2015.02.26b: NO_HZ_FULL_SYSIDLE fixes tiny.2015.02.26a: TINY_RCU fixes
2015-03-20rcu: Associate quiescent-state reports with grace periodPaul E. McKenney1-12/+22
As noted in earlier commit logs, CPU hotplug operations running concurrently with grace-period initialization can result in a given leaf rcu_node structure having all CPUs offline and no blocked readers, but with this rcu_node structure nevertheless blocking the current grace period. Therefore, the quiescent-state forcing code now checks for this situation and repairs it. Unfortunately, this checking can result in false positives, for example, when the last task has just removed itself from this leaf rcu_node structure, but has not yet started clearing the ->qsmask bits further up the structure. This means that the grace-period kthread (which forces quiescent states) and some other task might be attempting to concurrently clear these ->qsmask bits. This is usually not a problem: One of these tasks will be the first to acquire the upper-level rcu_node structure's lock and with therefore clear the bit, and the other task, seeing the bit already cleared, will stop trying to clear bits. Sadly, this means that the following unusual sequence of events -can- result in a problem: 1. The grace-period kthread wins, and clears the ->qsmask bits. 2. This is the last thing blocking the current grace period, so that the grace-period kthread clears ->qsmask bits all the way to the root and finds that the root ->qsmask field is now zero. 3. Another grace period is required, so that the grace period kthread initializes it, including setting all the needed qsmask bits. 4. The leaf rcu_node structure (the one that started this whole mess) is blocking this new grace period, either because it has at least one online CPU or because there is at least one task that had blocked within an RCU read-side critical section while running on one of this leaf rcu_node structure's CPUs. (And yes, that CPU might well have gone offline before the grace period in step (3) above started, which can mean that there is a task on the leaf rcu_node structure's ->blkd_tasks list, but ->qsmask equal to zero.) 5. The other kthread didn't get around to trying to clear the upper level ->qsmask bits until all the above had happened. This means that it now sees bits set in the upper-level ->qsmask field, so it proceeds to clear them. Too bad that it is doing so on behalf of a quiescent state that does not apply to the current grace period! This sequence of events can result in the new grace period being too short. It can also result in the new grace period ending before the leaf rcu_node structure's ->qsmask bits have been cleared, which will result in splats during initialization of the next grace period. In addition, it can result in tasks blocking the new grace period still being queued at the start of the next grace period, which will result in other splats. Sasha's testing turned up another of these splats, as did rcutorture testing. (And yes, rcutorture is being adjusted to make these splats show up more quickly. Which probably is having the undesirable side effect of making other problems show up less quickly. Can't have everything!) Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.0.x Tested-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-20rcu: Yet another fix for preemption and CPU hotplugPaul E. McKenney1-16/+27
As noted earlier, the following sequence of events can occur when running PREEMPT_RCU and HOTPLUG_CPU on a system with a multi-level rcu_node combining tree: 1. A group of tasks block on CPUs corresponding to a given leaf rcu_node structure while within RCU read-side critical sections. 2. All CPUs corrsponding to that rcu_node structure go offline. 3. The next grace period starts, but because there are still tasks blocked, the upper-level bits corresponding to this leaf rcu_node structure remain set. 4. All the tasks exit their RCU read-side critical sections and remove themselves from the leaf rcu_node structure's list, leaving it empty. 5. But because there now is code to check for this condition at force-quiescent-state time, the upper bits are cleared and the grace period completes. However, there is another complication that can occur following step 4 above: 4a. The grace period starts, and the leaf rcu_node structure's gp_tasks pointer is set to NULL because there are no tasks blocked on this structure. 4b. One of the CPUs corresponding to the leaf rcu_node structure comes back online. 4b. An endless stream of tasks are preempted within RCU read-side critical sections on this CPU, such that the ->blkd_tasks list is always non-empty. The grace period will never end. This commit therefore makes the force-quiescent-state processing check only for absence of tasks blocking the current grace period rather than absence of tasks altogether. This will cause a quiescent state to be reported if the current leaf rcu_node structure is not blocking the current grace period and its parent thinks that it is, regardless of how RCU managed to get itself into this state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.0.x Tested-by: Sasha Levin <sasha.levin@oracle.com>
2015-03-12rcu: Add diagnostics to grace-period cleanupPaul E. McKenney1-0/+2
At grace-period initialization time, RCU checks that all quiescent states were really reported for the previous grace period. Now that grace-period cleanup has been split out of grace-period initialization, this commit also performs those checks at grace-period cleanup time. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Handle outgoing CPUs on exit from idle loopPaul E. McKenney1-10/+31
This commit informs RCU of an outgoing CPU just before that CPU invokes arch_cpu_idle_dead() during its last pass through the idle loop (via a new CPU_DYING_IDLE notifier value). This change means that RCU need not deal with outgoing CPUs passing through the scheduler after informing RCU that they are no longer online. Note that removing the CPU from the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time, and orphaning callbacks is still done at CPU_DEAD time, the reason being that at CPU_DEAD time we have another CPU that can adopt them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Eliminate ->onoff_mutex from rcu_node structurePaul E. McKenney1-15/+0
Because that RCU grace-period initialization need no longer exclude CPU-hotplug operations, this commit eliminates the ->onoff_mutex and its uses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Process offlining and onlining only at grace-period startPaul E. McKenney1-33/+121
Races between CPU hotplug and grace periods can be difficult to resolve, so the ->onoff_mutex is used to exclude the two events. Unfortunately, this means that it is impossible for an outgoing CPU to perform the last bits of its offlining from its last pass through the idle loop, because sleeplocks cannot be acquired in that context. This commit avoids these problems by buffering online and offline events in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a grace period starts, the events accumulated in this mask are applied to the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special case of all CPUs corresponding to a given leaf rcu_node structure being offline while there are still elements in that structure's ->blkd_tasks list is handled using a new ->wait_blkd_tasks field. In this case, propagating the offline bits up the tree is deferred until the beginning of the grace period after all of the tasks have exited their RCU read-side critical sections and removed themselves from the list, at which point the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node structure's CPUs comes back online before the list empties, then the ->wait_blkd_tasks flag is simply cleared. This of course means that RCU's notion of which CPUs are offline can be out of date. This is OK because RCU need only wait on CPUs that were online at the time that the grace period started. In addition, RCU's force-quiescent-state actions will handle the case where a CPU goes offline after the grace period starts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Move rcu_report_unblock_qs_rnp() to common codePaul E. McKenney1-0/+39
The rcu_report_unblock_qs_rnp() function is invoked when the last task blocking the current grace period exits its outermost RCU read-side critical section. Previously, this was called only from rcu_read_unlock_special(), and was therefore defined only when CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at the beginnings of RCU grace periods. The reason for this change is that the last task on a given leaf rcu_node structure's ->blkd_tasks list might well exit its RCU read-side critical section between the time that recent CPU-hotplug operations were applied and when the new grace period was initialized. This situation could result in RCU waiting forever on that leaf rcu_node structure, because if all that structure's CPUs were already offline, there would be no quiescent-state events to drive that structure's part of the grace period. This commit therefore moves rcu_report_unblock_qs_rnp() to common code that is built unconditionally so that the quiescent-state-forcing code can clean up after this situation, avoiding the grace-period stall. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUsPaul E. McKenney1-2/+0
Offline CPUs cannot safely invoke trace events, but such CPUs do execute within rcu_cpu_notify(). Therefore, this commit removes the trace events from rcu_cpu_notify(). These trace events are for utilization, against which rcu_cpu_notify() execution time should be negligible. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Provide diagnostic option to slow down grace-period initializationPaul E. McKenney1-0/+10
Grace-period initialization normally proceeds quite quickly, so that it is very difficult to reproduce races against grace-period initialization. This commit therefore allows grace-period initialization to be artificially slowed down, increasing race-reproduction probability. A pair of new Kconfig parameters are provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies of slowdown to apply. A boot-time parameter named rcutree.gp_init_delay allows boot-time delay to be specified. By default, no delay will be applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Detect stalls caused by failure to propagate up rcu_node treePaul E. McKenney1-2/+3
If all CPUs have passed through quiescent states, then stalls might be due to starvation of the grace-period kthread or to failure to propagate the quiescent states up the rcu_node combining tree. The current stall warning messages do not differentiate, so this commit adds a printout of the root rcu_node structure's ->qsmask field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Put all orphan-callback-related code under same commentPaul E. McKenney1-1/+1
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-11rcu: Consolidate offline-CPU callback initializationPaul E. McKenney1-4/+5
Currently, both rcu_cleanup_dead_cpu() and rcu_send_cbs_to_orphanage() initialize the outgoing CPU's callback list. However, only rcu_cleanup_dead_cpu() invokes rcu_send_cbs_to_orphanage(), and it does so unconditionally, which means that only one of these initializations is required. This commit therefore consolidates the callback-list initialization with the rest of the callback handling in rcu_send_cbs_to_orphanage(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Remove redundant check of cpu_online()Yao Dongdong1-1/+1
Because invoke_cpu_core() checks whether the current CPU is online, there is no need for __call_rcu_core() to redundantly check it. There should not be any performance degradation because the called function is visible to the compiler. This commit therefore removes the redundant check. Signed-off-by: Yao Dongdong <yaodongdong@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Get rcu_sched_force_quiescent_state() where it belongsPaul E. McKenney1-9/+9
The very similar functions rcu_force_quiescent_state(), rcu_bh_force_quiescent_state(), and rcu_sched_force_quiescent_state() are supposed to be together, but have drifted apart. This commit restores rcu_sched_force_quiescent_state() to its rightful place. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Use IS_ENABLED() to CONFIG_RCU_FANOUT_EXACT #ifdefPaul E. McKenney1-18/+14
This commit uses IS_ENABLED() to remove the #ifdef from the rcu_init_levelspread() functions. No effect on executable code. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Move early boot callback tests earlierPaul E. McKenney1-2/+2
Because callbacks can now be posted quite early in boot, move the early boot callback tests to precede RCU initialization. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-03rcu: Move early-boot callbacks to no-CBs lists for no-CBs CPUsPaul E. McKenney1-0/+1
When a CPU is first determined to be a no-CBs CPUs, this commit causes any early boot callbacks to be moved to the no-CBs callback list, allowing them to be invoked. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Tighten up affinity and check for sysidlePaul E. McKenney1-1/+1
If the RCU grace-period kthread invoking rcu_sysidle_check_cpu() happens to be running on the tick_do_timer_cpu initially, then rcu_bind_gp_kthread() won't bind it. This kthread might then migrate before invoking rcu_gp_fqs(), which will trigger the WARN_ON_ONCE() in rcu_sysidle_check_cpu(). This commit therefore makes rcu_bind_gp_kthread() do the binding even if the kthread is currently on the same CPU. Because this incurs added overhead, this commit also causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread() once at boot rather than at the beginning of each grace period. And as long as rcu_bind_gp_kthread() is being modified, this commit eliminates its #ifdef. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Fixes to NO_HZ_FULL sysidle accountingPaul E. McKenney1-3/+3
On second and subsequent passes through quiescent-state forcing, the isidle variable was initialized to false, which would prevent full sysidle state from being reached if a grace period needed more than one round of quiescent-state forcing (which most should not). However, the check for offline CPUs in the quiescent-state forcing main loop had the wrong sense, which could prevent CPUs from ever entering full sysidle state. This commit fixes both of these bugs. Given that sysidle is not yet wired up, this has no effect in old kernels, but might have proven frustrating had anyone attempted to wire it up. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Update from rcu_expedited variable to rcu_gp_is_expedited()Paul E. McKenney1-4/+5
This commit updates open-coded tests of the rcu_expedited variable to instead use rcu_gp_is_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Fix a couple of typos in rcu_all_qs() comment headerPaul E. McKenney1-2/+2
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Avoid clobbering early boot callbacksPaul E. McKenney1-1/+2
When a CPU comes online, it initializes its callback list. This is a bad thing if this is the first time that the CPU has come online and if that CPU has early boot callbacks. This commit therefore avoid initializing the callback list if there are callbacks present, in which case the initial call_rcu() did the initialization for us. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Prevent early-boot RCU callbacks from splattingPaul E. McKenney1-5/+15
Currently, a call_rcu() that precedes rcu_init() will splat due to the callback lists not having yet been initialized. This commit causes the first such callback to initialize the boot CPU's RCU callback list. Note that this commit does not change rcu_init()-time initialization, which means that the callback will be discarded at rcu_init() time. Fixing this is the job of later commits. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Wire ->rda pointers at compile timePaul E. McKenney1-3/+3
This commit wires up the rcu_state structures' ->rda pointers to the per-CPU rcu_data structures at compile time, thus ensuring that this linkage is present at early boot, in turn allowing posting of callbacks before rcu_init() is executed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26rcu: Abstract default callback-list initialization from init_callback_list()Paul E. McKenney1-4/+14
In preparation for early-boot posting of callbacks, this commit abstracts initialization of the default (non-no-CB) callbacks list from the init_callback_list() function into a new init_default_callback_list() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-25rcu_tree: Avoid touching rnp->completed when a new GP is startedLai Jiangshan1-2/+2
In rcu_gp_init(), rnp->completed equals to rsp->completed in THEORY, we don't need to touch it normally. If something goes wrong, it will complain and fixup rnp->completed and avoid oops. This commit thus avoids the normal needless store to rnp->completed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEADPaul E. McKenney1-80/+209
doc.2015.01.07a: Documentation updates. fixes.2015.01.15a: Miscellaneous fixes. preempt.2015.01.06a: Changes to handling of lists of preempted tasks. srcu.2015.01.06a: SRCU updates. stall.2015.01.16a: RCU CPU stall-warning updates and fixes. torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15rcu: Add GP-kthread-starvation checks to CPU stall warningsPaul E. McKenney1-1/+20
This commit adds a message that is printed if the relevant grace-period kthread has not been able to run for the two seconds preceding the stall warning. (The two seconds is double the maximum interval between successive bouts of quiescent-state forcing.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15rcu: Make cond_resched_rcu_qs() apply to normal RCU flavorsPaul E. McKenney1-5/+33
Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used in places where it would be useful for it to apply to the normal RCU flavors, rcu_preempt, rcu_sched, and rcu_bh. This is especially the case for workloads that aggressively overload the system, particularly those that generate large numbers of RCU updates on systems running NO_HZ_FULL CPUs. This commit therefore communicates quiescent states from cond_resched_rcu_qs() to the normal RCU flavors. Note that it is unfortunately necessary to leave the old ->passed_quiesce mechanism in place to allow quiescent states that apply to only one flavor to be recorded. (Yes, we could decrement ->rcu_qs_ctr_snap in that case, but that is not so good for debugging of RCU internals.) In addition, if one of the RCU flavor's grace period has stalled, this will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight quiescent state visible from other CPUs. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu() was used in preemptible code. ]
2015-01-15rcu: Optionally run grace-period kthreads at real-time priorityPaul E. McKenney1-1/+23
Recent testing has shown that under heavy load, running RCU's grace-period kthreads at real-time priority can improve performance (according to 0day test robot) and reduce the incidence of RCU CPU stall warnings. However, most systems do just fine with the default non-realtime priorities for these kthreads, and it does not make sense to expose the entire user base to any risk stemming from this change, given that this change is of use only to a few users running extremely heavy workloads. Therefore, this commit allows users to specify realtime priorities for the grace-period kthreads, but leaves them running SCHED_OTHER by default. The realtime priority may be specified at build time via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the rcutree.kthread_prio parameter. Either way, 0 says to continue the default SCHED_OTHER behavior and values from 1-99 specify that priority of SCHED_FIFO behavior. Note that a value of 0 is not permitted when the RCU_BOOST Kconfig parameter is specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10rcutorture: Check from beginning to end of grace periodPaul E. McKenney1-2/+38
Currently, rcutorture's Reader Batch checks measure from the end of the previous grace period to the end of the current one. This commit tightens up these checks by measuring from the start and end of the same grace period. This involves adding rcu_batches_started() and friends corresponding to the existing rcu_batches_completed() and friends. We leave SRCU alone for the moment, as it does not yet have a way of tracking both ends of its grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10rcu: Make _batches_completed() functions return unsigned longPaul E. McKenney1-2/+2
Long ago, the various ->completed fields were of type long, but now are unsigned long due to signed-integer-overflow concerns. However, the various _batches_completed() functions remained of type long, even though their only purpose in life is to return the corresponding ->completed field. This patch cleans this up by changing these functions' return types to unsigned long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Handle gpnum/completed wrap while dyntick idlePaul E. McKenney1-5/+12
Subtle race conditions can result if a CPU stays in dyntick-idle mode long enough for the ->gpnum and ->completed fields to wrap. For example, consider the following sequence of events: o CPU 1 encounters a quiescent state while waiting for grace period 5 to complete, but then enters dyntick-idle mode. o While CPU 1 is in dyntick-idle mode, the grace-period counters wrap around so that the grace period number is now 4. o Just as CPU 1 exits dyntick-idle mode, grace period 4 completes and grace period 5 begins. o The quiescent state that CPU 1 passed through during the old grace period 5 looks like it applies to the new grace period 5. Therefore, the new grace period 5 completes without CPU 1 having passed through a quiescent state. This could clearly be a fatal surprise to any long-running RCU read-side critical section that happened to be running on CPU 1 at the time. At one time, this was not a problem, given that it takes significant time for the grace-period counters to overflow even on 32-bit systems. However, with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long idle periods are now becoming quite feasible. It is therefore time to close this race. This commit therefore avoids this race condition by having the quiescent-state forcing code detect when a CPU is falling too far behind, and setting a new rcu_data field ->gpwrap when this happens. Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed fields are known to be untrustworthy, and can be ignored, along with any associated quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Improve diagnostics for spurious RCU CPU stall warningsPaul E. McKenney1-5/+27
The current RCU CPU stall warning code will print "Stall ended before state dump start" any time that the stall-warning code is triggered on a CPU that has already reported a quiescent state for the current grace period and if all quiescent states have been reported for the current grace period. However, a true stall can result in these symptoms, for example, by preventing RCU's grace-period kthreads from ever running This commit therefore checks for this condition, reporting the end of the stall only if one of the grace-period counters has actually advanced. Otherwise, it reports the last time that the grace-period kthread made meaningful progress. (In normal situations, the grace-period kthread should make meaningful progress at least every jiffies_till_next_fqs jiffies.) Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Miroslav Benes <mbenes@suse.cz>
2015-01-06rcu: Make RCU_CPU_STALL_INFO include number of fqs attemptsPaul E. McKenney1-0/+1
One way that an RCU CPU stall warning can happen is if the grace-period kthread is not allowed to execute. One proxy for this kthread's forward progress is the number of force-quiescent-state (fqs) scans. This commit therefore adds the number of fqs scans to the RCU CPU stall warning printouts when CONFIG_RCU_CPU_STALL_INFO=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Remove redundant callback-list initializationPaul E. McKenney1-3/+0
The RCU callback lists are initialized in both rcu_boot_init_percpu_data() and rcu_init_percpu_data(). The former is intended for initializing immutable data, so this commit removes the initialization from rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data(). This change prepares for permitting callbacks to be queued very early in boot. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't scan root rcu_node structure for stalled tasksPaul E. McKenney1-9/+0
Now that blocked tasks are no longer migrated to the root rcu_node structure, there is no need to scan the root rcu_node structure for blocked tasks stalling the current grace period. This commit therefore removes this scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Note quiescent state when CPU goes offlinePaul E. McKenney1-1/+1
The rcu_cleanup_dead_cpu() function (called after a CPU has gone completely offline) has not reported a quiescent state because there was probably at least one synchronize_rcu() between the time the CPU went offline and the CPU_DEAD notifier, and this would have detected the CPU's offline state via quiescent-state forcing. However, the plan is for CPUs to take themselves offline, at which point it makes sense for them to report their own quiescent state. This commit makes this change in preparation for the new CPU-hotplug setup. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't initiate RCU priority boosting on root rcu_nodePaul E. McKenney1-6/+0
Because there is no longer any preempted tasks on the root rcu_node, and because there is no longer ever an rcub kthread for the root rcu_node, this commit drops the code in force_qs_rnp() that attempts to awaken the non-existent root rcub kthread. This is strictly a performance enhancement, removing a root rcu_node ->lock acquisition and release along with some tests in rcu_initiate_boost(), ending with the test that notes that there is no rcub kthread. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu()Paul E. McKenney1-2/+2
Now that we are not migrating callbacks, there is no need to hold the ->orphan_lock across the the ->qsmaskinit bit-clearing process. This commit therefore releases ->orphan_lock immediately after adopting the orphaned RCU callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't migrate blocked tasks even if all corresponding CPUs offlinePaul E. McKenney1-18/+2
When the last CPU associated with a given leaf rcu_node structure goes offline, something must be done about the tasks queued on that rcu_node structure. Each of these tasks has been preempted on one of the leaf rcu_node structure's CPUs while in an RCU read-side critical section that it have not yet exited. Handling these tasks is the job of rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node structure to the root rcu_node structure. Unfortunately, this migration has to be done one task at a time because each tasks allegiance must be shifted from the original leaf rcu_node to the root, so that future attempts to deal with these tasks will acquire the root rcu_node structure's ->lock rather than that of the leaf. Worse yet, this migration must be done with interrupts disabled, which is not so good for realtime response, especially given that there is no bound on the number of tasks on a given rcu_node structure's list. (OK, OK, there is a bound, it is just that it is unreasonably large, especially on 64-bit systems.) This was not considered a problem back when rcu_preempt_offline_tasks() was first written because realtime systems were assumed not to do CPU-hotplug operations while real-time applications were running. This assumption has proved of dubious validity given that people are starting to run multiple realtime applications on a single SMP system and that it is common practice to offline then online a CPU before starting its real-time application in order to clear extraneous processing off of that CPU. So we now need CPU hotplug operations to avoid undue latencies. This commit therefore avoids migrating these tasks, instead letting them be dequeued one by one from the original leaf rcu_node structure by rcu_read_unlock_special(). This means that the clearing of bits from the upper-level rcu_node structures must be deferred until the last such task has been dequeued, because otherwise subsequent grace periods won't wait on them. This commit has the beneficial side effect of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in CONFIG_RCU_BOOST builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearingPaul E. McKenney1-0/+4
This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit bit clearing up the rcu_node tree once a given rcu_node structure's blkd_tasks list becomes empty. This is the final commit in preparation for the rework of RCU priority boosting: It enables preempted tasks to remain queued on their rcu_node structure even after all of that rcu_node structure's CPUs have gone offline. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()Paul E. McKenney1-19/+48
This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() in preparation for the rework of RCU priority boosting. This new function will be invoked from rcu_read_unlock_special() in the reworked scheme, which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node structure's ->qsmaskinit field has already been updated. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Fix rcu_barrier() race that could result in too-short waitPaul E. McKenney1-0/+1
The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions. It checks a given CPU's lists of callbacks, and if all three no-CBs lists are empty, ignores that CPU. However, these three lists could potentially be empty even when callbacks are present if the check executed just as the callbacks were being moved from one list to another. It turns out that recent versions of rcutorture can spot this race. This commit plugs this hole by consolidating the per-list counts of no-CBs callbacks into a single count, which is incremented before the corresponding callback is posted and after it is invoked. Then rcu_barrier() checks this single count to reliably determine whether the corresponding CPU has no-CBs callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>