aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/rcu/tree.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-10-07Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEADPaul E. McKenney1-20/+28
exp.2015.10.07a: Reduce OS jitter of RCU-sched expedited grace periods. fixes.2015.10.06a: Miscellaneous fixes.
2015-10-07rcu: Add online/offline info to expedited stall warning messagePaul E. McKenney1-0/+1
This commit makes the RCU CPU stall warning message print online/offline indications immediately after the CPU number. A "O" indicates global offline, a "." global online, and a "o" indicates RCU believes that the CPU is offline for the current grace period and "." otherwise, and an "N" indicates that RCU believes that the CPU will be offline for the next grace period, and "." otherwise, all right after the CPU number. So for CPU 10, you would normally see "10-...:" indicating that everything believes that the CPU is online. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07rcu: Stop silencing lockdep false positive for expedited grace periodsPaul E. McKenney1-8/+0
This reverts commit af859beaaba4 (rcu: Silence lockdep false positive for expedited grace periods). Because synchronize_rcu_expedited() no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex acquisition is no longer nested, so the false positive no longer happens. This commit therefore removes the extra lockdep data structures, as they are no longer needed.
2015-10-07rcu: Switch synchronize_sched_expedited() to IPIPaul E. McKenney1-3/+0
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait() to smp_call_function_single(), thus moving from an IPI and a pair of context switches to an IPI and a single pass through the scheduler. Of course, if the scheduler actually does decide to switch to a different task, there will still be a pair of context switches, but there would likely have been a pair of context switches anyway, just a bit later. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06rcu: Correct comment for values of ->gp_state fieldPaul E. McKenney1-1/+1
This commit corrects the comment for the values of the ->gp_state field, which previously incorrectly said that these were for the ->gp_flags field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu: Finish folding ->fqs_state into ->gp_statePetr Mladek1-11/+3
Commit commit 4cdfc175c25c89ee ("rcu: Move quiescent-state forcing into kthread") started the process of folding the old ->fqs_state into ->gp_state, but did not complete it. This situation does not cause any malfunction, but can result in extremely confusing trace output. This commit completes this task of eliminating ->fqs_state in favor of ->gp_state. The old ->fqs_state was also used to decide when to collect dyntick-idle snapshots. For this purpose, we add a boolean variable into the kthread, which is set on the first call to rcu_gp_fqs() for a given grace period and clear otherwise. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu: Use call_rcu_func_t to replace explicit type equivalentsBoqun Feng1-2/+1
We have had the call_rcu_func_t typedef for a quite awhile, but we still use explicit function pointer types in some places. These types can confuse cscope and can be hard to read. This patch therefore replaces these types with the call_rcu_func_t typedef. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06rcu: Use rcu_callback_t in call_rcu*() and friendsBoqun Feng1-1/+1
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we should use it in call_rcu*() and friends as the type of parameters. This could save us a few lines of code and make it clear which function requires an rcu callbacks rather than other callbacks as its argument. Besides, this can also help cscope to generate a better database for code reading. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-09-20rcu: Make ->cpu_no_qs be a union for aggregate ORPaul E. McKenney1-1/+13
This commit converts the rcu_data structure's ->cpu_no_qs field to a union. The bytewise side of this union allows individual access to indications as to whether this CPU needs to find a quiescent state for a normal (.norm) and/or expedited (.exp) grace period. The setwise side of the union allows testing whether or not a quiescent state is needed at all, for either type of grace period. For now, only .norm is used. A later commit will introduce the expedited usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Invert passed_quiesce and rename to cpu_no_qsPaul E. McKenney1-1/+1
This commit inverts the sense of the rcu_data structure's ->passed_quiesce field and renames it to ->cpu_no_qs. This will allow a later commit to use an "aggregate OR" operation to test expedited as well as normal grace periods without added overhead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Rename qs_pending to core_needs_qsPaul E. McKenney1-2/+2
An upcoming commit needs to invert the sense of the ->passed_quiesce rcu_data structure field, so this commit is taking this opportunity to clarify things a bit by renaming ->qs_pending to ->core_needs_qs. So if !rdp->core_needs_qs, then this CPU need not concern itself with quiescent states, in particular, it need not acquire its leaf rcu_node structure's ->lock to check. Otherwise, it needs to report the next quiescent state. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Move synchronize_sched_expedited() to combining treePaul E. McKenney1-1/+0
Currently, synchronize_sched_expedited() uses a single global counter to track the number of remaining context switches that the current expedited grace period must wait on. This is problematic on large systems, where the resulting memory contention can be pathological. This commit therefore makes synchronize_sched_expedited() instead use the combining tree in the same manner as synchronize_rcu_expedited(), keeping memory contention down to a dull roar. This commit creates a temporary function sync_sched_exp_select_cpus() that is very similar to sync_rcu_exp_select_cpus(). A later commit will consolidate these two functions, which becomes possible when synchronize_sched_expedited() switches from stop_one_cpu_nowait() to smp_call_function_single(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20rcu: Consolidate tree setup for synchronize_rcu_expedited()Paul E. McKenney1-5/+12
This commit replaces sync_rcu_preempt_exp_init1(() and sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug() and sync_exp_reset_tree(), which will also be used by synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which contains code specific to synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-08-04rcu,locking: Privatize smp_mb__after_unlock_lock()Paul E. McKenney1-0/+12
RCU is the only thing that uses smp_mb__after_unlock_lock(), and is likely the only thing that ever will use it, so this commit makes this macro private to RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
2015-08-04rcu: Silence lockdep false positive for expedited grace periodsPaul E. McKenney1-0/+8
In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited() acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes synchronize_sched_expedited, which acquires the ->exp_funnel_mutex in rcu_sched_state. There can be no deadlock because rcu_preempt_state ->exp_funnel_mutex acquisition always precedes that of rcu_sched_state. But lockdep does not know that, so it gives false-positive splats. This commit therefore associates a separate lock_class_key structure with the rcu_sched_state structure's ->exp_funnel_mutex, allowing lockdep to see the lock ordering, avoiding the false positives. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Add fastpath bypassing funnel lockingPaul E. McKenney1-1/+1
In the common case, there will be only one expedited grace period in the system at a given time, in which case it is not helpful to use funnel locking. This commit therefore adds a fastpath that bypasses funnel locking when the root ->exp_funnel_mutex is not held. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQSPaul E. McKenney1-1/+1
The grace-period kthread sleeps waiting to do a force-quiescent-state scan, and when awakened sets rsp->gp_state to RCU_GP_DONE_FQS. However, this is confusing because the kthread has not done the force-quiescent-state, but is instead just starting to do it. This commit therefore renames RCU_GP_DONE_FQS to RCU_GP_DOING_FQS in order to make things a bit easier on reviewers. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Add stall warnings to synchronize_sched_expedited()Paul E. McKenney1-0/+1
Although synchronize_sched_expedited() historically has no RCU CPU stall warnings, the availability of the rcupdate.rcu_expedited boot parameter invalidates the old assumption that synchronize_sched()'s stall warnings would suffice. This commit therefore adds RCU CPU stall warnings to synchronize_sched_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Extend expedited funnel locking to rcu_data structurePaul E. McKenney1-1/+3
The strictly rcu_node based funnel-locking scheme works well in many cases, but systems with CONFIG_RCU_FANOUT_LEAF=64 won't necessarily get all that much concurrency. This commit therefore extends the funnel locking into the per-CPU rcu_data structure, providing concurrency equal to the number of CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Apply rcu_seq operations to _rcu_barrier()Paul E. McKenney1-1/+1
The rcu_seq operations were open-coded in _rcu_barrier(), so this commit replaces the open-coding with the shiny new rcu_seq operations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Make expedited GP CPU stoppage asynchronousPeter Zijlstra1-0/+6
Sequentially stopping the CPUs slows down expedited grace periods by at least a factor of two, based on rcutorture's grace-period-per-second rate. This is a conservative measure because rcutorture uses unusually long RCU read-side critical sections and because rcutorture periodically quiesces the system in order to test RCU's ability to ramp down to and up from the idle state. This commit therefore replaces the stop_one_cpu() with stop_one_cpu_nowait(), using an atomic-counter scheme to determine when all CPUs have passed through the stopped state. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Get rid of synchronize_sched_expedited()'s polling loopPaul E. McKenney1-2/+6
This commit gets rid of synchronize_sched_expedited()'s mutex_trylock() polling loop in favor of a funnel-locking scheme based on the rcu_node tree. The work-done check is done at each level of the tree, allowing high-contention situations to be resolved quickly with reasonable levels of mutex contention. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Rework synchronize_sched_expedited() counter handlingPaul E. McKenney1-7/+2
Now that synchronize_sched_expedited() have a mutex, it can use simpler work-already-done detection scheme. This commit simplifies this scheme by using something similar to the sequence-locking counter scheme. A counter is incremented before and after each grace period, so that the counter is odd in the midst of the grace period and even otherwise. So if the counter has advanced to the second even number that is greater than or equal to the snapshot, the required grace period has already happened. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Switch synchronize_sched_expedited() to stop_one_cpu()Peter Zijlstra1-0/+1
The synchronize_sched_expedited() currently invokes try_stop_cpus(), which schedules the stopper kthreads on each online non-idle CPU, and waits until all those kthreads are running before letting any of them stop. This is disastrous for real-time workloads, which get hit with a preemption that is as long as the longest scheduling latency on any CPU, including any non-realtime housekeeping CPUs. This commit therefore switches to using stop_one_cpu() on each CPU in turn. This avoids inflicting the worst-case scheduling latency on the worst-case CPU onto all other CPUs, and also simplifies the code a little bit. Follow-up commits will simplify the counter-snapshotting algorithm and convert a number of the counters that are now protected by the new ->expedited_mutex to non-atomic. Signed-off-by: Peter Zijlstra <peterz@infradead.org> [ paulmck: Kept stop_one_cpu(), dropped disabling of "guardrails". ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Remove CONFIG_RCU_CPU_STALL_INFOPaul E. McKenney1-4/+0
The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of releases with no complaints, so it is time to eliminate this Kconfig option entirely, so that the long-form RCU CPU stall warnings cannot be disabled. This commit does just that. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-17rcu: Shut up bogus gcc array bounds warningAlexander Gordeev1-1/+3
Because gcc does not realize a loop would not be entered ever (i.e. in case of rcu_num_lvls == 1): for (i = 1; i < rcu_num_lvls; i++) rsp->level[i] = rsp->level[i - 1] + levelcnt[i - 1]; some compiler (pre- 5.x?) versions give a bogus warning: kernel/rcu/tree.c: In function ‘rcu_init_one.isra.55’: kernel/rcu/tree.c:4108:13: warning: array subscript is above array bounds [-Warray-bounds] rsp->level[i] = rsp->level[i - 1] + rsp->levelcnt[i - 1]; ^ Fix that warning by adding an extra item to rcu_state::level[] array. Once the bogus warning is fixed in gcc and kernel drops support of older versions, the dummy item may be removed from the array. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Simplify arithmetic to calculate number of RCU nodesAlexander Gordeev1-13/+4
This update makes arithmetic to calculate number of RCU nodes more straight and easy to read. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Limit count of static data to the number of RCU levelsAlexander Gordeev1-0/+12
Although a number of RCU levels may be less than the current maximum of four, some static data associated with each level are allocated for all four levels. As result, the extra data never get accessed and just wast memory. This update limits count of allocated items to the number of used RCU levels. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Remove unnecessary fields from rcu_state structureAlexander Gordeev1-2/+0
Members rcu_state::levelcnt[] and rcu_state::levelspread[] are only used at init. There is no reason to keep them afterwards. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Limit rcu_capacity[] size to RCU_NUM_LVLS itemsAlexander Gordeev1-2/+0
Number of items in rcu_capacity[] array is defined by macro MAX_RCU_LVLS. However, that array is never accessed beyond RCU_NUM_LVLS index. Therefore, we can limit the array to RCU_NUM_LVLS items and eliminate MAX_RCU_LVLS. As result, in most cases the memory is conserved. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Limit rcu_state::levelcnt[] to RCU_NUM_LVLS itemsAlexander Gordeev1-1/+1
Variable rcu_num_lvls is limited by RCU_NUM_LVLS macro. In turn, rcu_state::levelcnt[] array is never accessed beyond rcu_num_lvls. Thus, rcu_state::levelcnt[] is safe to limit to RCU_NUM_LVLS items. Since rcu_num_lvls could be changed during boot (as result of rcutree.rcu_fanout_leaf kernel parameter update) one might assume a new value could overflow the value of RCU_NUM_LVLS. However, that is not the case, since leaf-level fanout is only permitted to increase, resulting in rcu_num_lvls possibly to decrease. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15rcu: Provide more diagnostics for stalled GP kthreadPaul E. McKenney1-1/+5
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27Merge branches 'array.2015.05.27a', 'doc.2015.05.27a', 'fixes.2015.05.27a', 'hotplug.2015.05.27a', 'init.2015.05.27a', 'tiny.2015.05.27a' and 'torture.2015.05.27a' into HEADPaul E. McKenney1-9/+26
array.2015.05.27a: Remove all uses of RCU-protected array indexes. doc.2015.05.27a: Docuemntation updates. fixes.2015.05.27a: Miscellaneous fixes. hotplug.2015.05.27a: CPU-hotplug updates. init.2015.05.27a: Initialization/Kconfig updates. tiny.2015.05.27a: Updates to Tiny RCU. torture.2015.05.27a: Torture-testing updates.
2015-05-27rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAFPaul E. McKenney1-1/+11
This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined. The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT_LEAF depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT_LEAF unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUTPaul E. McKenney1-3/+15
This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set to 32 for 32-bit systems and 64 for 64-bit systems. This commit then makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about CONFIG_RCU_FANOUT unless they want to be. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2015-05-27rcu: Make rcu_*_data variables staticNicolas Iooss1-3/+0
rcu_bh_data, rcu_sched_data and rcu_preempt_data are never used outside kernel/rcu/tree.c and thus can be made static. Doing so fixes a section mismatch warning reported by clang when building LLVMLinux with -Wsection, because these variables were declared in .data..percpu and defined in .data..percpu..shared_aligned since commit 11bbb235c26f ("rcu: Use DEFINE_PER_CPU_SHARED_ALIGNED for rcu_data"). Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-05-27rcu: Eliminate a few RCU_BOOST #ifdefs in favor of IS_ENABLED()Paul E. McKenney1-2/+0
This commit removes a few RCU_BOOST #ifdefs, replacing them with IS_ENABLED()-protected return statements. This relies on the optimizer to remove any resulting dead code. There are several other RCU_BOOST #ifdefs, however these rely on some per-CPU variables that are available only under RCU_BOOST. These might be converted later, if the simplification proves to outweigh the increase in memory footprint. One hoped-for advantage is more easily locating compiler errors in obscure combinations of Kconfig parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-rt-users@vger.kernel.org>
2015-03-12rcu: Eliminate ->onoff_mutex from rcu_node structurePaul E. McKenney1-2/+0
Because that RCU grace-period initialization need no longer exclude CPU-hotplug operations, this commit eliminates the ->onoff_mutex and its uses. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-03-12rcu: Process offlining and onlining only at grace-period startPaul E. McKenney1-0/+9
Races between CPU hotplug and grace periods can be difficult to resolve, so the ->onoff_mutex is used to exclude the two events. Unfortunately, this means that it is impossible for an outgoing CPU to perform the last bits of its offlining from its last pass through the idle loop, because sleeplocks cannot be acquired in that context. This commit avoids these problems by buffering online and offline events in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a grace period starts, the events accumulated in this mask are applied to the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special case of all CPUs corresponding to a given leaf rcu_node structure being offline while there are still elements in that structure's ->blkd_tasks list is handled using a new ->wait_blkd_tasks field. In this case, propagating the offline bits up the tree is deferred until the beginning of the grace period after all of the tasks have exited their RCU read-side critical sections and removed themselves from the list, at which point the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node structure's CPUs comes back online before the list empties, then the ->wait_blkd_tasks flag is simply cleared. This of course means that RCU's notion of which CPUs are offline can be out of date. This is OK because RCU need only wait on CPUs that were online at the time that the grace period started. In addition, RCU's force-quiescent-state actions will handle the case where a CPU goes offline after the grace period starts. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-15Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEADPaul E. McKenney1-47/+15
doc.2015.01.07a: Documentation updates. fixes.2015.01.15a: Miscellaneous fixes. preempt.2015.01.06a: Changes to handling of lists of preempted tasks. srcu.2015.01.06a: SRCU updates. stall.2015.01.16a: RCU CPU stall-warning updates and fixes. torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15rcu: Make cond_resched_rcu_qs() apply to normal RCU flavorsPaul E. McKenney1-0/+2
Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used in places where it would be useful for it to apply to the normal RCU flavors, rcu_preempt, rcu_sched, and rcu_bh. This is especially the case for workloads that aggressively overload the system, particularly those that generate large numbers of RCU updates on systems running NO_HZ_FULL CPUs. This commit therefore communicates quiescent states from cond_resched_rcu_qs() to the normal RCU flavors. Note that it is unfortunately necessary to leave the old ->passed_quiesce mechanism in place to allow quiescent states that apply to only one flavor to be recorded. (Yes, we could decrement ->rcu_qs_ctr_snap in that case, but that is not so good for debugging of RCU internals.) In addition, if one of the RCU flavor's grace period has stalled, this will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight quiescent state visible from other CPUs. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu() was used in preemptible code. ]
2015-01-10rcu: Remove redundant rcu_batches_completed() declarationPaul E. McKenney1-1/+0
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-10rcu: Make _batches_completed() functions return unsigned longPaul E. McKenney1-1/+1
Long ago, the various ->completed fields were of type long, but now are unsigned long due to signed-integer-overflow concerns. However, the various _batches_completed() functions remained of type long, even though their only purpose in life is to return the corresponding ->completed field. This patch cleans this up by changing these functions' return types to unsigned long. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Handle gpnum/completed wrap while dyntick idlePaul E. McKenney1-0/+1
Subtle race conditions can result if a CPU stays in dyntick-idle mode long enough for the ->gpnum and ->completed fields to wrap. For example, consider the following sequence of events: o CPU 1 encounters a quiescent state while waiting for grace period 5 to complete, but then enters dyntick-idle mode. o While CPU 1 is in dyntick-idle mode, the grace-period counters wrap around so that the grace period number is now 4. o Just as CPU 1 exits dyntick-idle mode, grace period 4 completes and grace period 5 begins. o The quiescent state that CPU 1 passed through during the old grace period 5 looks like it applies to the new grace period 5. Therefore, the new grace period 5 completes without CPU 1 having passed through a quiescent state. This could clearly be a fatal surprise to any long-running RCU read-side critical section that happened to be running on CPU 1 at the time. At one time, this was not a problem, given that it takes significant time for the grace-period counters to overflow even on 32-bit systems. However, with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long idle periods are now becoming quite feasible. It is therefore time to close this race. This commit therefore avoids this race condition by having the quiescent-state forcing code detect when a CPU is falling too far behind, and setting a new rcu_data field ->gpwrap when this happens. Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed fields are known to be untrustworthy, and can be ignored, along with any associated quiescent states. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Improve diagnostics for spurious RCU CPU stall warningsPaul E. McKenney1-0/+2
The current RCU CPU stall warning code will print "Stall ended before state dump start" any time that the stall-warning code is triggered on a CPU that has already reported a quiescent state for the current grace period and if all quiescent states have been reported for the current grace period. However, a true stall can result in these symptoms, for example, by preventing RCU's grace-period kthreads from ever running This commit therefore checks for this condition, reporting the end of the stall only if one of the grace-period counters has actually advanced. Otherwise, it reports the last time that the grace-period kthread made meaningful progress. (In normal situations, the grace-period kthread should make meaningful progress at least every jiffies_till_next_fqs jiffies.) Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Miroslav Benes <mbenes@suse.cz>
2015-01-06rcu: Make RCU_CPU_STALL_INFO include number of fqs attemptsPaul E. McKenney1-0/+2
One way that an RCU CPU stall warning can happen is if the grace-period kthread is not allowed to execute. One proxy for this kthread's forward progress is the number of force-quiescent-state (fqs) scans. This commit therefore adds the number of fqs scans to the RCU CPU stall warning printouts when CONFIG_RCU_CPU_STALL_INFO=y. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversionLai Jiangshan1-5/+0
The patch dfeb9765ce3c ("Allow post-unlock reference for rt_mutex") ensured rcu-boost safe even the rt_mutex has post-unlock reference. But rt_mutex allowing post-unlock reference is definitely a bug and it was fixed by the commit 27e35715df54 ("rtmutex: Plug slow unlock race"). This fix made the previous patch (dfeb9765ce3c) useless. And even worse, the priority-inversion introduced by the the previous patch still exists. rcu_read_unlock_special() { rt_mutex_unlock(&rnp->boost_mtx); /* Priority-Inversion: * the current task had been deboosted and preempted as a low * priority task immediately, it could wait long before reschedule in, * and the rcu-booster also waits on this low priority task and sleeps. * This priority-inversion makes rcu-booster can't work * as expected. */ complete(&rnp->boost_completion); } Just revert the patch to avoid it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Don't migrate blocked tasks even if all corresponding CPUs offlinePaul E. McKenney1-18/+0
When the last CPU associated with a given leaf rcu_node structure goes offline, something must be done about the tasks queued on that rcu_node structure. Each of these tasks has been preempted on one of the leaf rcu_node structure's CPUs while in an RCU read-side critical section that it have not yet exited. Handling these tasks is the job of rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node structure to the root rcu_node structure. Unfortunately, this migration has to be done one task at a time because each tasks allegiance must be shifted from the original leaf rcu_node to the root, so that future attempts to deal with these tasks will acquire the root rcu_node structure's ->lock rather than that of the leaf. Worse yet, this migration must be done with interrupts disabled, which is not so good for realtime response, especially given that there is no bound on the number of tasks on a given rcu_node structure's list. (OK, OK, there is a bound, it is just that it is unreasonably large, especially on 64-bit systems.) This was not considered a problem back when rcu_preempt_offline_tasks() was first written because realtime systems were assumed not to do CPU-hotplug operations while real-time applications were running. This assumption has proved of dubious validity given that people are starting to run multiple realtime applications on a single SMP system and that it is common practice to offline then online a CPU before starting its real-time application in order to clear extraneous processing off of that CPU. So we now need CPU hotplug operations to avoid undue latencies. This commit therefore avoids migrating these tasks, instead letting them be dequeued one by one from the original leaf rcu_node structure by rcu_read_unlock_special(). This means that the clearing of bits from the upper-level rcu_node structures must be deferred until the last such task has been dequeued, because otherwise subsequent grace periods won't wait on them. This commit has the beneficial side effect of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in CONFIG_RCU_BOOST builds. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()Paul E. McKenney1-0/+1
This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() in preparation for the rework of RCU priority boosting. This new function will be invoked from rcu_read_unlock_special() in the reworked scheme, which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node structure's ->qsmaskinit field has already been updated. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-01-06rcu: Remove "select IRQ_WORK" from config TREE_RCULai Jiangshan1-1/+0
The 48a7639ce80c ("rcu: Make callers awaken grace-period kthread") removed the irq_work_queue(), so the TREE_RCU doesn't need irq work any more. This commit therefore updates RCU's Kconfig and Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>