path: root/kernel/rcu/tree_exp.h
Age  Commit message  (Author, files changed, lines -removed/+added)
2019-04-09  Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD  (Paul E. McKenney, 1 file, -2/+34)
consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups.
doc.2019.03.26b: Documentation updates.
fixes.2019.03.26b: Miscellaneous fixes.
srcu.2019.03.26b: SRCU updates.
stall.2019.03.26b: RCU CPU stall warning updates.
torture.2019.03.26b: Torture-test updates.
2019-03-26  rcu: Move rcu_print_task_exp_stall() to tree_exp.h  (Paul E. McKenney, 1 file, -0/+32)
Because expedited CPU stall warnings are contained within the kernel/rcu/tree_exp.h file, rcu_print_task_exp_stall() should live there too. This commit carries out the required code motion. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-03-26  rcu: Correct READ_ONCE()/WRITE_ONCE() for ->rcu_read_unlock_special  (Paul E. McKenney, 1 file, -1/+1)
The task_struct structure's ->rcu_read_unlock_special field is only ever read or written by the owning task, but it is accessed both at process and interrupt levels. It may therefore be accessed using plain reads and writes while interrupts are disabled, but must be accessed using READ_ONCE() and WRITE_ONCE() or better otherwise. This commit makes a few adjustments to align with this discipline. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
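A minimal sketch of that access discipline; the helper name is hypothetical, and the union layout follows the description above:

    /* Hypothetical helper illustrating the rules described above. */
    static void set_unlock_special_sketch(struct task_struct *t, u32 bits)
    {
            if (irqs_disabled()) {
                    /* No interrupt can intervene: plain access suffices. */
                    t->rcu_read_unlock_special.s |= bits;
            } else {
                    /* Otherwise, mark the access for the compiler. */
                    WRITE_ONCE(t->rcu_read_unlock_special.s,
                               READ_ONCE(t->rcu_read_unlock_special.s) | bits);
            }
    }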
2019-03-26  rcu: Fix typo in tree_exp.h comment  (Paul E. McKenney, 1 file, -1/+1)
This commit changes a rcu_exp_handler() comment from rcu_preempt_defer_qs() to rcu_preempt_deferred_qs() in order to better match reality. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-02-09  Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', 'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD  (Paul E. McKenney, 1 file, -17/+2)
doc.2019.01.26a: Documentation updates.
fixes.2019.01.26a: Miscellaneous fixes.
sil.2019.01.26a: Removal of a few more spin_is_locked() instances.
spdx.2019.02.09a: Add SPDX identifiers to RCU files.
srcu.2019.01.26a: SRCU updates.
torture.2019.01.26a: Torture-test updates.
2019-02-09  rcu/tree: Convert to SPDX license identifier  (Paul E. McKenney, 1 file, -15/+2)
Replace the license boilerplate with an SPDX license identifier. While in the area, update an email address. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> [ paulmck: Update .h file SPDX comment format per Joe Perches. ] Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2019-01-25  rcu: Remove preemption disabling from expedited CPU selection  (Paul E. McKenney, 1 file, -2/+0)
It turns out that it is queue_delayed_work_on() rather than queue_work_on() that has difficulties when used concurrently with CPU-hotplug removal operations. It is therefore unnecessary to protect CPU identification and queue_work_on() with preempt_disable(). This commit therefore removes the preempt_disable() and preempt_enable() from sync_rcu_exp_select_cpus(), which has the further benefit of reducing the number of changes that must be maintained in the -rt patchset. Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sebastian Siewior <bigeasy@linutronix.de> Suggested-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25  rcu: Inline _synchronize_rcu_expedited() into synchronize_rcu_expedited()  (Paul E. McKenney, 1 file, -45/+36)
Now that _synchronize_rcu_expedited() has only one caller, and given that this is a tail call, this commit inlines _synchronize_rcu_expedited() into synchronize_rcu_expedited(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25  rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu()  (Paul E. McKenney, 1 file, -27/+0)
Now that rcu_blocking_is_gp() makes the correct immediate-return decision for both PREEMPT and !PREEMPT, a single implementation of synchronize_rcu() will work correctly under both configurations. This commit therefore eliminates a few lines of code by consolidating the two implementations of synchronize_rcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
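A rough sketch of the consolidated shape (not the verbatim patch; details elided):

    /* One synchronize_rcu() serving both PREEMPT and !PREEMPT builds. */
    void synchronize_rcu(void)
    {
            if (rcu_blocking_is_gp())
                    return;         /* Early boot, or !PREEMPT on one CPU. */
            if (rcu_gp_is_expedited())
                    synchronize_rcu_expedited();
            else
                    wait_rcu_gp(call_rcu);
    }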
2019-01-25  rcu: Consolidate PREEMPT and !PREEMPT synchronize_rcu_expedited()  (Paul E. McKenney, 1 file, -56/+49)
The CONFIG_PREEMPT=n and CONFIG_PREEMPT=y implementations of synchronize_rcu_expedited() are quite similar, and with small modifications to rcu_blocking_is_gp() can be made identical. This commit therefore makes this change in order to save a few lines of code and to reduce the amount of duplicate code. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25  rcu: Determine expedited-GP IPI handler at build time  (Paul E. McKenney, 1 file, -16/+14)
Back when there could be multiple RCU flavors running in the same kernel at the same time, it was necessary to specify the expedited grace-period IPI handler at runtime. Now that there is only one RCU flavor, the IPI handler can be determined at build time. There is therefore no longer any reason for the RCU-preempt and RCU-sched IPI handlers to have different names, nor is there any reason to pass these handlers in function arguments and in the data structures enclosing workqueues. This commit therefore makes all these changes, pushing the specification of the expedited grace-period IPI handler down to the point of use. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
2019-01-25  rcu: Make expedited IPI handler return after handling critical section  (Paul E. McKenney, 1 file, -0/+1)
During expedited RCU grace-period initialization, IPIs are sent to all non-idle online CPUs. The IPI handler checks to see if the CPU is in quiescent state, reporting one if so. This handler looks at three different cases: (1) The CPU is not in an rcu_read_lock()-based critical section, (2) The CPU is in the process of exiting an rcu_read_lock()-based critical section, and (3) The CPU is in an rcu_read_lock()-based critical section. In case (2), execution falls through into case (3). This is harmless from a functionality viewpoint, but can result in needless overhead during an improbable corner case. This commit therefore adds the "return" statement needed to prevent fall-through. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
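A sketch of the resulting control flow; the case handling is condensed and the helper names are hypothetical:

    static void rcu_exp_handler_sketch(void *unused)
    {
            struct task_struct *t = current;

            if (!t->rcu_read_lock_nesting) {
                    report_exp_qs_sketch();          /* (1) Not in a reader. */
                    return;
            }
            if (t->rcu_read_lock_nesting < 0) {
                    defer_exp_qs_sketch(t);          /* (2) Exiting a reader. */
                    return;                          /* The added return. */
            }
            defer_exp_qs_sketch(t);                  /* (3) Inside a reader. */
    }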
2018-11-12  rcu: Speed up expedited GPs when interrupting RCU reader  (Paul E. McKenney, 1 file, -1/+3)
In PREEMPT kernels, an expedited grace period might send an IPI to a CPU that is executing an RCU read-side critical section. In that case, it would be nice if the rcu_read_unlock() directly interacted with the RCU core code to immediately report the quiescent state. And this does happen in the case where the reader has been preempted. But it would also be a nice performance optimization if immediate reporting also happened in the preemption-free case.

This commit therefore adds an ->exp_hint field to the task_struct structure's ->rcu_read_unlock_special field. The IPI handler sets this hint when it has interrupted an RCU read-side critical section, and this causes the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(), which, if preemption is enabled, reports the quiescent state immediately. If preemption is disabled, then the report is required to be deferred until preemption (or bottom halves or interrupts or whatever) is re-enabled.

Because this is a hint, it does nothing for more complicated cases. For example, if the IPI interrupts an RCU reader, but interrupts are disabled across the rcu_read_unlock(), but another rcu_read_lock() is executed before interrupts are re-enabled, the hint will already have been cleared. If you do crazy things like this, reporting will be deferred until some later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.

Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
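The hinting described above boils down to two small fragments; the ->exp_hint bit is per the description, while the reporting helper is hypothetical:

    /* IPI handler, upon interrupting an RCU read-side critical section: */
    WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);

    /* Outermost rcu_read_unlock() path: */
    if (t->rcu_read_unlock_special.b.exp_hint && preemptible()) {
            t->rcu_read_unlock_special.b.exp_hint = false;
            report_exp_qs_sketch();  /* Report the quiescent state now. */
    }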
2018-11-11  rcu: Stop expedited grace periods from relying on stop-machine  (Paul E. McKenney, 1 file, -2/+4)
The CPU-selection code in sync_rcu_exp_select_cpus() disables preemption to prevent the cpu_online_mask from changing. However, this relies on the stop-machine mechanism in the CPU-hotplug offline code, which is not desirable (it would be good to someday remove the stop-machine mechanism).

This commit therefore instead uses the relevant leaf rcu_node structure's ->ffmask, which has a bit set for all CPUs that are fully functional. A given CPU's bit is cleared very early during offline processing by rcutree_offline_cpu() and set very late during online processing by rcutree_online_cpu(). Therefore, if a CPU's bit is set in this mask, and preemption is disabled, we have to be before the synchronize_sched() in the CPU-hotplug offline code, which means that the CPU is guaranteed to be workqueue-ready throughout the duration of the enclosing preempt_disable() region of code.

This also has the side-effect of using WORK_CPU_UNBOUND if all the CPUs for this leaf rcu_node structure are offline, which is an acceptable difference in behavior.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
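A sketch of the resulting selection, with field and workqueue names taken from nearby log entries; treat it as illustrative:

    preempt_disable();
    cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, cpu - rnp->grplo);
    if (unlikely(cpu > rnp->grphi - rnp->grplo))
            cpu = WORK_CPU_UNBOUND;  /* Every CPU on this leaf is offline. */
    else
            cpu += rnp->grplo;
    queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
    preempt_enable();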
2018-08-30  rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks  (Paul E. McKenney, 1 file, -4/+2)
This commit moves ->dynticks from the rcu_dynticks structure to the rcu_data structure, replacing the field of the same name. It also updates the code to access ->dynticks from the rcu_data structure and to use the rcu_data structure rather than following the now-gone ->dynticks field to the now-gone rcu_dynticks structure. While in the area, this commit also fixes up comments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Switch urgent quiescent-state requests to rcu_data structure  (Paul E. McKenney, 1 file, -1/+1)
This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the rcu_dynticks structure and updates the code to access them from the rcu_data structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Avoid resched_cpu() when rescheduling the current CPU  (Paul E. McKenney, 1 file, -7/+10)
The resched_cpu() interface is quite handy, but it does acquire the specified CPU's runqueue lock, which does not come for free. This commit therefore substitutes the following when directing resched_cpu() at the current CPU: set_tsk_need_resched(current); set_preempt_need_resched(); Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
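The substitution takes the guarded form below; the placement of the cpu check is assumed:

    if (cpu == smp_processor_id()) {
            /* Rescheduling ourselves needs no runqueue lock. */
            set_tsk_need_resched(current);
            set_preempt_need_resched();
    } else {
            resched_cpu(cpu);
    }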
2018-08-30  rcu: Clean up flavor-related definitions and comments in tree_exp.h  (Paul E. McKenney, 1 file, -11/+11)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Remove rsp parameter from rcu_node tree accessor macros  (Paul E. McKenney, 1 file, -9/+9)
There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's rcu_node tree's accessor macros. This commit therefore removes the rsp parameter from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Remove rsp parameter from expedited grace-period functions  (Paul E. McKenney, 1 file, -99/+86)
There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from the code in kernel/rcu/tree_exp.h, and removes all of the rsp local variables while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Remove rsp parameter from rcu_get_root()  (Paul E. McKenney, 1 file, -3/+3)
There now is only one rcu_state structure in a given build of the Linux kernel, so there is no need to pass it as a parameter to RCU's functions. This commit therefore removes the rsp parameter from rcu_get_root(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Remove rcu_state_p pointer to default rcu_state structure  (Paul E. McKenney, 1 file, -1/+1)
The rcu_state_p pointer references the default rcu_state structure, that is, the one that call_rcu() uses, as opposed to call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one rcu_state structure, so that one structure is by definition the default, which means that the rcu_state_p pointer no longer serves any useful purpose. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Remove rcu_state structure's ->rda field  (Paul E. McKenney, 1 file, -10/+9)
The rcu_state structure's ->rda field was used to find the per-CPU rcu_data structures corresponding to that rcu_state structure. But now there is only one rcu_state structure (creatively named "rcu_state") and one set of per-CPU rcu_data structures (creatively named "rcu_data"). Therefore, uses of the ->rda field can always be replaced by "rcu_data", and this commit makes that change and removes the ->rda field. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Eliminate rcu_state structure's ->call field  (Paul E. McKenney, 1 file, -1/+1)
The rcu_state structure's ->call field references the corresponding RCU flavor's call_rcu() function. However, now that there is only ever one rcu_state structure in a given build of the Linux kernel, and that flavor uses plain old call_rcu(), there is not a lot of point in continuing to have the ->call field. This commit therefore removes it. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds  (Paul E. McKenney, 1 file, -76/+77)
Now that RCU-preempt knows about preemption disabling, its implementation of synchronize_rcu() works for synchronize_sched(), and likewise for the other RCU-sched update-side API members. This commit therefore confines the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines RCU-sched's update-side API members in terms of those of RCU-preempt. This means that any given build of the Linux kernel has only one update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds and RCU-sched for CONFIG_PREEMPT=n builds. This in turn means that kernels built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com>
2018-08-30  rcu: Drop "wake" parameter from rcu_report_exp_rdp()  (Paul E. McKenney, 1 file, -5/+4)
The rcu_report_exp_rdp() function is always invoked with its "wake" argument set to "true", so this commit drops this parameter. The only potential call site that would use "false" is in the code driving the expedited grace period, and that code uses rcu_report_exp_cpu_mult() instead, which therefore retains its "wake" parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30  rcu: Defer reporting RCU-preempt quiescent states when disabled  (Paul E. McKenney, 1 file, -16/+55)
This commit defers reporting of RCU-preempt quiescent states at rcu_read_unlock_special() time when any of interrupts, softirq, or preemption are disabled. These deferred quiescent states are reported at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug offline operation. Of course, if another RCU read-side critical section has started in the meantime, the reporting of the quiescent state will be further deferred.

This also means that disabling preemption, interrupts, and/or softirqs will act as an RCU-preempt read-side critical section. This is enforced by checking preempt_count() as needed. Some special cases must be handled on an ad-hoc basis, for example, context switch is a quiescent state even though both the scheduler and do_exit() disable preemption. In these cases, additional calls to rcu_preempt_deferred_qs() override the preemption disabling. Similar logic overrides disabled interrupts in rcu_preempt_check_callbacks() because in this case the quiescent state happened just before the corresponding scheduling-clock interrupt.

In theory, this change lifts a long-standing restriction that required that if interrupts were disabled across a call to rcu_read_unlock() that the matching rcu_read_lock() also be contained within that interrupts-disabled region of code. Because the reporting of the corresponding RCU-preempt quiescent state is now deferred until after interrupts have been enabled, it is no longer possible for this situation to result in deadlocks involving the scheduler's runqueue and priority-inheritance locks. This may allow some code simplification that might reduce interrupt latency a bit. Unfortunately, in practice this would also defer deboosting a low-priority task that had been subjected to RCU priority boosting, so real-time-response considerations might well force this restriction to remain in place.

Because RCU-preempt grace periods are now blocked not only by RCU read-side critical sections, but also by disabling of interrupts, preemption, and softirqs, it will be possible to eliminate RCU-bh and RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may require some additional plumbing to provide the network denial-of-service guarantees that have been traditionally provided by RCU-bh. Once these are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh into RCU-sched. This would mean that all kernels would have but one flavor of RCU, which would open the door to significant code cleanup. Moving to a single flavor of RCU would also have the beneficial effect of reducing the NOCB kthreads by at least a factor of two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback from Joel Fernandes. ] [ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in response to bug reports from kbuild test robot. ] [ paulmck: Fix bug located by kbuild test robot involving recursion via rcu_preempt_deferred_qs(). ]
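The deferral decision reduces to a check along these lines; this is a simplified sketch, and the exact flag plumbing differs:

    /* In rcu_read_unlock_special(), with irqs already disabled: */
    if ((preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)) ||
        irqs_disabled_flags(flags)) {
            /* Cannot report now; arrange for a later report. */
            raise_softirq_irqoff(RCU_SOFTIRQ);
            local_irq_restore(flags);
            return;
    }
    rcu_preempt_deferred_qs_irqrestore(t, flags);  /* Report immediately. */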
2018-08-13  Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -2/+2)
Pull scheduler updates from Thomas Gleixner:

 - Cleanup and improvement of NUMA balancing
 - Refactoring and improvements to the PELT (Per Entity Load Tracking) code
 - Watchdog simplification and related cleanups
 - The usual pile of small incremental fixes and improvements

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  watchdog: Reduce message verbosity
  stop_machine: Reflow cpu_stop_queue_two_works()
  sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
  sched/numa: Use group_weights to identify if migration degrades locality
  sched/numa: Update the scan period without holding the numa_group lock
  sched/numa: Remove numa_has_capacity()
  sched/numa: Modify migrate_swap() to accept additional parameters
  sched/numa: Remove unused task_capacity from 'struct numa_stats'
  sched/numa: Skip nodes that are at 'hoplimit'
  sched/debug: Reverse the order of printing faults
  sched/numa: Use task faults only if numa_group is not yet set up
  sched/numa: Set preferred_node based on best_cpu
  sched/numa: Simplify load_too_imbalanced()
  sched/numa: Evaluate move once per node
  sched/numa: Remove redundant field
  sched/debug: Show the sum wait time of a task group
  sched/fair: Remove #ifdefs from scale_rt_capacity()
  sched/core: Remove get_cpu() from sched_fork()
  sched/cpufreq: Clarify sugov_get_util()
  sched/sysctl: Remove unused sched_time_avg_ms sysctl
  ...
2018-07-12  rcu: Make expedited GPs handle CPU 0 being offline  (Boqun Feng, 1 file, -1/+8)
Currently, the parallelized initialization of expedited grace periods uses the workqueue associated with each rcu_node structure's ->grplo field. This works fine unless that CPU is offline. This commit therefore uses the CPU corresponding to the lowest-numbered online CPU, or just queues the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding to this rcu_node structure. Note that this patch uses cpu_is_offline() instead of the usual approach of checking bits in the rcu_node structure's ->qsmaskinitnext field. This is safe because preemption is disabled across both the cpu_is_offline() check and the call to queue_work_on(). Signed-off-by: Boqun Feng <boqun.feng@gmail.com> [ paulmck: Disable preemption to close offline race window. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Apply Peter Zijlstra feedback on CPU selection. ] Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
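The selection described above amounts to the following sketch; the workqueue and field names are taken from nearby log entries:

    preempt_disable();  /* Excludes CPU-hotplug offlining here. */
    cpu = cpumask_next(rnp->grplo - 1, cpu_online_mask);
    if (unlikely(cpu > rnp->grphi))
            cpu = WORK_CPU_UNBOUND;  /* No online CPU on this leaf. */
    queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
    preempt_enable();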
2018-06-25  rcu: Make expedited grace period use direct call on last leaf  (Paul E. McKenney, 1 file, -2/+3)
During expedited grace-period initialization, a work item is scheduled for each leaf rcu_node structure. However, that initialization code is itself (normally) executing from a workqueue, so one of the leaf rcu_node structures could just as well be handled by that pre-existing workqueue, and with less overhead. This commit therefore uses a shiny new rcu_is_leaf_node() macro to execute the last leaf rcu_node structure's initialization directly from the pre-existing workqueue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
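Roughly as sketched below; the last-leaf test and work-function names are assumed:

    rcu_for_each_leaf_node(rsp, rnp) {
            if (rcu_is_last_leaf_node(rsp, rnp)) {
                    /* Already in a workqueue handler, so run the final
                     * leaf's initialization inline: no round trip. */
                    sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
                    continue;
            }
            queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
    }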
2018-06-20  sched/swait: Rename to exclusive  (Peter Zijlstra, 1 file, -2/+2)
Since swait basically implemented exclusive waits only, make sure the API reflects that.

    $ git grep -l -e "\<swake_up\>" \
          -e "\<swait_event[^ (]*" \
          -e "\<prepare_to_swait\>" | while read file;
      do
          sed -i -e 's/\<swake_up\>/&_one/g' \
                 -e 's/\<swait_event[^ (]*/&_exclusive/g' \
                 -e 's/\<prepare_to_swait\>/&_exclusive/g' $file;
      done

With a few manual touch-ups.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
2018-05-15  Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' and 'torture.2018.05.15a' into HEAD  (Paul E. McKenney, 1 file, -8/+5)
exp.2018.05.15a: Parallelize expedited grace-period initialization.
fixes.2018.05.15a: Miscellaneous fixes.
lock.2018.05.15a: Decrease lock contention on root rcu_node structure, which is a step towards merging RCU flavors.
torture.2018.05.15a: Torture-test updates.
2018-05-15  rcu: Remove deprecated RCU debugfs tracing code  (Byungchul Park, 1 file, -8/+5)
Commit ae91aa0adb14 ("rcu: Remove debugfs tracing") removed the RCU debugfs tracing code, but did not remove the no-longer-used ->exp_workdone{0,1,2,3} fields in the srcu_data structure. This commit therefore removes these fields along with the code that uselessly updates them. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15  rcu: exp: Protect all sync_rcu_preempt_exp_done() with rcu_node lock  (Boqun Feng, 1 file, -3/+25)
Currently, some callsites of sync_rcu_preempt_exp_done() do not hold the corresponding rcu_node's ->lock, which could introduce bugs as per Paul:

 o  CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and sees that it is NULL.

 o  CPU 1 blocks within an RCU read-side critical section, so it enqueues the task and points ->exp_tasks at it and clears CPU 1's bit in ->expmask.

 o  All other CPUs clear their bits in ->expmask.

 o  CPU 0 reads ->expmask, sees that it is zero, so incorrectly concludes that all quiescent states have completed, despite the fact that ->exp_tasks is non-NULL.

To fix this, sync_rcu_preempt_exp_done_unlocked() is introduced to replace the lockless callsites of sync_rcu_preempt_exp_done(). Further, a lockdep annotation is added into sync_rcu_preempt_exp_done() to prevent misuse in the future. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
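The shape of the fix, per the description above, with bodies condensed:

    static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
    {
            raw_lockdep_assert_held_rcu_node(rnp);  /* New annotation. */
            return rnp->exp_tasks == NULL &&
                   READ_ONCE(rnp->expmask) == 0;
    }

    /* Replaces the formerly lockless callsites. */
    static bool sync_rcu_preempt_exp_done_unlocked(struct rcu_node *rnp)
    {
            unsigned long flags;
            bool ret;

            raw_spin_lock_irqsave_rcu_node(rnp, flags);
            ret = sync_rcu_preempt_exp_done(rnp);
            raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
            return ret;
    }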
2018-05-15  rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions  (Boqun Feng, 1 file, -7/+3)
Since commit d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU"), there have been comments on some functions in rcu_report_exp_rnp()'s call chain saying that exp_mutex or its predecessors needs to be held. However, exp_mutex and its predecessors were used only to synchronize between GPs, and it is clear that all variables visited by those functions are under the protection of rcu_node's ->lock. Moreover, those functions are currently called without exp_mutex held, and it seems that this doesn't introduce any trouble. So this patch fixes the problem by updating the comments to match the current code. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Fixes: d9a3da0699b2 ("rcu: Add expedited grace-period support for preemptible RCU") Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15  rcu: Parallelize expedited grace-period initialization  (Paul E. McKenney, 1 file, -78/+106)
The latency of RCU expedited grace periods grows with increasing numbers of CPUs, eventually failing to be all that expedited. Much of the growth in latency is in the initialization phase, so this commit uses workqueues to carry out this initialization concurrently on a rcu_node-by-rcu_node basis. This change makes use of a new rcu_par_gp_wq because flushing a work item from another work item running from the same workqueue can result in deadlock. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
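In outline, the expedited-GP work item fans the per-leaf initialization out onto a different workqueue and then flushes it; a sketch with assumed names, since a same-queue flush is what could deadlock:

    /* Queue per-leaf init work on the separate rcu_par_gp_wq... */
    rcu_for_each_leaf_node(rsp, rnp)
            queue_work_on(rnp->grplo, rcu_par_gp_wq, &rnp->rew.rew_work);
    /* ...then wait for every leaf's initialization to finish. */
    rcu_for_each_leaf_node(rsp, rnp)
            flush_work(&rnp->rew.rew_work);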
2018-02-23  rcu: Create RCU-specific workqueues with rescuers  (Paul E. McKenney, 1 file, -1/+1)
RCU's expedited grace periods can participate in out-of-memory deadlocks due to all available system_wq kthreads being blocked and there not being memory available to create more. This commit prevents such deadlocks by allocating an RCU-specific workqueue_struct at early boot time, and providing it with a rescuer to ensure forward progress. This uses the shiny new init_rescuer() function provided by Tejun (but indirectly). This commit also causes SRCU to use this new RCU-specific workqueue_struct. Note that SRCU's use of workqueues never blocks them waiting for readers, so this should be safe from a forward-progress viewpoint. Note that this moves SRCU from system_power_efficient_wq to a normal workqueue. In the unlikely event that this results in measurable degradation, a separate power-efficient workqueue will be created for SRCU. Reported-by: Prateek Sood <prsood@codeaurora.org> Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org>
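Passing WQ_MEM_RECLAIM to alloc_workqueue() is what provides the rescuer; a boot-time sketch, with the queue name assumed:

    /* Early boot: WQ_MEM_RECLAIM guarantees a rescuer for forward progress. */
    rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
    WARN_ON(!rcu_gp_wq);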
2018-02-20  rcu: Make expedited RCU CPU selection avoid unnecessary stores  (Paul E. McKenney, 1 file, -7/+14)
This commit reworks the first loop in sync_rcu_exp_select_cpus() to avoid doing unnecessary stores to other CPUs' rcu_data structures. This speeds up that first loop by roughly a factor of two on an old x86 system. In the case where the system is mostly idle, this loop incurs a large fraction of the overhead of the synchronize_rcu_expedited(). There is less benefit on busy systems because the overhead of the smp_call_function_single() in the second loop dominates in that case. However, it is not unusual to do configuration changes involving RCU grace periods (both expedited and normal) while the system is mostly idle, so this optimization is worth doing. While we are in the area, this commit also adds parentheses to arguments used by the for_each_leaf_node_possible_cpu() macro. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20  rcu: Trace expedited GP delays due to transitioning CPUs  (Paul E. McKenney, 1 file, -1/+2)
If a CPU is transitioning to or from offline state, an expedited grace period may undergo a timed wait. This timed wait can unduly delay grace periods, so this commit adds a trace statement to make it visible. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20  rcu: Add more tracing of expedited grace periods  (Paul E. McKenney, 1 file, -0/+12)
This commit adds more tracing of expedited grace periods to enable improved debugging of slowdowns. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-07-25  rcu: Make expedited GPs correctly handle hardware CPU insertion  (Paul E. McKenney, 1 file, -1/+1)
The updates of ->expmaskinitnext and of ->ncpus are unsynchronized, with the value of ->ncpus being incremented long before the corresponding ->expmaskinitnext mask is updated. If an RCU expedited grace period sees ->ncpus change, it will update the ->expmaskinit masks from the new ->expmaskinitnext masks. But it is possible that ->ncpus has already been updated, but the ->expmaskinitnext masks still have their old values.

For the current expedited grace period, no harm done. The CPU could not have been online before the grace period started, so there is no need to wait for its non-existent pre-existing readers. But the next RCU expedited grace period is in a world of hurt. The value of ->ncpus has already been updated, so this grace period will assume that the ->expmaskinitnext masks have not changed. But they have, and they won't be taken into account until the next never-been-online CPU comes online. This means that RCU will be ignoring some CPUs that it should be paying attention to.

The solution is to update ->ncpus and ->expmaskinitnext while holding the ->lock for the rcu_node structure containing the ->expmaskinitnext mask. Because smp_store_release() is now used to update ->ncpus and smp_load_acquire() is now used to locklessly read it, if the expedited grace period sees ->ncpus change, then the updating CPU has to already be holding the corresponding ->lock. Therefore, when the expedited grace period later acquires that ->lock, it is guaranteed to see the new value of ->expmaskinitnext. On the other hand, if the expedited grace period loads ->ncpus just before an update, earlier full memory barriers guarantee that the incoming CPU isn't far enough along to be running any RCU readers.

This commit therefore makes the required change. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
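In outline, a sketch of the ordering described above; variable names are illustrative:

    /* CPU-onlining path, under the leaf rcu_node's ->lock: */
    raw_spin_lock_irqsave_rcu_node(rnp, flags);
    rnp->expmaskinitnext |= rdp->grpmask;
    smp_store_release(&rsp->ncpus, rsp->ncpus + nbits);  /* Publish last. */
    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

    /* Expedited-GP path: the acquire pairs with the release above. */
    ncpus = smp_load_acquire(&rsp->ncpus);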
2017-06-08  rcu: Make sync_rcu_preempt_exp_done() return bool  (Paul E. McKenney, 1 file, -1/+1)
The sync_rcu_preempt_exp_done() function returns a logical expression, but its return type is nevertheless int. This commit therefore changes the return type to bool. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18  srcu: Improve rcu_seq grace-period-counter abstraction  (Paul E. McKenney, 1 file, -5/+4)
The expedited grace-period code contains several open-coded shifts that know the format of an rcu_seq grace-period counter, which is not particularly good style. This commit therefore creates a new rcu_seq_ctr() function that extracts the counter portion of the counter, and an rcu_seq_state() function that extracts the low-order state bit. This commit prepares for SRCU callback parallelization, which will require two state bits. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
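The abstraction amounts to the following; a sketch with one state bit, widened later for SRCU's two:

    #define RCU_SEQ_CTR_SHIFT   1
    #define RCU_SEQ_STATE_MASK  ((1 << RCU_SEQ_CTR_SHIFT) - 1)

    /* Grace-period count: everything above the state bit(s). */
    static inline unsigned long rcu_seq_ctr(unsigned long s)
    {
            return s >> RCU_SEQ_CTR_SHIFT;
    }

    /* Low-order state bit(s): nonzero means a GP is in progress. */
    static inline int rcu_seq_state(unsigned long s)
    {
            return s & RCU_SEQ_STATE_MASK;
    }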
2017-04-18  rcu: Expedited wakeups need to be fully ordered  (Paul E. McKenney, 1 file, -0/+2)
Expedited grace periods use workqueue handlers that wake up the requesters, but there is no lock mediating this wakeup. Therefore, memory barriers are required to ensure that the handler's memory references are seen by all to occur before synchronize_*_expedited() returns to its caller. Possibly detected by syzkaller. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18  srcu: Allow SRCU to access rcu_scheduler_active  (Paul E. McKenney, 1 file, -12/+0)
This is primarily a code-movement commit in preparation for allowing SRCU to handle early-boot SRCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-04-18  rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions  (Paul E. McKenney, 1 file, -0/+2)
The rcu_all_qs() and rcu_note_context_switch() do a series of checks, taking various actions to supply RCU with quiescent states, depending on the outcomes of the various checks. This is a bit much for scheduling fastpaths, so this commit creates a separate ->rcu_urgent_qs field in the rcu_dynticks structure that acts as a global guard for these checks. Thus, in the common case, rcu_all_qs() and rcu_note_context_switch() check the ->rcu_urgent_qs field, find it false, and simply return. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
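The guard is a single per-CPU flag checked first on the fastpath; a sketch with the slow-path body condensed:

    void rcu_all_qs(void)
    {
            if (!raw_cpu_read(rcu_dynticks.rcu_urgent_qs))
                    return;  /* Common case: RCU needs nothing from us. */
            /* Rare slow path: clear the flag and supply quiescent states. */
            this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
            rcu_momentary_dyntick_idle();  /* One of the possible actions. */
    }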
2017-01-25  Merge branches 'doc.2017.01.15b', 'dyntick.2017.01.23a', 'fixes.2017.01.23a', 'srcu.2017.01.25a' and 'torture.2017.01.15b' into HEAD  (Paul E. McKenney, 1 file, -7/+31)
doc.2017.01.15b: Documentation updates
dyntick.2017.01.23a: Dyntick tracking consolidation
fixes.2017.01.23a: Miscellaneous fixes
srcu.2017.01.25a: SRCU rewrite, fixes, and verification
torture.2017.01.15b: Torture-test updates
2017-01-23  rcu: Abstract extended quiescent state determination  (Paul E. McKenney, 1 file, -3/+3)
This commit is the fourth step towards full abstraction of all accesses to the ->dynticks counter, implementing previously open-coded checks and comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since() functions. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23  rcu: Add lockdep checks to synchronous expedited primitives  (Paul E. McKenney, 1 file, -0/+10)
The non-expedited synchronize_*rcu() primitives have lockdep checks, but their expedited counterparts lack these checks. This commit therefore adds these checks to the expedited synchronize_*rcu() primitives. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
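The added checks follow the standard RCU lockdep pattern; a sketch for synchronize_rcu_expedited(), with the other expedited primitives getting analogous messages:

    RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
                     lock_is_held(&rcu_lock_map) ||
                     lock_is_held(&rcu_sched_lock_map),
                     "Illegal synchronize_rcu_expedited() in RCU read-side critical section");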
2017-01-23  rcu: Add comment headers to expedited-grace-period counter functions  (Paul E. McKenney, 1 file, -1/+17)
These functions (rcu_exp_gp_seq_start(), rcu_exp_gp_seq_end(), rcu_exp_gp_seq_snap(), and rcu_exp_gp_seq_done()) seemed too obvious to comment when written, but not so much when being documented. This commit therefore adds header comments to each of them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>