path: root/kernel/rcu
Age  Commit message  Author  Files  Lines
2020-04-27  rcu: Add comments marking transitions between RCU watching and not  (Paul E. McKenney; 1 file, -4/+25)
It is not as clear as it might be just where in RCU's idle entry/exit code RCU stops and starts watching the current CPU. This commit therefore adds comments calling out the transitions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcutorture: Add test of holding scheduler locks across rcu_read_unlock()  (Paul E. McKenney; 1 file, -1/+9)
Now that it should be safe to hold scheduler locks across rcu_read_unlock(), even in cases where the corresponding RCU read-side critical section might have been preempted and boosted, the commit adds a test of this capability to rcutorture. This has been tested on current mainline (which can deadlock in this situation), and lockdep duly reported the expected deadlock. On -rcu, lockdep is silent, thus far, anyway. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
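As a rough sketch of the pattern being exercised (not the actual rcutorture code; test_lock is a hypothetical stand-in for a scheduler rq/pi lock):

    /* Illustrative sketch only: end an RCU read-side critical section
     * while holding a raw (scheduler-style) spinlock, which must no
     * longer risk deadlock even if the reader was preempted and boosted. */
    static DEFINE_RAW_SPINLOCK(test_lock);

    static void hold_lock_across_rcu_read_unlock(void)
    {
            unsigned long flags;

            rcu_read_lock();
            raw_spin_lock_irqsave(&test_lock, flags);
            rcu_read_unlock();
            raw_spin_unlock_irqrestore(&test_lock, flags);
    }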
2020-04-27  rcu: Don't use negative nesting depth in __rcu_read_unlock()  (Lai Jiangshan; 2 files, -41/+12)
Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_read_unlock() being invoked within such a handler that interrupted an __rcu_read_unlock_special(), in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for __rcu_read_unlock() to set the nesting level negative. This commit therefore removes this recursion-protection code from __rcu_read_unlock(). [ paulmck: Let rcu_exp_handler() continue to call rcu_report_exp_rdp(). ] [ paulmck: Adjust other checks given no more negative nesting. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Remove unused ->rcu_read_unlock_special.b.deferred_qs field  (Lai Jiangshan; 1 file, -1/+0)
The ->rcu_read_unlock_special.b.deferred_qs field is set to true in rcu_read_unlock_special() but never set to false. This is not particularly useful, so this commit removes this field. The only possible justification for this field is to ease debugging of RCU deferred quiescent states, but the combination of the other ->rcu_read_unlock_special fields plus ->rcu_blocked_node and of course ->rcu_read_lock_nesting should cover debugging needs. And if this last proves incorrect, this patch can always be reverted, along with the required setting of ->rcu_read_unlock_special.b.deferred_qs to false in rcu_preempt_deferred_qs_irqrestore(). Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Don't set nesting depth negative in rcu_preempt_deferred_qs()  (Lai Jiangshan; 1 file, -5/+0)
Now that RCU flavors have been consolidated, an RCU-preempt rcu_read_unlock() in an interrupt or softirq handler cannot possibly end the RCU read-side critical section. Consider the old vulnerability involving rcu_preempt_deferred_qs() being invoked within such a handler that interrupted an extended RCU read-side critical section, in which a wakeup might be invoked with a scheduler lock held. Because rcu_read_unlock_special() no longer does wakeups in such situations, it is no longer necessary for rcu_preempt_deferred_qs() to set the nesting level negative. This commit therefore removes this recursion-protection code from rcu_preempt_deferred_qs(). [ paulmck: Fix typo in commit log per Steve Rostedt. ] Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Make rcu_read_unlock_special() safe for rq/pi locks  (Paul E. McKenney; 1 file, -9/+8)
The scheduler is currently required to hold rq/pi locks across the entire RCU read-side critical section or not at all. This is inconvenient and leaves traps for the unwary, including the author of this commit. But now that excessively long grace periods enable scheduling-clock interrupts for holdout nohz_full CPUs, the nohz_full rescue logic in rcu_read_unlock_special() can be dispensed with. In other words, the rcu_read_unlock_special() function can refrain from doing wakeups unless such wakeups are guaranteed safe. This commit therefore avoids unsafe wakeups, freeing the scheduler to hold rq/pi locks across rcu_read_unlock() even if the corresponding RCU read-side critical section might have been preempted. This commit also updates RCU's requirements documentation. This commit is inspired by a patch from Lai Jiangshan: https://lore.kernel.org/lkml/20191102124559.1135-2-laijs@linux.alibaba.com This commit is further intended to be a step towards his goal of permitting the inlining of RCU-preempt's rcu_read_lock() and rcu_read_unlock(). Cc: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Add KCSAN stubs to update.c  (Paul E. McKenney; 1 file, -0/+13)
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
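For reference, pass-through fallback stubs of this sort look roughly like the following (a sketch, not necessarily the exact definitions in the commit):

    /* Sketch of fallback stubs; the real definitions come from KCSAN. */
    #ifndef data_race
    #define data_race(expr) ({ expr; })              /* Plain, unannotated access. */
    #endif
    #ifndef ASSERT_EXCLUSIVE_WRITER
    #define ASSERT_EXCLUSIVE_WRITER(var) do { } while (0)
    #endif
    #ifndef ASSERT_EXCLUSIVE_ACCESS
    #define ASSERT_EXCLUSIVE_ACCESS(var) do { } while (0)
    #endif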
2020-04-27  rcu: Add rcu_gp_might_be_stalled()  (Paul E. McKenney; 1 file, -4/+36)
This commit adds rcu_gp_might_be_stalled(), which returns true if there is some reason to believe that the RCU grace period is stalled. The use case is where an RCU free-memory path needs to allocate memory in order to free it, a situation that should be avoided where possible. But where it is necessary, there is always the alternative of using synchronize_rcu() to wait for a grace period in order to avoid the allocation. And if the grace period is stalled, allocating memory to asynchronously wait for it is a bad idea of epic proportions: Far better to let others use the memory, because these others might actually be able to free that memory before the grace period ends. Thus, rcu_gp_might_be_stalled() can be used to help decide whether allocating memory on an RCU free path is a semi-reasonable course of action. Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Uladzislau Rezki <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
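A hedged sketch of the intended use, where queue_for_async_free() is a hypothetical helper that must allocate memory in order to queue the object for asynchronous freeing:

    /* Sketch only: avoid allocating while a grace period looks stalled. */
    static bool queue_for_async_free(void *obj);    /* Hypothetical helper. */

    static void free_obj(void *obj)
    {
            if (!rcu_gp_might_be_stalled() && queue_for_async_free(obj))
                    return;         /* Allocation-backed asynchronous free. */

            synchronize_rcu();      /* Grace period may be stalled (or the   */
            kfree(obj);             /* allocation failed): wait, then free.  */
    }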
2020-04-27  rcu/tree: Count number of batched kfree_rcu() locklessly  (Joel Fernandes (Google); 1 file, -6/+4)
We can relax the accuracy of the count of queued objects in favor of not hurting performance, by locklessly sampling per-CPU counters. This should be OK since, under high memory pressure, it does not matter if the count is off by a few objects; the shrinker will still do the reclaim. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> [ paulmck: Remove unused "flags" variable. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
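The general shape of such lockless sampling looks something like this (a sketch; the per-CPU variable and helper names are hypothetical):

    #include <linux/percpu.h>

    /* Sketch: sum per-CPU counters without taking any lock.  The result
     * may be slightly stale, which is fine for shrinker heuristics. */
    static DEFINE_PER_CPU(unsigned long, batched_count);

    static unsigned long sample_batched_count(void)
    {
            unsigned long sum = 0;
            int cpu;

            for_each_possible_cpu(cpu)
                    sum += READ_ONCE(per_cpu(batched_count, cpu));
            return sum;
    }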
2020-04-27  rcu/tree: Add a shrinker to prevent OOM due to kfree_rcu() batching  (Joel Fernandes (Google); 1 file, -0/+60)
To reduce grace periods and improve kfree() performance, batching was recently added, dramatically bringing down the number of grace periods while giving us the ability to use kfree_bulk() for efficient freeing. However, this has increased the likelihood of an OOM condition under heavy kfree_rcu() flood on small-memory systems. This patch introduces a shrinker that starts grace periods right away if the system is under memory pressure due to objects that have not yet started a grace period. With this patch, I no longer observe an OOM on a system with 512MB RAM and 8 CPUs, with the following rcuperf options: rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_mult=2 Otherwise it easily OOMs with the above parameters. NOTE: 1. On systems with no memory pressure, the patch has no effect, as intended. 2. In the future, we can use this same mechanism to hold off grace periods even more, by relying on shrinkers carefully. Cc: urezki@gmail.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
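A rough sketch of such a shrinker registration (not the actual patch; pending_object_count() and flush_pending_batches() are hypothetical helpers):

    #include <linux/shrinker.h>

    static unsigned long pending_object_count(void);        /* Hypothetical. */
    static unsigned long flush_pending_batches(void);       /* Hypothetical. */

    /* count_objects() reports how many queued objects have not yet started
     * a grace period; scan_objects() kicks the batching machinery so that
     * a grace period starts immediately under memory pressure. */
    static unsigned long kfree_rcu_shrink_count(struct shrinker *shrink,
                                                struct shrink_control *sc)
    {
            return pending_object_count();
    }

    static unsigned long kfree_rcu_shrink_scan(struct shrinker *shrink,
                                               struct shrink_control *sc)
    {
            return flush_pending_batches();
    }

    static struct shrinker kfree_rcu_shrinker = {
            .count_objects  = kfree_rcu_shrink_count,
            .scan_objects   = kfree_rcu_shrink_scan,
            .seeks          = DEFAULT_SEEKS,
    };

    /* At init time: register_shrinker(&kfree_rcu_shrinker); */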
2020-04-27  rcuperf: Add ability to increase object allocation size  (Joel Fernandes (Google); 1 file, -1/+4)
This allows us to increase memory pressure dynamically using a new rcuperf boot command line parameter called 'rcumult'. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Convert rcu_nohz_full_cpu() ULONG_CMP_LT() to time_before()  (Paul E. McKenney; 1 file, -1/+1)
This commit converts the ULONG_CMP_LT() in rcu_nohz_full_cpu() to time_before() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Convert rcu_initiate_boost() ULONG_CMP_GE() to time_after()  (Paul E. McKenney; 1 file, -1/+1)
This commit converts the ULONG_CMP_GE() in rcu_initiate_boost() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Convert ULONG_CMP_GE() to time_after() for jiffy comparison  (Paul E. McKenney; 1 file, -1/+1)
This commit converts the ULONG_CMP_GE() in rcu_gp_fqs_loop() to time_after() to reflect the fact that it is comparing a timestamp to the jiffies counter. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
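The three conversions above share one pattern: jiffies-based deadlines are compared with the wraparound-safe time_before()/time_after() helpers, which also make the intent explicit. A minimal sketch (the deadline variable is hypothetical):

    #include <linux/jiffies.h>

    static bool fqs_deadline_passed(unsigned long deadline)
    {
            /* Was roughly: ULONG_CMP_GE(jiffies, deadline). */
            return time_after(jiffies, deadline);
    }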
2020-04-27  rcu: Replace 1 by true  (Jules Irenge; 1 file, -1/+1)
Coccinelle reports a warning at the use_softirq declaration: WARNING: Assignment of 0/1 to bool variable. The root cause is that use_softirq, a variable of bool type, is initialised with the integer 1. Replacing 1 with true solves the issue. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Replace assigned pointer ret value by corresponding boolean value  (Jules Irenge; 1 file, -3/+3)
Coccinelle reports warnings at rcu_read_lock_held_common(): WARNING: Assignment of 0/1 to bool variable. Given that ret is a pointer to bool, the 0/1 values assigned through it are replaced by the corresponding boolean values. Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Mark rcu_state.gp_seq to detect more concurrent writes  (Paul E. McKenney; 1 file, -1/+3)
The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. This commit applies KCSAN-specific primitives, so cannot go upstream until KCSAN does. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Get rid of some doc warnings in update.c  (Mauro Carvalho Chehab; 1 file, -4/+4)
This commit escapes *ret, because otherwise the documentation system thinks that this is an incomplete emphasis block: ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:65: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:70: WARNING: Inline emphasis start-string without end-string. ./kernel/rcu/update.c:82: WARNING: Inline emphasis start-string without end-string. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Fix the (t=0 jiffies) false positive  (Zhaolong Zhang; 1 file, -6/+6)
It is possible that an over-long grace period will end while the RCU CPU stall warning message is printing. In this case, the estimate of the offending grace period's duration can be erroneous due to refetching of rcu_state.gp_start, which will now be the time of the newly started grace period. Computation of this duration clearly needs to use the start time for the old over-long grace period, not the fresh new one. This commit avoids such errors by causing both print_other_cpu_stall() and print_cpu_stall() to reuse the value previously fetched by their caller. Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Expedite first two FQS scans under callback-overload conditions  (Paul E. McKenney; 2 files, -4/+16)
Even if some CPUs have excessive numbers of callbacks, RCU's grace-period kthread will still wait normally between successive force-quiescent-state scans. The first two are the most important, as they are the ones that enlist aid from the scheduler when overloaded. This commit therefore omits the wait before the first and the second force-quiescent-state scan under callback-overload conditions. This approach was inspired by a discussion with Jeff Roberson. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Use data_race() for RCU CPU stall-warning prints  (Paul E. McKenney; 1 file, -13/+13)
Although the accesses used to determine whether or not a stall should be printed are an integral part of the concurrency algorithm governing use of the corresponding variables, the values that are simply printed are ancillary. As such, it is best to use data_race() for these accesses in order to provide the greatest latitude in the use of KCSAN for the other accesses that are an integral part of the algorithm. This commit therefore changes the relevant uses of READ_ONCE() to data_race(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
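A sketch of the distinction (kernel context assumed; the structure and field names are hypothetical):

    /* Sketch: READ_ONCE() for loads that feed the algorithm, data_race()
     * for loads whose value is merely printed. */
    struct stall_info { int gp_state; };

    static void print_stall_info(struct stall_info *sp)
    {
            if (READ_ONCE(sp->gp_state) == 0)       /* Algorithmic load. */
                    return;
            pr_info("gp_state: %d\n", data_race(sp->gp_state)); /* Print-only load. */
    }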
2020-04-27  rcu: Add WRITE_ONCE() to rcu_node ->boost_tasks  (Paul E. McKenney; 1 file, -2/+2)
The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the WRITE_ONCE() to an update in order to provide proper documentation and READ_ONCE()/WRITE_ONCE() pairing. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  srcu: Add data_race() to ->srcu_lock_count and ->srcu_unlock_count arrays  (Paul E. McKenney; 1 file, -4/+4)
The srcu_data structure's ->srcu_lock_count and ->srcu_unlock_count arrays are read and written locklessly, so this commit adds the data_race() to the diagnostic-print loads from these arrays in order to mark them as known and approved data-racy accesses. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being used only by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Add READ_ONCE and data_race() to rcu_node ->boost_tasks  (Paul E. McKenney; 1 file, -2/+3)
The rcu_node structure's ->boost_tasks field is read locklessly, so this commit adds the READ_ONCE() to one load in order to avoid destructive compiler optimizations. The other load is from a diagnostic print, so data_race() suffices. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
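A sketch of the resulting READ_ONCE()/WRITE_ONCE() pairing on a locklessly read field (the structure here is hypothetical; the real field lives in the rcu_node structure and is updated under ->lock):

    struct node_sketch { void *boost_tasks; };

    static void set_boost_tasks(struct node_sketch *np, void *t)
    {
            WRITE_ONCE(np->boost_tasks, t);         /* Update side, lock held. */
    }

    static bool boost_pending(struct node_sketch *np)
    {
            /* Lockless load feeding a decision: READ_ONCE() heads off
             * destructive compiler optimizations such as load tearing. */
            return READ_ONCE(np->boost_tasks) != NULL;
    }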
2020-04-27  rcu: Add *_ONCE() and data_race() to rcu_node ->exp_tasks plus locking  (Paul E. McKenney; 2 files, -12/+15)
There are lockless loads from the rcu_node structure's ->exp_tasks field, so this commit causes all stores to use WRITE_ONCE() and all lockless loads to use READ_ONCE() or data_race(), with the latter for debug prints. This code also did an unprotected traversal of the linked list pointed into by ->exp_tasks, so this commit also acquires the rcu_node structure's ->lock to properly protect this traversal. This list was traversed unprotected only when printing an RCU CPU stall warning for an expedited grace period, so the odds of seeing this in production are not all that high. This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Mark rcu_state.ncpus to detect concurrent writes  (Paul E. McKenney; 1 file, -0/+1)
The rcu_state structure's ncpus field is only to be modified by the CPU-hotplug CPU-online code path, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  srcu: Add KCSAN stubs  (Paul E. McKenney; 1 file, -0/+13)
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27  rcu: Add KCSAN stubs  (Paul E. McKenney; 1 file, -0/+13)
This commit adds stubs for KCSAN's data_race(), ASSERT_EXCLUSIVE_WRITER(), and ASSERT_EXCLUSIVE_ACCESS() macros to allow code using these macros to move ahead. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-14  Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent  (Ingo Molnar; 1 file, -1/+1)
Pull RCU fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-05  rcu: Don't acquire lock in NMI handler in rcu_nmi_enter_common()  (Paul E. McKenney; 1 file, -1/+1)
The rcu_nmi_enter_common() function can be invoked both in interrupt and NMI handlers. If it is invoked from process context (as opposed to userspace or idle context) on a nohz_full CPU, it might acquire the CPU's leaf rcu_node structure's ->lock. Because this lock is held only with interrupts disabled, this is safe from an interrupt handler, but doing so from an NMI handler can result in self-deadlock. This commit therefore adds "irq" to the "if" condition so as to only acquire the ->lock from irq handlers or process context, never from an NMI handler. Fixes: 5b14557b073c ("rcu: Avoid tick_dep_set_cpu() misordering") Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> # 5.5.x
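The shape of the fix is roughly the following (a sketch only, not the actual code):

    /* Only take the leaf rcu_node ->lock from irq handlers or process
     * context; from NMI context the acquisition is skipped entirely. */
    static void nmi_enter_set_tick_dep(struct rcu_data *rdp, bool irq)
    {
            if (tick_nohz_full_cpu(rdp->cpu) && irq) {
                    raw_spin_lock_rcu_node(rdp->mynode);
                    /* ... request the scheduling-clock tick ... */
                    raw_spin_unlock_rcu_node(rdp->mynode);
            }
    }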
2020-03-30  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 2 files, -6/+19)
Pull locking updates from Ingo Molnar: "The main changes in this cycle were:

    - Continued user-access cleanups in the futex code.
    - percpu-rwsem rewrite that uses its own waitqueue and atomic_t instead of an embedded rwsem. This addresses a couple of weaknesses, but the primary motivation was complications on the -rt kernel.
    - Introduce raw lock nesting detection on lockdep (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal lock differences. This too originates from -rt.
    - Reuse lockdep zapped chain_hlocks entries, to conserve RAM footprint on distro-ish kernels running into the "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep chain-entries pool.
    - Misc cleanups, smaller fixes and enhancements - see the changelog for details"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
    fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
    thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
    Documentation/locking/locktypes: Minor copy editor fixes
    Documentation/locking/locktypes: Further clarifications and wordsmithing
    m68knommu: Remove mm.h include from uaccess_no.h
    x86: get rid of user_atomic_cmpxchg_inatomic()
    generic arch_futex_atomic_op_inuser() doesn't need access_ok()
    x86: don't reload after cmpxchg in unsafe_atomic_op2() loop
    x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end()
    objtool: whitelist __sanitizer_cov_trace_switch()
    [parisc, s390, sparc64] no need for access_ok() in futex handling
    sh: no need of access_ok() in arch_futex_atomic_op_inuser()
    futex: arch_futex_atomic_op_inuser() calling conventions change
    completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all()
    lockdep: Add posixtimer context tracing bits
    lockdep: Annotate irq_work
    lockdep: Add hrtimer context tracing bits
    lockdep: Introduce wait-type checks
    completion: Use simple wait queues
    sched/swait: Prepare usage in completions
    ...
2020-03-21  Merge branches 'doc.2020.02.27a', 'fixes.2020.03.21a', 'kfree_rcu.2020.02.20a', 'locktorture.2020.02.20a', 'ovld.2020.02.20a', 'rcu-tasks.2020.02.20a', 'srcu.2020.02.20a' and 'torture.2020.02.20a' into HEAD  (Paul E. McKenney; 12 files, -181/+512)
doc.2020.02.27a: Documentation updates. fixes.2020.03.21a: Miscellaneous fixes. kfree_rcu.2020.02.20a: Updates to kfree_rcu(). locktorture.2020.02.20a: Lock torture-test updates. ovld.2020.02.20a: Updates to callback-overload handling. rcu-tasks.2020.02.20a: RCU-tasks updates. srcu.2020.02.20a: SRCU updates. torture.2020.02.20a: Torture-test updates.
2020-03-21  rcu: Make rcu_barrier() account for offline no-CBs CPUs  (Paul E. McKenney; 1 file, -12/+24)
Currently, rcu_barrier() ignores offline CPUs. However, it is possible for an offline no-CBs CPU to have callbacks queued, and rcu_barrier() must wait for those callbacks. This commit therefore makes rcu_barrier() directly invoke the rcu_barrier_func() with interrupts disabled for such CPUs. This requires passing the CPU number into this function so that it can entrain the rcu_barrier() callback onto the correct CPU's callback list, given that the code must instead execute on the current CPU. While in the area, this commit fixes a bug where the first CPU's callback might have been invoked before rcu_segcblist_entrain() returned, which would also result in an early wakeup. Fixes: 5d6742b37727 ("rcu/nocb: Use rcu_segcblist for no-CBs CPUs") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Apply optimization feedback from Boqun Feng. ] Cc: <stable@vger.kernel.org> # 5.5.x
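A rough sketch of the resulting control flow (the function and its signature are illustrative, not the actual code):

    /* For an offline no-CBs CPU, invoke the barrier function directly with
     * interrupts disabled, passing the target CPU so the rcu_barrier()
     * callback is entrained onto that CPU's list; online CPUs still get
     * an IPI via smp_call_function_single(). */
    static void rcu_barrier_one_cpu(int cpu)
    {
            if (cpu_is_offline(cpu)) {
                    local_irq_disable();
                    rcu_barrier_func((void *)(long)cpu);
                    local_irq_enable();
            } else {
                    smp_call_function_single(cpu, rcu_barrier_func,
                                             (void *)(long)cpu, 1);
            }
    }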
2020-03-21  rcu: Mark rcu_state.gp_seq to detect concurrent writes  (Paul E. McKenney; 1 file, -14/+8)
The rcu_state structure's gp_seq field is only to be modified by the RCU grace-period kthread, which is single-threaded. This commit therefore enlists KCSAN's help in enforcing this restriction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-03-21  lockdep: Annotate irq_work  (Sebastian Andrzej Siewior; 1 file, -0/+1)
Mark irq_work items with IRQ_WORK_HARD_IRQ which should be invoked in hardirq context even on PREEMPT_RT. IRQ_WORK without this flag will be invoked in softirq context on PREEMPT_RT. Set ->irq_config to 1 for the IRQ_WORK items which are invoked in softirq context so lockdep knows that these can safely acquire a spinlock_t. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.643576700@linutronix.de
2020-03-21  lockdep: Introduce wait-type checks  (Peter Zijlstra; 1 file, -6/+18)
Extend lockdep to validate lock wait-type context. The current wait-types are:

    LD_WAIT_FREE,   /* wait free, rcu etc.. */
    LD_WAIT_SPIN,   /* spin loops, raw_spinlock_t etc.. */
    LD_WAIT_CONFIG, /* CONFIG_PREEMPT_LOCK, spinlock_t etc.. */
    LD_WAIT_SLEEP,  /* sleeping locks, mutex_t etc.. */

Where lockdep validates that the current lock (the one being acquired) fits in the current wait-context (as generated by the held stack). This ensures that there is no attempt to acquire mutexes while holding spinlocks, to acquire spinlocks while holding raw_spinlocks and so on. In other words, it's a fancier might_sleep(). Obviously RCU made the entire ordeal more complex than a simple single-value test because RCU can be acquired in (pretty much) any context, and while it presents a context to nested locks it is not the same as the one it got acquired in. Therefore it's necessary to split the wait_type into two values, one representing the acquire (outer) and one representing the nested context (inner). For most 'normal' locks these two are the same. [ To make static initialization easier we have the rule that: .outer == INV means .outer == .inner; because INV == 0. ] It further means that it's required to find the minimal .inner of the held stack to compare against the outer of the new lock; because while 'normal' RCU presents a CONFIG type to nested locks, if it is taken while already holding a SPIN type it obviously doesn't relax the rules. Below is an example output generated by the trivial test code:

    raw_spin_lock(&foo);
    spin_lock(&bar);
    spin_unlock(&bar);
    raw_spin_unlock(&foo);

    [ BUG: Invalid wait context ]
    -----------------------------
    swapper/0/1 is trying to lock:
    ffffc90000013f20 (&bar){....}-{3:3}, at: kernel_init+0xdb/0x187
    other info that might help us debug this:
    1 lock held by swapper/0/1:
     #0: ffffc90000013ee0 (&foo){+.+.}-{2:2}, at: kernel_init+0xd1/0x187

The way to read it is to look at the new -{n,m} part in the lock description; -{3:3} for the attempted lock, and try and match that up to the held locks, which in this case is the one: -{2,2}. This tells us that the acquiring lock requires a more relaxed environment than presented by the lock stack. Currently only the normal locks and RCU are converted; the rest of the lockdep users default to .inner = INV, which is ignored. More conversions can be done when desired. The check for spinlock_t nesting is not enabled by default. It's a separate config option for now, as there are known problems that are currently being addressed. The config option makes it possible to identify these problems and to verify that the solutions found are indeed solving them. The config switch will be removed and the checks will be permanently enabled once the vast majority of issues has been addressed. [ bigeasy: Move LD_WAIT_FREE,… out of CONFIG_LOCKDEP to avoid compile failure with CONFIG_DEBUG_SPINLOCK + !CONFIG_LOCKDEP ] [ tglx: Add the config option ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200321113242.427089655@linutronix.de
2020-02-20  rcutorture: Manually clean up after rcu_barrier() failure  (Paul E. McKenney; 1 file, -1/+15)
Currently, if rcu_barrier() returns too soon, the test waits 100ms and then does another instance of the test. However, if rcu_barrier() returned more than 100ms too early, the test's rcu_head structures could be reused while they were still on RCU's callback lists. This can result in knock-on errors that obscure the original rcu_barrier() test failure. This commit therefore adds code that attempts to wait until all of the test's callbacks have been invoked. Of course, if RCU completely lost track of the corresponding rcu_head structures, this wait could be forever. This commit therefore also complains if this attempted recovery takes more than one second, and it also gives up when the test ends. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU  (Paul E. McKenney; 1 file, -3/+13)
Currently, rcu_torture_barrier_cbs() posts callbacks from whatever CPU it is running on, which means that all these kthreads might well be posting from the same CPU, which would drastically reduce the effectiveness of this test. This commit therefore uses IPIs to make the callbacks be posted from the corresponding CPU (given by local variable myid). If the IPI fails (which can happen if the target CPU is offline or does not exist at all), the callback is posted on whatever CPU is currently running. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcuperf: Measure memory footprint during kfree_rcu() test  (Joel Fernandes (Google); 1 file, -2/+12)
During changes to kfree_rcu() code, we often check the amount of free memory. As an alternative to checking this manually, this commit adds a measurement to the test itself. It measures the available memory four times during the test, digitally filters these measurements to produce a running average with a weight of 0.5, and compares this filtered value with the amount of available memory at the beginning of the test. Something like the following is printed at the end of the run: Total time taken by all kfree'ers: 6369738407 ns, loops: 10000, batches: 764, memory footprint: 216MB Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
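For instance, a weight-0.5 running average of the free-memory samples can be maintained like this (a sketch with hypothetical names):

    /* Each new sample replaces half of the accumulated estimate. */
    static unsigned long kfree_mem_filtered;

    static void record_free_mem_sample(unsigned long sample)
    {
            if (!kfree_mem_filtered)
                    kfree_mem_filtered = sample;    /* First sample seeds the filter. */
            else
                    kfree_mem_filtered = (kfree_mem_filtered + sample) / 2;
    }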
2020-02-20  rcutorture: Annotation lockless accesses to rcu_torture_current  (Paul E. McKenney; 1 file, -4/+6)
The rcutorture global variable rcu_torture_current is accessed locklessly, so it must use the RCU pointer load/store primitives. This commit therefore adds several that were missed. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being used only by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch  (Paul E. McKenney; 1 file, -2/+2)
The rcutorture rcu_torture_count and rcu_torture_batch per-CPU variables are read locklessly, so this commit adds the READ_ONCE() to a load in order to avoid various types of compiler vandalism^Woptimization. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Fix stray access to rcu_fwd_cb_nodelay  (Paul E. McKenney; 1 file, -1/+1)
The rcu_fwd_cb_nodelay variable suppresses excessively long read-side delays while carrying out an rcutorture forward-progress test. As such, it is accessed both by readers and updaters, and most of the accesses therefore use *_ONCE(). Except for one in rcu_read_delay(), which this commit fixes. This data race was reported by KCSAN. Not appropriate for backporting due to this being rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race  (Paul E. McKenney; 1 file, -4/+6)
The ->rtort_pipe_count field in the rcu_torture structure checks for too-short grace periods, and is therefore read by rcutorture's readers while being updated by rcutorture's writers. This commit therefore adds the needed READ_ONCE() and WRITE_ONCE() invocations. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely and due to this being rcutorture. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Suppress boottime bad-sequence warnings  (Paul E. McKenney; 1 file, -1/+2)
In normal production, an excessively long wait on a grace period (synchronize_rcu(), for example) at boottime is often just as bad as at any other time. In fact, given the desire for fast boot, any sort of long wait at boot is a bad idea. However, heavy rcutorture testing on large hyperthreaded systems can generate such long waits during boot as a matter of course. This commit therefore causes the rcupdate.rcu_cpu_stall_suppress_at_boot kernel boot parameter to suppress reporting of boottime bad-sequence warnings due to excessively long grace-period waits. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Allow boottime stall warnings to be suppressed  (Paul E. McKenney; 5 files, -6/+29)
In normal production, an RCU CPU stall warning at boottime is often just as bad as at any other time. In fact, given the desire for fast boot, any sort of long-term stall at boot is a bad idea. However, heavy rcutorture testing on large hyperthreaded systems can generate boottime RCU CPU stalls as a matter of course. This commit therefore provides a kernel boot parameter that suppresses reporting of boottime RCU CPU stall warnings and similarly of rcutorture writer stalls. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Refrain from callback flooding during boot  (Paul E. McKenney; 1 file, -2/+5)
Additional rcutorture aggression can result in, believe it or not, boot times in excess of three minutes on large hyperthreaded systems. This is long enough for rcutorture to decide to do some callback flooding, which seems a bit excessive given that userspace cannot have started until long after boot, and it is userspace that does the real-world callback flooding. Worse yet, because Tiny RCU lacks forward-progress functionality, the looping-in-the-kernel tests can also be problematic during early boot. This commit therefore causes rcutorture to hold off on callback flooding until about the time that init is spawned, and the same for looping-in-the-kernel tests for Tiny RCU. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  rcutorture: Suppress forward-progress complaints during early boot  (Paul E. McKenney; 3 files, -2/+20)
Some larger systems can take in excess of 50 seconds to complete their early boot initcalls prior to spawning init. This does not in any way help the forward-progress judgments of built-in rcutorture (when rcutorture is built as a module, the insmod or modprobe command normally cannot happen until some time after boot completes). This commit therefore suppresses such complaints until about the time that init is spawned. This also includes a fix to a stupid error located by kbuild test robot. [ paulmck: Apply kbuild test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Fix to nohz_full slow-expediting recovery logic, per bpetkov. ] [ paulmck: Restrict splat to CONFIG_PREEMPT_RT=y kernels and simplify. ] Tested-by: Borislav Petkov <bp@alien8.de>
2020-02-20  srcu: Hold srcu_struct ->lock when updating ->srcu_gp_seq  (Paul E. McKenney; 1 file, -1/+3)
A read of the srcu_struct structure's ->srcu_gp_seq field should not need READ_ONCE() when that structure's ->lock is held. Except that this lock is not always held when updating this field. This commit therefore acquires the lock around updates and removes a now-unneeded READ_ONCE(). This data race was reported by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> [ paulmck: Switch from READ_ONCE() to lock per Peter Zijlstra's question. ] Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2020-02-20  srcu: Fix process_srcu()/srcu_batches_completed() datarace  (Paul E. McKenney; 1 file, -1/+1)
The srcu_struct structure's ->srcu_idx field is accessed locklessly, so reads must use READ_ONCE(). This commit therefore adds the needed READ_ONCE() invocation where it was missed. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-02-20  srcu: Fix __call_srcu()/srcu_get_delay() datarace  (Paul E. McKenney; 1 file, -3/+3)
The srcu_struct structure's ->srcu_gp_seq_needed_exp field is accessed locklessly, so updates must use WRITE_ONCE(). This commit therefore adds the needed WRITE_ONCE() invocations. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>