2022-08-31  rcu: Set rcu_data structures' initial ->gpwrap value to true  (Paul E. McKenney, 1 file, -0/+1)
It would be good to reduce the size of the rcu_gp_oldstate structure from three unsigned long instances to two, but this requires that the boot-time optimized grace periods update the various ->gp_seq fields. Updating these fields in the rcu_state structure and in all of the rcu_node structures is at least semi-reasonable, but updating them in all of the rcu_data structures is a bridge too far. This means that if there are too many early boot-time grace periods, the ->gp_seq field in the rcu_data structure cannot be trusted. This commit therefore sets each rcu_data structure's ->gpwrap field to true, providing the necessary impetus for a suitable level of distrust. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Disable run-time single-CPU grace-period optimization  (Paul E. McKenney, 1 file, -31/+9)
The run-time single-CPU grace-period optimization applies only to kernels built with CONFIG_SMP=y && CONFIG_PREEMPTION=y that are running on a single-CPU system. But a kernel intended for a single-CPU system should instead be built with CONFIG_SMP=n, and in any case, single-CPU systems running Linux no longer appear to be the common case. Plus this optimization results in the rcu_gp_oldstate structure being half again larger than it needs to be. This commit therefore disables the run-time single-CPU grace-period optimization, so that this optimization applies only during the pre-scheduler portion of the boot sequence. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Add full-sized polling for cond_sync_exp_full()  (Paul E. McKenney, 4 files, -22/+83)
The cond_synchronize_rcu_expedited() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the cond_synchronize_rcu_exp_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Add full-sized polling for cond_sync_full()  (Paul E. McKenney, 4 files, -21/+80)
The cond_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the cond_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
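For illustration, a minimal sketch of how a caller might pair this API with get_state_synchronize_rcu_full(); the my_obj structure and my_* helper names are hypothetical, not part of this commit:

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct my_obj {
            struct rcu_gp_oldstate rgos;    /* full-sized grace-period cookie */
            /* ... payload ... */
    };

    static void my_obj_retire(struct my_obj *p)
    {
            get_state_synchronize_rcu_full(&p->rgos);   /* snapshot GP state */
    }

    static void my_obj_free(struct my_obj *p)
    {
            /* Blocks only if no full grace period elapsed since retirement. */
            cond_synchronize_rcu_full(&p->rgos);
            kfree(p);
    }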
2022-08-31  rcu: Remove blank line from poll_state_synchronize_rcu() docbook header  (Paul E. McKenney, 1 file, -3/+2)
This commit removes the blank line preceding the oldstate parameter to the docbook header for the poll_state_synchronize_rcu() function and marks uses of this parameter later in that header. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Add full-sized polling for start_poll_expedited()  (Paul E. McKenney, 4 files, -10/+65)
The start_poll_synchronize_rcu_expedited() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the start_poll_synchronize_rcu_expedited_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Add full-sized polling for start_poll()  (Paul E. McKenney, 4 files, -22/+92)
The start_poll_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the next member of the full-state RCU grace-period polling API, namely the start_poll_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
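A rough sketch of the intended non-blocking usage, again with hypothetical my_* names:

    static struct rcu_gp_oldstate my_rgos;

    static void my_begin_cleanup(void)
    {
            /* Snapshot state and start a grace period if one is needed. */
            start_poll_synchronize_rcu_full(&my_rgos);
    }

    static bool my_try_finish_cleanup(void)
    {
            if (!poll_state_synchronize_rcu_full(&my_rgos))
                    return false;   /* Grace period still in flight; retry later. */
            /* ... safe to free the retired objects here ... */
            return true;
    }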
2022-08-31  rcutorture: Verify long-running reader prevents full polling from completing  (Paul E. McKenney, 1 file, -0/+10)
This commit adds full-state polling checks to accompany the old-style polling checks in the rcu_torture_one_read() function. If a polling cycle within an RCU reader completes, a WARN_ONCE() is triggered. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcutorture: Remove redundant RTWS_DEF_FREE check  (Paul E. McKenney, 1 file, -2/+1)
This check does nothing because the rcu_torture_writer_state value at this point in the code is guaranteed to be RTWS_REPLACE. This commit therefore removes this check. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcutorture: Verify RCU reader prevents full polling from completing  (Paul E. McKenney, 1 file, -3/+16)
This commit adds a test to rcu_torture_writer() that verifies that a ->get_gp_state_full() and ->poll_gp_state_full() polled grace-period sequence does not claim that a grace period elapsed within the confines of the corresponding read-side critical section. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcutorture: Allow per-RCU-flavor polled double-GP check  (Paul E. McKenney, 1 file, -1/+8)
Only vanilla RCU needs a double grace period for its compressed polled grace-period old-state cookie. This commit therefore adds an rcu_torture_ops per-flavor function ->poll_need_2gp to allow this check to be adapted to the RCU flavor under test. A NULL pointer for this function says that doubled grace periods are never needed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcutorture: Abstract synchronous and polled API testing  (Paul E. McKenney, 1 file, -12/+36)
This commit abstracts a do_rtws_sync() function that does synchronous grace-period testing, but that also tests the polled API 25% of the time each for the normal and full-state variants of the polled API. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Add full-sized polling for get_state()  (Paul E. McKenney, 4 files, -4/+46)
The get_state_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the next member of the full-state RCU grace-period polling API, namely the get_state_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
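Conceptually, the cond_synchronize_rcu_full() function added earlier in this series decomposes into this new function plus a poll and a blocking fallback. A sketch of that equivalence, not the kernel's literal implementation:

    static void my_cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp)
    {
            /* rgosp was filled in earlier by get_state_synchronize_rcu_full(). */
            if (!poll_state_synchronize_rcu_full(rgosp))
                    synchronize_rcu();      /* no full GP yet, so block for one */
    }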
2022-08-31  rcu: Add full-sized polling for get_completed*() and poll_state*()  (Paul E. McKenney, 6 files, -4/+111)
The get_completed_synchronize_rcu() and poll_state_synchronize_rcu() APIs compress the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the first members of the full-state RCU grace-period polling API, namely the get_completed_synchronize_rcu_full() and poll_state_synchronize_rcu_full() functions. These use up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but are guaranteed not to miss grace periods, at least in situations where the single-CPU grace-period optimization does not apply. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
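One way to see why get_completed_synchronize_rcu_full() is useful: it initializes a cookie so that the very first poll reports an already-elapsed grace period. A sketch, with my_cookie standing in for a caller's storage:

    static struct rcu_gp_oldstate my_cookie;

    static void my_init(void)
    {
            /* Mark the cookie as if a full grace period already elapsed. */
            get_completed_synchronize_rcu_full(&my_cookie);
    }

    static bool my_gp_elapsed(void)
    {
            return poll_state_synchronize_rcu_full(&my_cookie);  /* true at first call */
    }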
2022-08-31  rcu/nocb: Add CPU number to CPU-{,de}offload failure messages  (Paul E. McKenney, 1 file, -2/+2)
Offline CPUs cannot be offloaded or deoffloaded. Any attempt to offload or deoffload an offline CPU causes a message to be printed on the console, which is good, but this message does not contain the CPU number, which is bad. Such a CPU number can be helpful when debugging, as it gives a clear indication that the CPU in question is in fact offline. This commit therefore adds the CPU number to the CPU-{,de}offload failure messages. Cc: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu/nocb: Choose the right rcuog/rcuop kthreads to output  (Zqiang, 1 file, -3/+3)
The show_rcu_nocb_gp_state() function is supposed to dump out the rcuog kthread and the show_rcu_nocb_state() function is supposed to dump out the rcuo[ps] kthread. Currently, both do a mixture, which is not optimal for debugging, even though it does not affect functionality. This commit therefore adjusts these two functions to focus on their respective kthreads. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval  (Uladzislau Rezki (Sony), 1 file, -4/+19)
Currently the monitor work is scheduled with a fixed interval of HZ/20, which is roughly 50 milliseconds. The drawback of this approach is low utilization of the 512 page slots in scenarios with infrequent kvfree_rcu() calls. For example, on an Android system:

    <snip>
    kworker/3:3-507   [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
    kworker/6:1-76    [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
    kworker/6:1-76    [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
    kworker/3:3-507   [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
    kworker/1:1-73    [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
    kworker/1:1-73    [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
    kworker/0:4-1411  [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
    kworker/0:4-1411  [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
    kworker/6:1-76    [006] .... 472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
    <snip>

In many cases, out of 512 slots, fewer than 10 were actually used. In order to improve batching and make utilization more efficient, this commit sets the drain interval to a fixed 5 seconds. Floods are detected when a page fills quickly, and in that case, the reclaim work is re-scheduled for the next scheduling-clock tick (jiffy). After this change:

    <snip>
    kworker/7:1-371   [007] .... 5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121
    kworker/7:1-371   [007] .... 5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47
    kworker/7:1-371   [007] .... 5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510
    kworker/7:1-371   [007] .... 5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169
    kworker/7:1-371   [007] .... 5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510
    kworker/5:6-9428  [005] .... 5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123
    kworker/4:7-9434  [004] .... 5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322
    kworker/4:7-9434  [004] .... 5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185
    kworker/7:1-371   [007] .... 5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510
    <snip>

Here, in all but one of the cases, more than one hundred slots were used, representing an order-of-magnitude improvement. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu/kfree: Fix kfree_rcu_shrink_count() return value  (Joel Fernandes (Google), 1 file, -1/+1)
As per the comments in include/linux/shrinker.h, the .count_objects callback should return the number of freeable items, but if there are no objects to free, SHRINK_EMPTY should be returned. The only time 0 should be returned is when we are unable to determine the number of objects, or when the cache should be skipped for another reason. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
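A sketch of the resulting callback shape, with my_cached_objects standing in for this driver's actual bookkeeping:

    #include <linux/shrinker.h>

    static unsigned long my_cached_objects;     /* hypothetical counter */

    static unsigned long my_shrink_count(struct shrinker *sh,
                                         struct shrink_control *sc)
    {
            unsigned long count = READ_ONCE(my_cached_objects);

            /* Nothing to free: say so explicitly rather than returning 0. */
            return count ? count : SHRINK_EMPTY;
    }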
2022-08-31  rcu: Back off upon fill_page_cache_func() allocation failure  (Michal Hocko, 1 file, -8/+9)
The fill_page_cache_func() function allocates a couple of pages to store kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY) allocation which can fail under memory pressure. The function will, however, keep retrying even when the previous attempt has failed. This retrying is in theory correct, but in practice the allocation is invoked from workqueue context, which means that if the memory reclaim gets stuck, these retries can hog the worker for quite some time. Although the workqueues subsystem automatically adjusts concurrency, such adjustment is not guaranteed to happen until the worker context sleeps. And the fill_page_cache_func() function's retry loop is not guaranteed to sleep (see the should_reclaim_retry() function). And we have seen this function cause workqueue lockups:

    kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
    [...]
    kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
    kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
    kernel:     in-flight: 1917:fill_page_cache_func
    kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor

Originally, we thought that the root cause of this lockup was several retries with direct reclaim, but this is not yet confirmed. Furthermore, we have seen similar lockups without any heavy memory pressure. This suggests that there are other factors contributing to these lockups. However, it is not really clear that endless retries are desirable. So let's make the fill_page_cache_func() function back off after allocation failure. Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
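A sketch of the back-off shape this commit gives the refill loop, under hypothetical my_* names:

    static void my_fill_page_cache(void)
    {
            int i;

            for (i = 0; i < MY_MAX_CACHED_PAGES; i++) {
                    unsigned long page = __get_free_page(GFP_KERNEL |
                                                         __GFP_NORETRY |
                                                         __GFP_NOWARN);

                    if (!page)
                            break;  /* Back off; a later invocation can refill. */
                    my_cache_push((void *)page);
            }
    }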
2022-08-31  rcu: Exclude outgoing CPU when it is the last to leave  (Paul E. McKenney, 1 file, -1/+4)
The rcu_boost_kthread_setaffinity() function removes the outgoing CPU from the set_cpus_allowed() mask for the corresponding leaf rcu_node structure's rcub priority-boosting kthread. Except that if the outgoing CPU will leave that structure without any online CPUs, the mask is set to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine unless the outgoing CPU happens to be a housekeeping CPU. This commit therefore removes the outgoing CPU from the housekeeping mask. This would of course be problematic if the outgoing CPU was the last online housekeeping CPU, but in that case you are in a world of hurt anyway. If someone comes up with a valid use case for a system needing all the housekeeping CPUs to be offline, further adjustments can be made. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
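A sketch of the adjusted fallback path, assuming a scratch cpumask cm and the outgoing CPU's number (or -1 if no CPU is going offline); names and structure here are illustrative rather than the literal patch:

    #include <linux/cpumask.h>
    #include <linux/sched/isolation.h>

    static void my_fallback_affinity(struct cpumask *cm, int outgoingcpu)
    {
            if (cpumask_empty(cm)) {
                    cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
                    if (outgoingcpu >= 0)
                            cpumask_clear_cpu(outgoingcpu, cm);  /* the fix */
            }
    }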
2022-08-31  rcu: Avoid triggering strict-GP irq-work when RCU is idle  (Zqiang, 1 file, -1/+2)
Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handler(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section. However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period. This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task. This change was tested as follows on a four-CPU system:

    echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
    echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
    insmod rcutorture.ko
    sleep 20
    rmmod rcutorture.ko
    echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
    echo > /sys/kernel/debug/tracing/set_ftrace_filter

This procedure produces results in this per-CPU set of files:

    /sys/kernel/debug/tracing/trace_stat/function*

Sample output from one of these files is as follows:

    Function                        Hit      Time          Avg       s^2
    --------                        ---      ----          ---       ---
    rcu_preempt_deferred_qs_handle  838746   182650.3 us   0.217 us  0.004 us

The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant. The raw data is available in the Link: URL. Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/ Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  sched/debug: Show the registers of 'current' in dump_cpu_task()  (Zhen Lei, 1 file, -0/+11)
The dump_cpu_task() function does not print registers on architectures that do not support NMIs. However, registers can be useful for debugging. Fortunately, in the case where dump_cpu_task() is invoked from an interrupt handler and is dumping the current CPU's stack, the get_irq_regs() function can be used to get the registers. Therefore, this commit makes dump_cpu_task() check to see if it is being asked to dump the current CPU's stack from within an interrupt handler, and, if so, it uses the get_irq_regs() function to obtain the registers. On systems that do support NMIs, this commit has the further advantage of avoiding a self-NMI in this case. This is an example of an RCU self-detected stall on arm64, which does not support NMIs:

    [ 27.501721] rcu: INFO: rcu_preempt self-detected stall on CPU
    [ 27.502238] rcu: 0-....: (1250 ticks this GP) idle=4f7/1/0x4000000000000000 softirq=2594/2594 fqs=619
    [ 27.502632] (t=1251 jiffies g=2989 q=29 ncpus=4)
    [ 27.503845] CPU: 0 PID: 306 Comm: test0 Not tainted 5.19.0-rc7-00009-g1c1a6c29ff99-dirty #46
    [ 27.504732] Hardware name: linux,dummy-virt (DT)
    [ 27.504947] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [ 27.504998] pc : arch_counter_read+0x18/0x24
    [ 27.505301] lr : arch_counter_read+0x18/0x24
    [ 27.505328] sp : ffff80000b29bdf0
    [ 27.505345] x29: ffff80000b29bdf0 x28: 0000000000000000 x27: 0000000000000000
    [ 27.505475] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
    [ 27.505553] x23: 0000000000001f40 x22: ffff800009849c48 x21: 000000065f871ae0
    [ 27.505627] x20: 00000000000025ec x19: ffff80000a6eb300 x18: ffffffffffffffff
    [ 27.505654] x17: 0000000000000001 x16: 0000000000000000 x15: ffff80000a6d0296
    [ 27.505681] x14: ffffffffffffffff x13: ffff80000a29bc18 x12: 0000000000000426
    [ 27.505709] x11: 0000000000000162 x10: ffff80000a2f3c18 x9 : ffff80000a29bc18
    [ 27.505736] x8 : 00000000ffffefff x7 : ffff80000a2f3c18 x6 : 00000000759bd013
    [ 27.505761] x5 : 01ffffffffffffff x4 : 0002dc6c00000000 x3 : 0000000000000017
    [ 27.505787] x2 : 00000000000025ec x1 : ffff80000b29bdf0 x0 : 0000000075a30653
    [ 27.505937] Call trace:
    [ 27.506002] arch_counter_read+0x18/0x24
    [ 27.506171] ktime_get+0x48/0xa0
    [ 27.506207] test_task+0x70/0xf0
    [ 27.506227] kthread+0x10c/0x110
    [ 27.506243] ret_from_fork+0x10/0x20

This is a marked improvement over the old output:

    [ 27.944550] rcu: INFO: rcu_preempt self-detected stall on CPU
    [ 27.944980] rcu: 0-....: (1249 ticks this GP) idle=cbb/1/0x4000000000000000 softirq=2610/2610 fqs=614
    [ 27.945407] (t=1251 jiffies g=2681 q=28 ncpus=4)
    [ 27.945731] Task dump for CPU 0:
    [ 27.945844] task:test0 state:R running task stack: 0 pid: 306 ppid: 2 flags:0x0000000a
    [ 27.946073] Call trace:
    [ 27.946151] dump_backtrace.part.0+0xc8/0xd4
    [ 27.946378] show_stack+0x18/0x70
    [ 27.946405] sched_show_task+0x150/0x180
    [ 27.946427] dump_cpu_task+0x44/0x54
    [ 27.947193] rcu_dump_cpu_stacks+0xec/0x130
    [ 27.947212] rcu_sched_clock_irq+0xb18/0xef0
    [ 27.947231] update_process_times+0x68/0xac
    [ 27.947248] tick_sched_handle+0x34/0x60
    [ 27.947266] tick_sched_timer+0x4c/0xa4
    [ 27.947281] __hrtimer_run_queues+0x178/0x360
    [ 27.947295] hrtimer_interrupt+0xe8/0x244
    [ 27.947309] arch_timer_handler_virt+0x38/0x4c
    [ 27.947326] handle_percpu_devid_irq+0x88/0x230
    [ 27.947342] generic_handle_domain_irq+0x2c/0x44
    [ 27.947357] gic_handle_irq+0x44/0xc4
    [ 27.947376] call_on_irq_stack+0x2c/0x54
    [ 27.947415] do_interrupt_handler+0x80/0x94
    [ 27.947431] el1_interrupt+0x34/0x70
    [ 27.947447] el1h_64_irq_handler+0x18/0x24
    [ 27.947462] el1h_64_irq+0x64/0x68
    <--- the above backtrace is worthless
    [ 27.947474] arch_counter_read+0x18/0x24
    [ 27.947487] ktime_get+0x48/0xa0
    [ 27.947501] test_task+0x70/0xf0
    [ 27.947520] kthread+0x10c/0x110
    [ 27.947538] ret_from_fork+0x10/0x20

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com>
2022-08-31  sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()  (Zhen Lei, 3 files, -5/+6)
The trigger_all_cpu_backtrace() function attempts to send an NMI to the target CPU, which usually provides much better stack traces than the dump_cpu_task() function's approach of dumping that stack from some other CPU. So much so that most calls to dump_cpu_task() only happen after a call to trigger_all_cpu_backtrace() has failed. And the exception to this rule really should attempt to use trigger_single_cpu_backtrace() first. Therefore, move the trigger_single_cpu_backtrace() invocation into dump_cpu_task(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com>
2022-08-31  rcu: Update rcu_access_pointer() header for rcu_dereference_protected()  (Paul E. McKenney, 1 file, -5/+13)
The rcu_access_pointer() docbook header correctly notes that it may be used during post-grace-period teardown. However, it is usually better to use rcu_dereference_protected() for this purpose. This commit therefore calls out this preferred usage. Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
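For example, in a teardown path where the update-side lock is held, a sketch of the preferred form (the gp structure and its lock are hypothetical):

    /* Grace period has ended and the update-side lock is held, so no
     * other CPU can change the pointer out from under us. */
    p = rcu_dereference_protected(gp->ptr, lockdep_is_held(&gp->lock));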
2022-08-31  rcu: Document reason for rcu_all_qs() call to preempt_disable()  (Paul E. McKenney, 1 file, -1/+1)
Given that rcu_all_qs() exists only in non-preemptible kernels, why on earth should it invoke preempt_disable()? This commit adds the reason, which is to work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels. Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Make tiny RCU support leak callbacks for debug-object errors  (Zqiang, 1 file, -1/+16)
Currently, only Tree RCU leaks callbacks when it detects a duplicate call_rcu(). This commit causes Tiny RCU to also leak callbacks in this situation. Because this is Tiny RCU, kernel size is important:

1. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=n (Production kernel)

    Original:
       text      data      bss       dec       hex      filename
       26290663  20159823  15212544  61663030  3ace736  vmlinux

    With this commit:
       text      data      bss       dec       hex      filename
       26290663  20159823  15212544  61663030  3ace736  vmlinux

2. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=y (Debugging kernel)

    Original:
       text      data      bss       dec       hex      filename
       26291319  20160143  15212544  61664006  3aceb06  vmlinux

    With this commit:
       text      data      bss       dec       hex      filename
       26291319  20160431  15212544  61664294  3acec26  vmlinux

These results show that the kernel size is unchanged for production kernels, as desired. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels  (Zqiang, 1 file, -1/+3)
Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain preempt_count() state. Because such kernels map __rcu_read_lock() and __rcu_read_unlock() to preempt_disable() and preempt_enable(), respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU version of the rcu_exp_handler() IPI handler function to use preempt_count() to detect quiescent states. This preempt_count() usage might seem to risk failures due to use of implicit RCU readers in portions of the kernel under #ifndef CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit RCU readers. The moral of this story is that you must use explicit read-side markings such as rcu_read_lock() or preempt_disable() even if the code knows that this kernel does not support preemption. This commit therefore adds a preempt_count()-based check for a quiescent state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler() function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an immediate quiescent state when the interrupted code had both preemption and softirqs enabled. This change results in about a 2% reduction in expedited grace-period latency in kernels built with both CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT_COUNT=y. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
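The check amounts to the following sketch (a paraphrase, not a quote of the patch): in the IPI handler, an interrupted context that has neither preemption nor softirqs disabled cannot be inside an RCU reader in such kernels, so its quiescent state can be reported immediately.

    if (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
        !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK))) {
            /* Immediate quiescent state; report it to the expedited GP. */
            rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
    }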
2022-08-31  rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels  (Zqiang, 1 file, -4/+7)
In non-preemptible kernels, tasks never do context switches within RCU read-side critical sections. Therefore, in such kernels, each leaf rcu_node structure's ->blkd_tasks list will always be empty. The comment on the non-preemptible version of rcu_preempt_deferred_qs() confuses this point, so this commit fixes it. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  rcu: Fix rcu_read_unlock_strict() strict QS reporting  (Zqiang, 1 file, -0/+1)
Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y report the quiescent state directly from the outermost rcu_read_unlock(). However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm might still be set, in which case rcu_report_qs_rdp() will exit early, thus failing to report the quiescent state. This commit therefore causes rcu_read_unlock_strict() to clear the CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp(). Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  doc/rcu: Update LWN article URLs and add 2019 article  (Shao-Tse Hung, 1 file, -7/+9)
This patch adds LWN articles about RCU APIs which were released in 2019. Also, HTTP URLs are replaced by HTTPS. Signed-off-by: Shao-Tse Hung <ccs100203@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  doc: SLAB_TYPESAFE_BY_RCU uses cannot rely on spinlocks  (Paul E. McKenney, 1 file, -7/+12)
Because the SLAB_TYPESAFE_BY_RCU code does not zero pages that are to be broken up into slabs, the memory returned by kmem_cache_alloc() must be fully initialized, including any spinlocks included in the newly allocated structure. This means that readers attempting to look up a SLAB_TYPESAFE_BY_RCU object must use a reference-counting approach. A spinlock may be acquired only after a reference is obtained, which prevents that object from being passed to kmem_cache_free(), but only while that reference continues to be held. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
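A sketch of the reference-counting approach described above, with hypothetical my_* helpers (my_find() is assumed to do the RCU-protected lookup and my_put() to drop the reference):

    #include <linux/refcount.h>
    #include <linux/rcupdate.h>

    struct my_obj {
            refcount_t ref;
            spinlock_t lock;
            int key;
    };

    static struct my_obj *my_lookup(int key)
    {
            struct my_obj *p;

            rcu_read_lock();
            p = my_find(key);           /* may return a freed-and-reused object */
            if (p && !refcount_inc_not_zero(&p->ref))
                    p = NULL;           /* object was being freed; don't touch it */
            rcu_read_unlock();
            if (p && p->key != key) {   /* recheck identity now that a ref is held */
                    my_put(p);
                    p = NULL;
            }
            return p;   /* spin_lock(&p->lock) is safe only while the ref is held */
    }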
2022-08-31  doc: Update rcu_access_pointer() advice in rcu_dereference.rst  (Paul E. McKenney, 1 file, -4/+10)
This commit updates the rcu_access_pointer() advice, noting that its return value should not be assigned to a local variable, and also noting that there is little point in using rcu_access_pointer() within an RCU read-side critical section. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  doc: Fix list: rcu_access_pointer() is not lockdep-checked  (Paul E. McKenney, 1 file, -3/+9)
The rcu_access_pointer() macro does not consult lockdep by design because it is intended to be used outside of RCU read-side critical sections. This commit therefore makes a separate list for it in whatisRCU.rst. Similarly, RCU_LOCKDEP_WARN(), rcu_sleep_check(), and RCU_NONIDLE() do not do anything with pointer access. This commit therefore creates a separate utility-API list for them. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  doc: Use rcu_barrier() to rate-limit RCU callbacks  (Paul E. McKenney, 1 file, -1/+2)
The checklist.rst document advises periodic synchronize_rcu() invocations to prevent callback flooding. However, rcu_barrier() is often a better choice. This commit therefore adds words to this effect. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
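A sketch of the rate-limiting idea, with hypothetical my_* names and an rcu_head embedded in the retired structure:

    struct my_obj {
            struct rcu_head rh;
            /* ... */
    };

    static unsigned long my_nretired;

    static void my_free_cb(struct rcu_head *rh)
    {
            kfree(container_of(rh, struct my_obj, rh));
    }

    static void my_retire(struct my_obj *p)
    {
            call_rcu(&p->rh, my_free_cb);
            /* Every 1000th retirement, wait for the callbacks themselves
             * rather than just a grace period, bounding the backlog. */
            if (!(++my_nretired % 1000))
                    rcu_barrier();
    }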
2022-08-31  doc: Call out queue_rcu_work() for blocking RCU callbacks  (Paul E. McKenney, 1 file, -0/+3)
The current checklist.rst file correctly notes that RCU callbacks execute in BH context, and cannot block. This commit adds words advising people needing callbacks to block to use workqueues, for example, by replacing call_rcu() with queue_rcu_work(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
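A sketch of the suggested replacement, where the reclaim work may sleep (the my_* names are hypothetical):

    #include <linux/workqueue.h>

    static DEFINE_MUTEX(my_mutex);
    static struct rcu_work my_rwork;

    static void my_blocking_reclaim(struct work_struct *w)
    {
            /* Process context, after a grace period: blocking is fine here. */
            mutex_lock(&my_mutex);
            /* ... free the retired resources ... */
            mutex_unlock(&my_mutex);
    }

    static void my_retire(void)
    {
            INIT_RCU_WORK(&my_rwork, my_blocking_reclaim);
            queue_rcu_work(system_wq, &my_rwork);
    }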
2022-08-31  doc: Emphasize the need for explicit RCU read-side markers  (Paul E. McKenney, 1 file, -2/+7)
This commit updates checklist.rst to emphasize the need for explicit markers for RCU read-side critical sections. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-08-31  phy: lan966x: add support for QUSGMII  (Maxime Chevallier, 1 file, -0/+3)
Make it so that the serdes driver also takes QUSGMII into consideration. It's configured exactly like QSGMII as far as the serdes driver is concerned. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-31  gpio: pca953x: Add mutex_lock for regcache sync in PM  (Haibo Chen, 1 file, -1/+7)
The regcache sync will set cache_bypass = true; at that point, any concurrent regmap write operation will bypass the regmap cache, and then the regcache sync will write back the value from the cache to the register, which is not what we expect. Though regmap already uses its internal lock to avoid such issues, this driver forcibly disables the regmap internal lock in its regmap config: disable_locking = true. To avoid this issue, use the driver's own lock to provide protection in system PM. Fixes: b76574300504 ("gpio: pca953x: Restore registers after suspend/resume cycle") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
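The shape of the fix, assuming the driver's existing chip->i2c_lock mutex; this is a sketch, not the literal patch:

    static int pca953x_regcache_sync_locked(struct pca953x_chip *chip)
    {
            int ret;

            mutex_lock(&chip->i2c_lock);    /* keep writers out during the sync */
            ret = regcache_sync(chip->regmap);
            mutex_unlock(&chip->i2c_lock);
            return ret;
    }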
2022-08-31  gpio: ucb1400: Use proper header  (Linus Walleij, 2 files, -1/+2)
The UCB1400 implements a GPIO driver so it needs to include the <linux/gpio/driver.h> header, not the legacy <linux/gpio.h> header. Compile tested on pxa_defconfig. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-08-31  netfilter: nf_tables: clean up hook list when offload flags check fails  (Pablo Neira Ayuso, 1 file, -1/+3)
Splice back the hook list so nft_chain_release_hook() has a chance to release the hooks.

    BUG: memory leak
    unreferenced object 0xffff88810180b100 (size 96):
      comm "syz-executor133", pid 3619, jiffies 4294945714 (age 12.690s)
      hex dump (first 32 bytes):
        28 64 23 02 81 88 ff ff 28 64 23 02 81 88 ff ff  (d#.....(d#.....
        90 a8 aa 83 ff ff ff ff 00 00 b5 0f 81 88 ff ff  ................
      backtrace:
        [<ffffffff83a8c59b>] kmalloc include/linux/slab.h:600 [inline]
        [<ffffffff83a8c59b>] nft_netdev_hook_alloc+0x3b/0xc0 net/netfilter/nf_tables_api.c:1901
        [<ffffffff83a9239a>] nft_chain_parse_netdev net/netfilter/nf_tables_api.c:1998 [inline]
        [<ffffffff83a9239a>] nft_chain_parse_hook+0x33a/0x530 net/netfilter/nf_tables_api.c:2073
        [<ffffffff83a9b14b>] nf_tables_addchain.constprop.0+0x10b/0x950 net/netfilter/nf_tables_api.c:2218
        [<ffffffff83a9c41b>] nf_tables_newchain+0xa8b/0xc60 net/netfilter/nf_tables_api.c:2593
        [<ffffffff83a3d6a6>] nfnetlink_rcv_batch+0xa46/0xd20 net/netfilter/nfnetlink.c:517
        [<ffffffff83a3db79>] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:638 [inline]
        [<ffffffff83a3db79>] nfnetlink_rcv+0x1f9/0x220 net/netfilter/nfnetlink.c:656
        [<ffffffff83a13b17>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
        [<ffffffff83a13b17>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
        [<ffffffff83a13fd6>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
        [<ffffffff83865ab6>] sock_sendmsg_nosec net/socket.c:714 [inline]
        [<ffffffff83865ab6>] sock_sendmsg+0x56/0x80 net/socket.c:734
        [<ffffffff8386601c>] ____sys_sendmsg+0x36c/0x390 net/socket.c:2482
        [<ffffffff8386a918>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
        [<ffffffff8386aaa8>] __sys_sendmsg+0x88/0x100 net/socket.c:2565
        [<ffffffff845e5955>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        [<ffffffff845e5955>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
        [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook") Reported-by: syzbot+5fcdbfab6d6744c57418@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-31  drm/i915: move backlight to display.backlight  (Jani Nikula, 4 files, -18/+20)
Move display backlight related members under drm_i915_private display sub-struct. Prefer adding anonymous sub-structs even for single members that aren't our own structs. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/026241565dad12e0024c443419fa5e0caac41b2d.1661779055.git.jani.nikula@intel.com
2022-08-31  drm/i915: move and group cdclk under display.cdclk  (Jani Nikula, 15 files, -142/+141)
Move display cdclk related members under drm_i915_private display sub-struct. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/7df23655be5dc70fb1a2b43ce41e1682e40395d8.1661779055.git.jani.nikula@intel.com
2022-08-31  drm/i915: move opregion to display.opregion  (Jani Nikula, 6 files, -28/+30)
Move display opregion related members under drm_i915_private display sub-struct. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a0ceb5148835fa3e0828786ae491fcd11e2e77ff.1661779055.git.jani.nikula@intel.com
2022-08-31  drm/i915: move and group max_bw and bw_obj under display.bw  (Jani Nikula, 4 files, -42/+44)
Move display bandwidth related members under drm_i915_private display sub-struct. v2: Rebase Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c8b9e2fdc5c226ffb71759a20e561c832a774ba5.1661779055.git.jani.nikula@intel.com
2022-08-31  drm/i915: move and group hdcp under display.hdcp  (Jani Nikula, 4 files, -74/+77)
Move display hdcp related members under drm_i915_private display sub-struct. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Arun R Murthy <arun.r.murthy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1083f5a58cce1507bd19bf7f98bf85e9351b741e.1661779055.git.jani.nikula@intel.com
2022-08-31  regulator: Fix qcom,spmi-regulator schema  (Jean-Philippe Brucker, 1 file, -2/+1)
The DT validator reports an error in the schema:

    Documentation/devicetree/bindings/regulator/qcom,spmi-regulator.yaml: ignoring,
    error in schema: patternProperties: ^(5vs[1-2]|(l|s)[1-9][0-9]?|lvs[1-3])$: properties

Move the unevaluatedProperties statement out of the properties section to fix it. Fixes: 0b3bbd7646b0 ("regulator: qcom,spmi-regulator: Convert to dtschema") Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Link: https://lore.kernel.org/r/20220831080503.17600-1-jean-philippe@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2022-08-31  spi: stm32-qspi: Refactor dual flash mode enable check in ->setup()  (Andy Shevchenko, 1 file, -2/+1)
gpiod_count() either returns a positive number of CSs or a negative error code. In stm32_qspi_setup() we check that the configuration has enough CSs for dual flash mode, and the SPI mode does not change over the course of the function. Taking all of the above into consideration, refactor the dual flash mode enable check by dropping the unneeded CS check and reusing the local mode variable. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Link: https://lore.kernel.org/r/20220830182821.47919-2-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-08-31  spi: stm32-qspi: Replace of_gpio_named_count() by gpiod_count()  (Andy Shevchenko, 1 file, -2/+2)
As a preparation to unexport of_gpio_named_count(), convert the driver to use gpiod_count() instead. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Link: https://lore.kernel.org/r/20220830182821.47919-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2022-08-31  ASoC: max98396: Make data monitor features configurable  (Daniel Mack, 2 files, -0/+116)
Allow the data monitor features to be enabled explicitly, and enable control over their details. Signed-off-by: Daniel Mack <daniel@zonque.org> Link: https://lore.kernel.org/r/20220826085927.2336224-2-daniel@zonque.org Signed-off-by: Mark Brown <broonie@kernel.org>
2022-08-31  ASoC: dt-bindings: max98396: Document data monitor properties  (Daniel Mack, 1 file, -0/+34)
This device features a data monitor that puts the device in software reset upon a configurable set of events. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220826085927.2336224-1-daniel@zonque.org Signed-off-by: Mark Brown <broonie@kernel.org>