path: root/kernel
2025-08-15  perf: Merge consecutive conditionals in perf_mmap()  (Peter Zijlstra, 1 file, -22/+19)

    if (cond) { A; } else { B; }
    if (cond) { C; } else { D; }

into:

    if (cond) { A; C; } else { B; D; }

Notably the conditions are not identical in form, but are equivalent.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20250812104018.900078502@infradead.org

2025-08-15  perf: Move perf_mmap_calc_limits() into both rb and aux branches  (Peter Zijlstra, 1 file, -8/+18)

    if (cond) { A; } else { B; }
    C;

into:

    if (cond) { A; C; } else { B; C; }

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20250812104018.781244099@infradead.org

2025-08-15  perf: Split out VM accounting  (Thomas Gleixner, 1 file, -4/+9)

Similarly to the mlock limit calculation, the VM accounting is required for both the ring buffer and the AUX buffer allocations. To prepare for splitting them out into separate functions, move the accounting into a helper function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20250812104018.660347811@infradead.org

2025-08-15  perf: Split out mlock limit handling  (Thomas Gleixner, 1 file, -37/+38)

To prepare for splitting the buffer allocation out into separate functions for the ring buffer and the AUX buffer, split out mlock limit handling into a helper function, which can be called from both.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20250812104018.541975109@infradead.org

2025-08-15  perf: Remove redundant condition for AUX buffer size  (Thomas Gleixner, 1 file, -1/+1)

It is already checked whether the VMA size is the same as nr_pages * PAGE_SIZE, so later checking both:

    aux_size == vma_size && aux_size == nr_pages * PAGE_SIZE

is redundant. Remove the vma_size check, as nr_pages is what is actually used in the allocation function. That prepares for splitting out the buffer allocation into separate functions, so that only nr_pages needs to be handed in.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20250812104018.424519320@infradead.org

2025-08-15  perf: Avoid undefined behavior from stopping/starting inactive events  (Yunseong Kim, 1 file, -0/+6)

Calling pmu->start()/stop() on perf events in PERF_EVENT_STATE_OFF can leave event->hw.idx at -1. When PMU drivers later attempt to use this negative index as a shift exponent in bitwise operations, it leads to UBSAN shift-out-of-bounds reports.

The issue is a logical flaw in how event groups handle throttling when some members are intentionally disabled. Based on the analysis and the reproducer provided by Mark Rutland, the issue reproduces on both arm64 and x86-64. The scenario unfolds as follows:

 1. A group leader event is configured with a very aggressive sampling period (e.g., sample_period = 1). This causes frequent interrupts and triggers the throttling mechanism.

 2. A child event in the same group is created in a disabled state (.disabled = 1). This event remains in PERF_EVENT_STATE_OFF. Since it hasn't been scheduled onto the PMU, its event->hw.idx remains initialized at -1.

 3. When throttling occurs, perf_event_throttle_group() and later perf_event_unthrottle_group() iterate through all siblings, including the disabled child event.

 4. perf_event_throttle()/unthrottle() are called on this inactive child event, which then call event->pmu->start()/stop().

 5. The PMU driver receives the event with hw.idx == -1 and attempts to use it as a shift exponent, e.g. in macros like PMCNTENSET(idx), leading to the UBSAN report.

In short, the throttling mechanism attempts to start/stop events that are not actively scheduled on the hardware. Move the state check into perf_event_throttle()/perf_event_unthrottle() so that inactive events are skipped entirely. This ensures only active events with a valid hw.idx are processed, preventing undefined behavior and silencing UBSAN warnings. The corrected check ensures the event is actually active before proceeding with PMU operations.

The problem can be reproduced with the syzkaller reproducer.

Fixes: 9734e25fbf5a ("perf: Fix the throttle logic for a group")
Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20250812181046.292382-2-ysk@kzalloc.com

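[Editor's sketch] The guard described above looks roughly like the fragment below. This is illustrative only; the exact mainline function bodies differ, and everything except the names quoted in the message is an assumption.

    static void perf_event_unthrottle(struct perf_event *event, bool start)
    {
    	/* Inactive events never got a hardware slot; hw.idx may still be -1. */
    	if (event->state != PERF_EVENT_STATE_ACTIVE)
    		return;

    	event->hw.interrupts = 0;
    	if (start)
    		event->pmu->start(event, 0);
    }
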
2025-08-15  bpf: Check the helper function is valid in get_helper_proto  (Jiri Olsa, 2 files, -2/+5)

kernel test robot reported a verifier bug [1] where the helper func pointer could be NULL due to a disabled config option.

As Alexei suggested, we can check for that in get_helper_proto directly. Mark the tail_call helper func with BPF_PTR_POISON, because it is unused by design.

[1] https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: syzbot+a9ed3d9132939852d0df@syzkaller.appspotmail.com
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250814200655.945632-1-jolsa@kernel.org
Closes: https://lore.kernel.org/oe-lkp/202507160818.68358831-lkp@intel.com

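[Editor's sketch] Conceptually the added validation is a fragment along these lines; only get_helper_proto, BPF_PTR_POISON and the NULL-pointer scenario come from the message, the rest is assumed for illustration.

    	const struct bpf_func_proto *fn = env->ops->get_func_proto(func_id, env->prog);

    	/* Helper may be compiled out (NULL) or poisoned by design (tail_call). */
    	if (!fn || !fn->func || fn->func == BPF_PTR_POISON)
    		return -EINVAL;
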
2025-08-15  bpf, cpumap: Disable page_pool direct xdp_return need larger scope  (Jesper Dangaard Brouer, 1 file, -2/+2)

When running an XDP bpf_prog on the remote CPU in cpumap code, we must disable the direct return optimization that xdp_return can perform for mem_type page_pool. This optimization assumes code is still executing under RX-NAPI of the original receiving CPU, which isn't true on this remote CPU.

The cpumap code already disabled this via the helpers xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(), but the scope didn't include xdp_do_flush(). When doing XDP_REDIRECT towards e.g. devmap, this causes the function bq_xmit_all() to run with the direct return optimization enabled. This can lead to hard-to-find bugs. The issue only happens when bq_xmit_all() cannot ndo_xdp_xmit all frames and then frees them via xdp_return_frame_rx_napi().

Fix by expanding the scope to include xdp_do_flush(). This was found by Dragos Tatulea.

Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap")
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Reported-by: Chris Arges <carges@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Chris Arges <carges@cloudflare.com>
Link: https://patch.msgid.link/175519587755.3008742.1088294435150406835.stgit@firesoul

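[Editor's sketch] Schematically, the fix widens the no-direct section so the flush happens inside it. Simplified; only the helper names quoted above are from the source, cpu_map_bpf_prog_run() is a stand-in for the real per-CPU run loop.

    	xdp_set_return_frame_no_direct();
    	nframes = cpu_map_bpf_prog_run(rcpu, frames, n, &stats, &list);
    	if (stats.redirect)
    		xdp_do_flush();		/* now runs inside the no-direct scope */
    	xdp_clear_return_frame_no_direct();
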
2025-08-14  rcutorture: Delay forward-progress testing until boot completes  (Paul E. McKenney, 1 file, -0/+2)

Forward-progress testing can hog CPUs, which is not a great thing to do before boot has completed. This commit therefore makes forward-progress testing hold off until boot has completed.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2025-08-14  torture: Delay CPU-hotplug operations until boot completes  (Paul E. McKenney, 1 file, -0/+2)

CPU-hotplug operations invoke stop-machine, which can hog CPUs, which is not a great thing to do before boot has completed. This commit therefore makes the CPU-hotplug operations hold off until boot has completed.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2025-08-14  rcutorture: Delay rcutorture readers and writers until boot completes  (Paul E. McKenney, 1 file, -4/+14)

The rcutorture writers and (especially) readers are the biggest CPU hogs of the bunch, so this commit makes them wait until boot has completed. This makes the current setting of the boot_ended local variable dead code, so, while in the area, this commit removes that as well.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2025-08-14  rcutorture: Suppress "Writer stall state" reports during boot  (Paul E. McKenney, 1 file, -1/+2)

When rcutorture is running on only the one boot-time CPU while that CPU is busy invoking initcall() functions, the added load is quite likely to unduly delay the RCU grace-period kthread, rcutorture readers, and much else besides. This can result in rcu_torture_stats_print() reporting rcutorture writer stalls, which are not really a bug in that environment. After all, one CPU can only do so much.

This commit therefore suppresses rcutorture writer stalls while the kernel is booting, that is, while rcu_inkernel_boot_has_ended() continues returning false.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2025-08-14  torture: Announce kernel boot status at torture-test startup  (Paul E. McKenney, 1 file, -2/+3)

Sometimes a given system takes surprisingly long to boot, for example, in one recent case, 70 seconds instead of three seconds. It would be good to fix these slow-boot issues, but it would also be good for the torture tests to announce that the system was still booting at the start of the test, especially for tests that have a greater probability of false positives when run in the single-CPU boot-time environment. Yes, those tests should defend themselves, but we should also make this situation easier to diagnose.

This commit therefore causes torture_print_module_parms() to print "still booting" at the end of its printk() that dumps out the values of its module parameters.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2025-08-14  rcu: Remove local_irq_save/restore() in rcu_preempt_deferred_qs_handler()  (Zqiang, 1 file, -4/+1)

The per-CPU rcu_data structure's ->defer_qs_iw field is initialized by IRQ_WORK_INIT_HARD(), which means that the subsequent invocation of rcu_preempt_deferred_qs_handler() will always be executed with interrupts disabled. This commit therefore removes the local_irq_save/restore() operations from rcu_preempt_deferred_qs_handler() and adds a call to lockdep_assert_irqs_disabled() in order to enable lockdep to diagnose mistaken invocations of this function from interrupts-enabled code.

Signed-off-by: Zqiang <qiang.zhang@linux.dev>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

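[Editor's sketch] What the handler change amounts to, with the handler body elided; only the functions named above are from the source.

    static void rcu_preempt_deferred_qs_handler(struct irq_work *iwp)
    {
    	/*
    	 * Queued via IRQ_WORK_INIT_HARD(), so this always runs with
    	 * interrupts disabled; assert that instead of save/restore.
    	 */
    	lockdep_assert_irqs_disabled();

    	/* ... report the deferred quiescent state as before ... */
    }
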
2025-08-14  Merge tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds, 1 file, -0/+1)

Pull networking fixes from Paolo Abeni:
 "Including fixes from Netfilter and IPsec.

  Current release - regressions:
   - netfilter: nft_set_pipapo:
       - don't return bogus extension pointer
       - fix null deref for empty set

  Current release - new code bugs:
   - core: prevent deadlocks when enabling NAPIs with mixed kthread config
   - eth: netdevsim: Fix wild pointer access in nsim_queue_free().

  Previous releases - regressions:
   - page_pool: allow enabling recycling late, fix false positive warning
   - sched: ets: use old 'nbands' while purging unused classes
   - xfrm:
       - restore GSO for SW crypto
       - bring back device check in validate_xmit_xfrm
   - tls: handle data disappearing from under the TLS ULP
   - ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
   - eth:
       - bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
       - hv_netvsc: fix panic during namespace deletion with VF

  Previous releases - always broken:
   - netfilter: fix refcount leak on table dump
   - vsock: do not allow binding to VMADDR_PORT_ANY
   - sctp: linearize cloned gso packets in sctp_rcv
   - eth:
       - hibmcge: fix the division by zero issue
       - microchip: fix KSZ8863 reset problem"

* tag 'net-6.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (54 commits)
  net: usb: asix_devices: add phy_mask for ax88772 mdio bus
  net: kcm: Fix race condition in kcm_unattach()
  selftests: net/forwarding: test purge of active DWRR classes
  net/sched: ets: use old 'nbands' while purging unused classes
  bnxt: fill data page pool with frags if PAGE_SIZE > BNXT_RX_PAGE_SIZE
  netdevsim: Fix wild pointer access in nsim_queue_free().
  net: mctp: Fix bad kfree_skb in bind lookup test
  netfilter: nf_tables: reject duplicate device on updates
  ipvs: Fix estimator kthreads preferred affinity
  netfilter: nft_set_pipapo: fix null deref for empty set
  selftests: tls: test TCP stealing data from under the TLS socket
  tls: handle data disappearing from under the TLS ULP
  ptp: prevent possible ABBA deadlock in ptp_clock_freerun()
  ixgbe: prevent from unwanted interface name changes
  devlink: let driver opt out of automatic phys_port_name generation
  net: prevent deadlocks when enabling NAPIs with mixed kthread config
  net: update NAPI threaded config even for disabled NAPIs
  selftests: drv-net: don't assume device has only 2 queues
  docs: Fix name for net.ipv4.udp_child_hash_entries
  riscv: dts: thead: Add APB clocks for TH1520 GMACs
  ...

2025-08-13  rcu: Document that rcu_barrier() hurries lazy callbacks  (Paul E. McKenney, 1 file, -0/+5)

This commit adds to the rcu_barrier() kerneldoc header, stating that this function hurries lazy callbacks and that it does not normally result in additional RCU grace periods.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

2025-08-13  cpuset: remove redundant CS_ONLINE flag  (Chen Ridong, 2 files, -5/+2)

The CS_ONLINE flag was introduced prior to the CSS_ONLINE flag in the cpuset subsystem. Currently, the flag setting sequence is as follows:

 1. cpuset_css_online() sets CS_ONLINE
 2. css->flags gets CSS_ONLINE set
    ...
 3. cgroup->kill_css sets CSS_DYING
 4. cpuset_css_offline() clears CS_ONLINE
 5. css->flags clears CSS_ONLINE

The is_cpuset_online() check currently occurs between steps 1 and 3. However, it would be equally safe to perform this check between steps 2 and 3, as CSS_ONLINE provides the same synchronization guarantee as CS_ONLINE.

Since CS_ONLINE is redundant with CSS_ONLINE and provides no additional synchronization benefits, we can safely remove it to simplify the code.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

2025-08-13  audit: add a missing tab  (Dan Carpenter, 1 file, -1/+1)

Someone got a bit carried away deleting tabs. Add it back.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>

2025-08-13  dma/pool: Ensure DMA_DIRECT_REMAP allocations are decrypted  (Shanker Donthineni, 1 file, -2/+2)

When CONFIG_DMA_DIRECT_REMAP is enabled, atomic pool pages are remapped via dma_common_contiguous_remap() using the supplied pgprot. Currently, the mapping uses pgprot_dmacoherent(PAGE_KERNEL), which leaves the memory encrypted on systems with memory encryption enabled (e.g., ARM CCA Realms). This can cause the DMA layer to fail or crash when accessing the memory, as the underlying physical pages are not configured as expected.

Fix this by requesting a decrypted mapping in the vmap() call:

    pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL))

This ensures that atomic pool memory is consistently mapped unencrypted.

Cc: stable@vger.kernel.org
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250811181759.998805-1-sdonthineni@nvidia.com

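[Editor's sketch] At the atomic-pool remap site the described change boils down to something like the call below; the surrounding pool setup is elided and the variable names are assumptions.

    	void *addr = dma_common_contiguous_remap(page, pool_size,
    			pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)),
    			__builtin_return_address(0));
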
2025-08-13  locking: Fix __clear_task_blocked_on() warning from __ww_mutex_wound() path  (John Stultz, 1 file, -1/+5)

The __clear_task_blocked_on() helper added a number of sanity checks ensuring we hold the mutex wait lock and that the task whose blocked_on pointer we are clearing (if set) matches the mutex. However, there is an edge case in the __ww_mutex_wound() logic where we need to clear the blocked_on pointer for the task that owns the mutex, not the task that is waiting on the mutex. For this case the sanity checks aren't valid, so handle this by allowing a NULL lock to skip the additional checks.

K Prateek Nayak and Maarten Lankhorst also pointed out that in this case, where we don't hold the owner's mutex wait_lock, we need to be a bit more careful using READ_ONCE/WRITE_ONCE in both the __clear_task_blocked_on() and __set_task_blocked_on() implementations to avoid accidentally tripping WARN_ONs if two instances race. So do that here as well.

This issue was easier to miss, I realized, as the test-ww_mutex driver only exercises the wait-die class of ww_mutexes. I've sent a patch [1] to address this so the logic will be easier to test.

[1]: https://lore.kernel.org/lkml/20250801023358.562525-2-jstultz@google.com/

Fixes: a4f0b6fef4b0 ("locking/mutex: Add p->blocked_on wrappers for correctness checks")
Closes: https://lore.kernel.org/lkml/68894443.a00a0220.26d0e1.0015.GAE@google.com/
Reported-by: syzbot+602c4720aed62576cd79@syzkaller.appspotmail.com
Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20250805001026.2247040-1-jstultz@google.com

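[Editor's sketch] A minimal illustration of the "NULL lock skips the extra checks" idea combined with READ_ONCE/WRITE_ONCE; this is not the mainline helper verbatim.

    static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
    {
    	if (m) {
    		/* Normal path: m->wait_lock is held, so the checks are valid. */
    		lockdep_assert_held(&m->wait_lock);
    		WARN_ON_ONCE(READ_ONCE(p->blocked_on) && READ_ONCE(p->blocked_on) != m);
    	}
    	/* The wound path passes m == NULL: no wait_lock held, skip the checks. */
    	WRITE_ONCE(p->blocked_on, NULL);
    }
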
2025-08-13  ipvs: Fix estimator kthreads preferred affinity  (Frederic Weisbecker, 1 file, -0/+1)

The estimator kthreads' affinity is defined by sysctl-overwritten preferences and applied through a plain call to the scheduler's affinity API. However, since the introduction of managed kthread preferred affinity, such a practice shortcuts the kthreads core code, which eventually overwrites the target to the default unbound affinity.

Fix this by using the appropriate kthread API.

Fixes: d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Florian Westphal <fw@strlen.de>

2025-08-12  bpf: Replace kvfree with kfree for kzalloc memory  (Qianfeng Rong, 1 file, -1/+1)

The 'backedge' pointer is allocated with kzalloc(), which returns physically contiguous memory. Using kvfree() to deallocate such memory is functionally safe but semantically incorrect. Replace kvfree() with kfree() to avoid the unnecessary is_vmalloc_addr() check in kvfree().

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250811123949.552885-1-rongqianfeng@vivo.com

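[Editor's sketch] The general pattern, using a hypothetical structure name for illustration:

    	struct backedge *be = kzalloc(sizeof(*be), GFP_KERNEL);

    	if (!be)
    		return -ENOMEM;
    	/* ... use be ... */
    	kfree(be);	/* kzalloc() memory: kfree() suffices, kvfree() is overkill */
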
2025-08-12  bpf: Remove redundant __GFP_NOWARN  (Qianfeng Rong, 2 files, -2/+2)

Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems.

No functional changes.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250804122731.460158-1-rongqianfeng@vivo.com

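[Editor's sketch] The cleanup pattern, in generic form:

    	/* Before: __GFP_NOWARN is already implied by GFP_NOWAIT */
    	p = kmalloc(sz, GFP_NOWAIT | __GFP_NOWARN);

    	/* After: identical behavior */
    	p = kmalloc(sz, GFP_NOWAIT);
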
2025-08-12  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (Martin KaFai Lau, 35 files, -522/+748)

Cross merge bpf/master after 6.17-rc1. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

2025-08-12  cgroup: Replace deprecated strcpy() with strscpy()  (Thorsten Blum, 1 file, -1/+2)

strcpy() is deprecated; use strscpy() instead.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>

2025-08-11  sched/ext: Fix invalid task state transitions on class switch  (Andrea Righi, 1 file, -0/+4)

When enabling a sched_ext scheduler, we may trigger invalid task state transitions, resulting in warnings like the following (which can be easily reproduced by running the hotplug selftest in a loop):

    sched_ext: Invalid task state transition 0 -> 3 for fish[770]
    WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0
    ...
    RIP: 0010:scx_set_task_state+0x7c/0xc0
    ...
    Call Trace:
     <TASK>
     scx_enable_task+0x11f/0x2e0
     switching_to_scx+0x24/0x110
     scx_enable.isra.0+0xd14/0x13d0
     bpf_struct_ops_link_create+0x136/0x1a0
     __sys_bpf+0x1edd/0x2c30
     __x64_sys_bpf+0x21/0x30
     do_syscall_64+0xbb/0x370
     entry_SYSCALL_64_after_hwframe+0x77/0x7f

This happens because we skip initialization for tasks that are already dead (with their usage counter set to zero), but we don't exclude them during the scheduling class transition phase.

Fix this by also skipping dead tasks during class switching, preventing invalid task state transitions.

Fixes: a8532fac7b5d2 ("sched_ext: TASK_DEAD tasks must be switched into SCX on ops_enable")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

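[Editor's sketch] In outline, the fix adds a dead-task check to the class-switch loop, along these lines. Illustrative fragment only: the list, its member name, and switch_task_to_scx() are hypothetical stand-ins for the real iteration helpers.

    	list_for_each_entry(p, &tasks, tasks_node) {
    		/* Usage count already zero: task is dead and was never initialized. */
    		if (!refcount_read(&p->usage))
    			continue;
    		switch_task_to_scx(p);	/* hypothetical helper */
    	}
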
2025-08-11  futex: Use user_write_access_begin/_end() in futex_put_value()  (Waiman Long, 1 file, -3/+3)

Commit cec199c5e39b ("futex: Implement FUTEX2_NUMA") introduced the futex_put_value() helper to write a value to the given user address. However, it uses user_read_access_begin() before the write. For architectures that differentiate between read and write accesses, like PowerPC, futex_put_value() fails with -EFAULT.

Fix that by using the user_write_access_begin()/user_write_access_end() pair instead.

Fixes: cec199c5e39b ("futex: Implement FUTEX2_NUMA")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250811141147.322261-1-longman@redhat.com

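[Editor's sketch] The fixed helper presumably ends up shaped like this. unsafe_put_user() and the access-begin/end pairing are standard kernel primitives; the exact signature is inferred from the message, not quoted from it.

    static int futex_put_value(u32 val, u32 __user *to)
    {
    	if (!user_write_access_begin(to, sizeof(*to)))
    		return -EFAULT;
    	unsafe_put_user(val, to, Efault);
    	user_write_access_end();
    	return 0;
    Efault:
    	user_write_access_end();
    	return -EFAULT;
    }
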
2025-08-11  audit: fix typo in auditfilter.c comment  (Kieran Moy, 1 file, -1/+1)

Correct the misspelling of "searching" (was "serarching") in the function documentation for audit_update_lsm_rules. Found via code inspection, no functional impact.

Signed-off-by: Kieran Moy <kfatyuip@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

2025-08-11  audit: Replace deprecated strcpy() with strscpy()  (Thorsten Blum, 1 file, -2/+4)

strcpy() is deprecated; use strscpy() instead.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Paul Moore <paul@paul-moore.com>

2025-08-11  audit: fix indentation in audit_log_exit()  (Casey Schaufler, 1 file, -3/+4)

Fix two indentation errors in audit_log_exit().

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subject tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>

2025-08-11  of: reserved_mem: Restructure call site for dma_contiguous_early_fixup()  (Oreoluwa Babatunde, 1 file, -2/+0)

Restructure the call site for dma_contiguous_early_fixup() to where the reserved_mem nodes are being parsed from the DT, so that dma_mmu_remap[] is populated before dma_contiguous_remap() is called.

Fixes: 8a6e02d0c00e ("of: reserved_mem: Restructure how the reserved memory regions are processed")
Signed-off-by: Oreoluwa Babatunde <oreoluwa.babatunde@oss.qualcomm.com>
Tested-by: William Zhang <william.zhang@broadcom.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250806172421.2748302-1-oreoluwa.babatunde@oss.qualcomm.com

2025-08-11  swiotlb: Remove redundant __GFP_NOWARN  (Qianfeng Rong, 1 file, -1/+1)

Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems.

No functional changes.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250805023222.332920-1-rongqianfeng@vivo.com

2025-08-11  dma-direct: clean up the logic in __dma_direct_alloc_pages()  (Petr Tesarik, 1 file, -18/+13)

Convert a goto-based loop to a while() loop. To allow the simplification, return early when allocation from CMA is successful. As a bonus, this early return avoids a repeated dma_coherent_ok() check.

No functional change.

Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250710083829.1853466-1-ptesarik@suse.com

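[Editor's sketch] The shape of the cleanup in generic terms; try_cma(), try_pages() and can_retry() are hypothetical stand-ins for the real allocation steps, not dma-direct functions.

    	/* Early return for CMA avoids a second dma_coherent_ok() check ... */
    	page = try_cma(dev, size, gfp);
    	if (page)
    		return page;

    	/* ... and the goto-based retry becomes a plain while () loop. */
    	while (!(page = try_pages(dev, size, gfp)) && can_retry(&gfp))
    		;
    	return page;
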
2025-08-11  rcu: Fix racy re-initialization of irq_work causing hangs  (Frederic Weisbecker, 3 files, -2/+9)

RCU re-initializes the deferred QS irq work every time before attempting to queue it. However, there are situations where the irq work is attempted to be queued even though it is already queued. In that case re-initializing messes up the irq work queue that is about to be handled.

The chances for that to happen are higher when the architecture doesn't support self-IPIs and all irq work is then lazy, such as with the following sequence:

 1. rcu_read_unlock() is called when IRQs are disabled and there is a grace period involving blocked tasks on the node. The irq work is then initialized and queued.

 2. The related tasks are unblocked and the CPU quiescent state is reported. rdp->defer_qs_iw_pending is reset to DEFER_QS_IDLE, allowing the irq work to be requeued in the future (note the previous one hasn't fired yet).

 3. A new grace period starts and the node has blocked tasks.

 4. rcu_read_unlock() is called when IRQs are disabled again. The irq work is re-initialized (but it's queued! and its node is cleared) and requeued. Which means it's requeued to itself.

 5. The irq work finally fires with the tick. But since it was requeued to itself, it loops and hangs.

Fix this by initializing the irq work only once, before the CPU boots.

Fixes: b41642c87716 ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508071303.c1134cce-lkp@intel.com
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>

2025-08-10  Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 2 files, -4/+1)

Pull smp fixes from Borislav Petkov:

 - Remove an obsolete comment and fix spelling

* tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Remove obsolete comment from takedown_cpu()
  smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment

2025-08-10  Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -1/+3)

Pull irq fixes from Borislav Petkov:

 - Fix a wrong ioremap size in mvebu-gicp
 - Remove yet another compile-test case for a driver which needs an additional dependency
 - Fix a lock inversion scenario in the IRQ unit test suite
 - Remove an impossible flag situation in gic-v5
 - Do not iounmap resources in gic-v5 which are managed by devm
 - Make sure stale, left-over interrupts in mvebu-gicp are cleared on driver init
 - Fix a reference counting mishap in msi-lib
 - Fix a dereference-before-null-ptr-check case in the riscv-imsic irqchip driver

* tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mvebu-gicp: Use resource_size() for ioremap()
  irqchip: Build IMX_MU_MSI only on ARM
  genirq/test: Resolve irq lock inversion warnings
  irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs
  irqchip/gic-v5: iwb: Fix iounmap probe failure path
  irqchip/mvebu-gicp: Clear pending interrupts on init
  irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select()
  irqchip/riscv-imsic: Don't dereference before NULL pointer check

2025-08-10  Merge tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -1/+1)

Pull locking fix from Borislav Petkov:

 - Prevent a futex hash leak due to different mm lifetimes

* tag 'locking_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Move futex cleanup to __mmdrop()

2025-08-09  cgroup: avoid null de-ref in css_rstat_exit()  (JP Kobryn, 1 file, -0/+3)

css_rstat_exit() may be called asynchronously in scenarios where preceding calls to css_rstat_init() have not completed. One such example is this sequence below:

    css_create(...)
    {
    	...
    	init_and_link_css(css, ...);

    	err = percpu_ref_init(...);
    	if (err)
    		goto err_free_css;

    	err = cgroup_idr_alloc(...);
    	if (err)
    		goto err_free_css;

    	err = css_rstat_init(css, ...);
    	if (err)
    		goto err_free_css;

    	...

    err_free_css:
    	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
    	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
    	return ERR_PTR(err);
    }

If any of the three goto jumps are taken, async cleanup will begin and css_rstat_exit() will be invoked on an uninitialized css->rstat_cpu. Avoid accessing the uninitialized field by returning early in css_rstat_exit() if this is the case.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Fixes: 5da3bfa029d68 ("cgroup: use separate rstat trees for each subsystem")
Cc: stable@vger.kernel.org # v6.16
Reported-by: syzbot+8d052e8b99e40bc625ed@syzkaller.appspotmail.com
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>

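[Editor's sketch] The early return described above is roughly the following; only css->rstat_cpu is taken from the message, the rest of the teardown is elided.

    void css_rstat_exit(struct cgroup_subsys_state *css)
    {
    	/* css_rstat_init() never ran for this css: nothing to tear down. */
    	if (!css->rstat_cpu)
    		return;

    	/* ... normal teardown ... */
    }
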
2025-08-09  cgroup/cpuset: Remove the unnecessary css_get/put() in cpuset_partition_write()  (Waiman Long, 1 file, -2/+0)

The css_get/put() calls in cpuset_partition_write() are unnecessary, as an active reference of the kernfs node will be taken, which will prevent its removal and guarantee the existence of the css. Only the online check is needed.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

2025-08-09  cgroup/cpuset: Fix a partition error with CPU hotplug  (Waiman Long, 1 file, -3/+4)

It was found during testing that an invalid leaf partition with an empty effective exclusive CPU list can become a valid empty partition with no CPU after an offline/online operation of an unrelated CPU. An empty partition root is allowed in the special case that it has no task in its cgroup and has distributed out all its CPUs to its child partitions. That is certainly not the case here.

The problem is in the cpumask_subset() test in the hotplug case (update with no new mask) of update_parent_effective_cpumask(), as it also returns true if the effective exclusive CPU list is empty. Fix that by adding a cpumask_empty() test to root out this exception case. Also add the cpumask_empty() test in cpuset_hotplug_update_tasks() to avoid calling update_parent_effective_cpumask() for this special case.

Fixes: 0c7f293efc87 ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

2025-08-09  cgroup/cpuset: Use static_branch_enable_cpuslocked() on cpusets_insane_config_key  (Waiman Long, 1 file, -1/+1)

The following lockdep splat was observed:

    [  812.359086] ============================================
    [  812.359089] WARNING: possible recursive locking detected
    [  812.359097] --------------------------------------------
    [  812.359100] runtest.sh/30042 is trying to acquire lock:
    [  812.359105] ffffffffa7f27420 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0xe/0x20
    [  812.359131]
    [  812.359131] but task is already holding lock:
    [  812.359134] ffffffffa7f27420 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_write_resmask+0x98/0xa70
      :
    [  812.359267] Call Trace:
    [  812.359272]  <TASK>
    [  812.359367]  cpus_read_lock+0x3c/0xe0
    [  812.359382]  static_key_enable+0xe/0x20
    [  812.359389]  check_insane_mems_config.part.0+0x11/0x30
    [  812.359398]  cpuset_write_resmask+0x9f2/0xa70
    [  812.359411]  cgroup_file_write+0x1c7/0x660
    [  812.359467]  kernfs_fop_write_iter+0x358/0x530
    [  812.359479]  vfs_write+0xabe/0x1250
    [  812.359529]  ksys_write+0xf9/0x1d0
    [  812.359558]  do_syscall_64+0x5f/0xe0

Since commit d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order"), the ordering of cpu hotplug lock and cpuset_mutex had been reversed. That patch correctly used the cpuslocked version of the static branch API to enable cpusets_pre_enable_key and cpusets_enabled_key, but it didn't do the same for cpusets_insane_config_key.

The cpusets_insane_config_key can be enabled in check_insane_mems_config(), which is called from update_nodemask() or cpuset_hotplug_update_tasks() with both cpu hotplug lock and cpuset_mutex held. Deadlock can happen with a pending hotplug event that tries to acquire the cpu hotplug write lock, which will block further cpus_read_lock() attempts from check_insane_mems_config().

Fix that by switching to use static_branch_enable_cpuslocked().

Fixes: d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

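[Editor's sketch] The fix is essentially a one-line switch to the cpuslocked variant, along these lines; the surrounding condition and message are assumptions, only the function and key names come from the splat and commit text.

    static void check_insane_mems_config(nodemask_t *nodes)
    {
    	if (!cpusets_insane_config() && movable_only_nodes(nodes)) {
    		/* cpu_hotplug_lock is already read-held by the caller. */
    		static_branch_enable_cpuslocked(&cpusets_insane_config_key);
    		pr_info("Unsupported (movable nodes only) cpuset configuration detected!\n");
    	}
    }
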
2025-08-09  Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (Linus Torvalds, 1 file, -0/+3)

Pull bpf fixes from Alexei Starovoitov:

 - Fix memory leak of bpf_scc_info objects (Eduard Zingerman)
 - Fix a regression in the 'perf' tool caused by moving UID filtering to BPF (Ilya Leoshkevich)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  perf bpf-filter: Enable events manually
  libbpf: Add the ability to suppress perf event enablement
  bpf: Fix memory leak of bpf_scc_info objects

2025-08-07  bpf: use realloc in bpf_patch_insn_data  (Eduard Zingerman, 1 file, -14/+15)

Avoid excessive vzalloc/vfree calls when patching instructions in do_misc_fixups(). bpf_patch_insn_data() uses vzalloc to allocate new memory for env->insn_aux_data for each patch, as follows:

    struct bpf_prog *bpf_patch_insn_data(env, ...)
    {
    	...
    	new_data = vzalloc(... O(program size) ...);
    	...
    	adjust_insn_aux_data(env, new_data, ...);
    	...
    }

    void adjust_insn_aux_data(env, new_data, ...)
    {
    	...
    	memcpy(new_data, env->insn_aux_data);
    	vfree(env->insn_aux_data);
    	env->insn_aux_data = new_data;
    	...
    }

The vzalloc/vfree pair is hot in the perf report collected for e.g. the pyperf180 test case. It can be replaced with a call to vrealloc in order to reduce the number of actual memory allocations.

This is a stop-gap solution, as bpf_patch_insn_data is still hot in the profile. More comprehensive solutions have been discussed before, e.g. in [1].

[1] https://lore.kernel.org/bpf/CAEf4BzY_E8MSL4mD0UPuuiDcbJhh9e2xQo2=5w+ppRWWiYSGvQ@mail.gmail.com/

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Tested-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20250807010205.3210608-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

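[Editor's sketch] With vrealloc(), the reallocation presumably collapses to something like the fragment below; the exact sizing expression and flags are assumptions.

    	new_data = vrealloc(env->insn_aux_data,
    			    array_size(sizeof(*new_data), new_prog_len),
    			    GFP_KERNEL | __GFP_ZERO);
    	if (!new_data)
    		return NULL;
    	env->insn_aux_data = new_data;
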
2025-08-07  bpf: removed unused 'env' parameter from is_reg64 and insn_has_def32  (Eduard Zingerman, 1 file, -7/+7)

Parameter 'env' is not used by the is_reg64() and insn_has_def32() functions. Remove the parameter to make it clear that neither function depends on 'env' state, e.g. env->insn_aux_data.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250807010205.3210608-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2025-08-06  cpu: Remove obsolete comment from takedown_cpu()  (Waiman Long, 1 file, -3/+0)

takedown_cpu() has a comment about "all preempt/rcu users must observe !cpu_active()", which is kind of meaningless in this function. This comment was originally introduced by commit 6acce3ef8452 ("sched: Remove get_online_cpus() usage") when _cpu_down() was setting cpu_active_mask, and synchronize_rcu()/synchronize_sched() were added after that.

Later, commit 40190a78f85f ("sched/hotplug: Convert cpu_[in]active notifiers to state machine") added a new CPUHP_AP_ACTIVE hotplug state to set/clear cpu_active_mask. The following commit b2454caa8977 ("sched/hotplug: Move sync_rcu to be with set_cpu_active(false)") moved the synchronize_*() calls to sched_cpu_deactivate(), associated with the new hotplug state, but left the comment behind.

Remove this comment as it is no longer relevant in takedown_cpu().

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250729191232.664931-1-longman@redhat.com

2025-08-06  bpf: Allow struct_ops to get map id by kdata  (Amery Hung, 1 file, -0/+12)

Add bpf_struct_ops_id() to enable struct_ops implementors to use the struct_ops map id as the unique id of a struct_ops in their subsystem. A subsystem that wishes to create a mapping between id and struct_ops instance pointer can update the mapping accordingly during bpf_struct_ops::reg(), unreg(), and update().

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250806162540.681679-2-ameryhung@gmail.com

2025-08-06  genirq/test: Resolve irq lock inversion warnings  (Brian Norris, 1 file, -1/+3)

irq_shutdown_and_deactivate() is normally called with the descriptor lock held and interrupts disabled. Nested a few levels down, it grabs the global irq_resend_lock. Lockdep rightfully complains when interrupts are not disabled:

    CPU0                    CPU1
    ----                    ----
    lock(irq_resend_lock);
                            local_irq_disable();
                            lock(&irq_desc_lock_class);
                            lock(irq_resend_lock);
    <Interrupt>
    lock(&irq_desc_lock_class);

    ...
    _raw_spin_lock+0x2b/0x40
    clear_irq_resend+0x14/0x70
    irq_shutdown_and_deactivate+0x29/0x80
    irq_shutdown_depth_test+0x1ce/0x600
    kunit_try_run_case+0x90/0x120

Grab the descriptor lock and disable interrupts to resolve the problem.

Fixes: 66067c3c8a1e ("genirq: Add kunit tests for depth counts")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/aJJONEIoIiTSDMqc@google.com
Closes: https://lore.kernel.org/lkml/31a761e4-8f81-40cf-aaf5-d220ba11911c@roeck-us.net/

2025-08-06  Merge tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild  (Linus Torvalds, 3 files, -79/+69)

Pull Kbuild updates from Masahiro Yamada:
 "This is the last pull request from me. I'm grateful to have been able to continue as a maintainer for eight years. From the next cycle, Nathan and Nicolas will maintain Kbuild.

  - Fix a shortcut key issue in menuconfig
  - Fix missing rebuild of kheaders
  - Sort the symbol dump generated by gendwarfsyms
  - Support zboot extraction in scripts/extract-vmlinux
  - Migrate gconfig to GTK 3
  - Add TAR variable to allow overriding the default tar command
  - Hand over Kbuild maintainership"

* tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (92 commits)
  MAINTAINERS: hand over Kbuild maintenance
  kheaders: make it possible to override TAR
  kbuild: userprogs: use correct linker when mixing clang and GNU ld
  kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
  kconfig: lxdialog: replace strcpy with snprintf in print_autowrap
  kconfig: gconf: refactor text_insert_help()
  kconfig: gconf: remove unneeded variable in text_insert_msg
  kconfig: gconf: use hyphens in signals
  kconfig: gconf: replace GtkImageMenuItem with GtkMenuItem
  kconfig: gconf: Fix Back button behavior
  kconfig: gconf: fix single view to display dependent symbols correctly
  scripts: add zboot support to extract-vmlinux
  gendwarfksyms: order -T symtypes output by name
  gendwarfksyms: use preferred form of sizeof for allocation
  kconfig: qconf: confine {begin,end}Group to constructor and destructor
  kconfig: qconf: fix ConfigList::updateListAllforAll()
  kconfig: add a function to dump all menu entries in a tree-like format
  kconfig: gconf: show GTK version in About dialog
  kconfig: gconf: replace GtkHPaned and GtkVPaned with GtkPaned
  kconfig: gconf: replace GdkColor with GdkRGBA
  ...

2025-08-06  Merge tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git  (Linus Torvalds, 1 file, -8/+28)

Pull perf fixes from Thomas Gleixner:
 "Perf fixes for perf_mmap() reference counting to prevent potential reference count leaks which are caused by:

   - VMA splits, which change the offset or size of a mapping, which causes perf_mmap_close() to ignore the unmap or unmap the wrong buffer.

   - Several internal issues of perf_mmap(), which can cause reference count leaks in the perf mmap, corrupt accounting or cause leaks in perf drivers.

  The main fix is to prevent VMA splits by implementing the [may_]split() callback for vm operations. The other issues are addressed by rearranging code, early returns on failure and invocation of cleanups. Also provide a selftest to validate the fixes.

  The reference counting should be converted to refcount_t, but that requires larger refactoring of the code and will be done once these fixes are upstream"

* tag 'perf-fixes-27504' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git:
  selftests/perf_events: Add a mmap() correctness test
  perf/core: Prevent VMA split of buffer mappings
  perf/core: Handle buffer mapping fail correctly in perf_mmap()
  perf/core: Exit early on perf_mmap() fail
  perf/core: Don't leak AUX buffer refcount on allocation failure
  perf/core: Preserve AUX buffer allocation failure result

2025-08-06  kheaders: make it possible to override TAR  (Michał Górny, 1 file, -3/+3)

Commit 86cdd2fdc4e3 ("kheaders: make headers archive reproducible") introduced a number of options specific to GNU tar to the `tar` invocation in the `gen_kheaders.sh` script. This causes the script to fail to work on systems where `tar` is not GNU tar. This can occur e.g. on recent Gentoo Linux installations that support using bsdtar from libarchive instead.

Add a `TAR` make variable to make it possible to override the tar executable used, e.g. by specifying:

    make TAR=gtar

Link: https://bugs.gentoo.org/884061
Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michał Górny <mgorny@gentoo.org>
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
