aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-25Disable FOP_DONTCACHE for now due to bugsLinus Torvalds1-1/+1
This is kind of last-minute, but Al Viro reported that the new FOP_DONTCACHE flag causes memory corruption due to use-after-free issues. This was triggered by commit 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE"), but that is not the underlying bug - it is just the first user of the flag. Vlastimil Babka suspects the underlying problem stems from the folio_end_writeback() logic introduced in commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached pages when writeback completes"). The most straightforward fix would be to just revert the commit that exposed this, but Matthew Wilcox points out that other filesystems are also starting to enable the FOP_DONTCACHE logic, so this instead disables that bit globally for now. The fix will hopefully end up being trivial and we can just re-enable this logic after more testing, but until such a time we'll have to disable the new FOP_DONTCACHE flag. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20250525083209.GS2023217@ZenIV/ Triggered-by: 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE") Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-25mailmap: add Jarkko's employer email addressJarkko Sakkinen1-0/+1
Add the current employer email address to mailmap. Link: https://lkml.kernel.org/r/20250523121105.15850-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com> Cc: Antonio Quartulli <antonio@openvpn.net> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: fix copy_vma() error handling for hugetlb mappingsRicardo Cañuelo Navarro4-3/+22
If, during a mremap() operation for a hugetlb-backed memory mapping, copy_vma() fails after the source vma has been duplicated and opened (ie. vma_link() fails), the error is handled by closing the new vma. This updates the hugetlbfs reservation counter of the reservation map which at this point is referenced by both the source vma and the new copy. As a result, once the new vma has been freed and copy_vma() returns, the reservation counter for the source vma will be incorrect. This patch addresses this corner case by clearing the hugetlb private page reservation reference for the new vma and decrementing the reference before closing the vma, so that vma_close() won't update the reservation counter. This is also what copy_vma_and_data() does with the source vma if copy_vma() succeeds, so a helper function has been added to do the fixup in both functions. The issue was reported by a private syzbot instance and can be reproduced using the C reproducer in [1]. It's also a possible duplicate of public syzbot report [2]. The WARNING report is: ============================================================ page_counter underflow: -1024 nr_pages=1024 WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120 Modules linked in: CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:page_counter_cancel+0xf6/0x120 Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81 RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246 RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000 R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400 FS: 0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0 Call Trace: <TASK> page_counter_uncharge+0x33/0x80 hugetlb_cgroup_uncharge_counter+0xcb/0x120 hugetlb_vm_op_close+0x579/0x960 ? __pfx_hugetlb_vm_op_close+0x10/0x10 remove_vma+0x88/0x130 exit_mmap+0x71e/0xe00 ? __pfx_exit_mmap+0x10/0x10 ? __mutex_unlock_slowpath+0x22e/0x7f0 ? __pfx_exit_aio+0x10/0x10 ? __up_read+0x256/0x690 ? uprobe_clear_state+0x274/0x290 ? mm_update_next_owner+0xa9/0x810 __mmput+0xc9/0x370 exit_mm+0x203/0x2f0 ? __pfx_exit_mm+0x10/0x10 ? taskstats_exit+0x32b/0xa60 do_exit+0x921/0x2740 ? do_raw_spin_lock+0x155/0x3b0 ? __pfx_do_exit+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0xc5/0x100 do_group_exit+0x20c/0x2c0 get_signal+0x168c/0x1720 ? __pfx_get_signal+0x10/0x10 ? schedule+0x165/0x360 arch_do_signal_or_restart+0x8e/0x7d0 ? __pfx_arch_do_signal_or_restart+0x10/0x10 ? __pfx___se_sys_futex+0x10/0x10 syscall_exit_to_user_mode+0xb8/0x2c0 do_syscall_64+0x75/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x422dcd Code: Unable to access opcode bytes at 0x422da3. RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720 R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0 R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000 </TASK> ============================================================ Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c [1] Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/ [2] Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Florent Revest <revest@google.com> Cc: Jann Horn <jannh@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25memcg: always call cond_resched() after fn()Breno Leitao1-4/+2
I am seeing soft lockup on certain machine types when a cgroup OOMs. This is happening because killing the process in certain machine might be very slow, which causes the soft lockup and RCU stalls. This happens usually when the cgroup has MANY processes and memory.oom.group is set. Example I am seeing in real production: [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) .... .... [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) .... [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982] .... Quick look at why this is so slow, it seems to be related to serial flush for certain machine types. For all the crashes I saw, the target CPU was at console_flush_all(). In the case above, there are thousands of processes in the cgroup, and it is soft locking up before it reaches the 1024 limit in the code (which would call the cond_resched()). So, cond_resched() in 1024 blocks is not sufficient. Remove the counter-based conditional rescheduling logic and call cond_resched() unconditionally after each task iteration, after fn() is called. This avoids the lockup independently of how slow fn() is. Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Rik van Riel <riel@surriel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Michael van der Westhuizen <rmikey@meta.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb foliosGe Yang1-0/+8
A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: only zero-init on vrealloc shrinkKees Cook1-5/+7
The common case is to grow reallocations, and since init_on_alloc will have already zeroed the whole allocation, we only need to zero when shrinking the allocation. Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: actually use the in-place vrealloc regionKees Cook1-0/+1
Patch series "mm: vmalloc: Actually use the in-place vrealloc region". This fixes a performance regression[1] with vrealloc()[1]. The refactoring to not build a new vmalloc region only actually worked when shrinking. Actually return the resized area when it grows. Ugh. Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1] Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25alloc_tag: allocate percpu counters for module tags dynamicallySuren Baghdasaryan5-28/+88
When a module gets unloaded it checks whether any of its tags are still in use and if so, we keep the memory containing module's allocation tags alive until all tags are unused. However percpu counters referenced by the tags are freed by free_module(). This will lead to UAF if the memory allocated by a module is accessed after module was unloaded. To fix this we allocate percpu counters for module allocation tags dynamically and we keep it alive for tags which are still in use after module unloading. This also removes the requirement of a larger PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because percpu memory for counters does not need to be reserved anymore. Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Tested-by: David Wang <00107082@163.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25module: release codetag section when module load failsDavid Wang1-0/+1
When module load fails after memory for codetag section is ready, codetag section memory will not be properly released. This causes memory leak, and if next module load happens to get the same module address, codetag may pick the uninitialized section when manipulating tags during module unload, and leads to "unable to handle page fault" BUG. Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/cma: make detection of highmem_start more robustMike Rapoport (Microsoft)1-1/+4
Pratyush Yadav reports the following crash: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/physaddr.c:23! ception 0x06 IP 10:ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000 CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:__phys_addr+0x58/0x60 Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 0000:ffffffff82803dd8 EFLAGS: 00010006 ORIG_RAX: 0000000000000000 RAX: 000000007fffffff RBX: 00000000ffffffff RCX: 0000000000000000 RDX: 000000007fffffff RSI: 0000000280000000 RDI: ffffffffffffffff RBP: ffffffff82803e68 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff83153180 R11: ffffffff82803e48 R12: ffffffff83c9aed0 R13: 0000000000000000 R14: 0000001040000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88903ffff000 CR3: 0000000002838000 CR4: 00000000000000b0 Call Trace: <TASK> ? __cma_declare_contiguous_nid+0x6e/0x340 ? cma_declare_contiguous_nid+0x33/0x70 ? dma_contiguous_reserve_area+0x2f/0x70 ? setup_arch+0x6f1/0x870 ? start_kernel+0x52/0x4b0 ? x86_64_start_reservations+0x29/0x30 ? x86_64_start_kernel+0x7c/0x80 ? common_startup_64+0x13e/0x141 The reason is that __cma_declare_contiguous_nid() does: highmem_start = __pa(high_memory - 1) + 1; If dma_contiguous_reserve_area() (or any other CMA declaration) is called before free_area_init(), high_memory is uninitialized. Without CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for highmem_start. The issue occurs because commit e120d1bc12da ("arch, mm: set high_memory in free_area_init()") moved initialization of high_memory after the call to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several architectures. In the case CONFIG_HIGHMEM is enabled, some architectures that actually support HIGHMEM (arm, powerpc and x86) have initialization of high_memory before a possible call to __cma_declare_contiguous_nid() and some initialized high_memory late anyway (arc, csky, microblase, mips, sparc, xtensa) even before the commit e120d1bc12da so they are fine with using uninitialized value of high_memory. And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes the first address after memory end, so instead of relying on high_memory to calculate highmem_start use memblock_end_of_DRAM() and eliminate the dependency of CMA area creation on high_memory in majority of configurations. Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-23selftests: ublk: add test for UBLK_F_QUIESCEMing Lei6-5/+115
Add test generic_11 for covering new control command of UBLK_U_CMD_QUIESCE_DEV. Add 'quiesce -n dev_id' sub-command on ublk utility for transitioning device state to quiesce states, then verify the feature via generic_10 by doing quiesce and recovery. Cc: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23ublk: add feature UBLK_F_QUIESCEMing Lei2-1/+142
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV` for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED` or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server cooperation. This feature can help to support to upgrade ublk server application by shutting down ublk server gracefully, meantime keep ublk block device persistent during the upgrading period. The feature is only available for UBLK_F_USER_RECOVERY. Suggested-by: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZEMing Lei5-1/+93
Add test generic_10 for covering new control command of UBLK_U_CMD_UPDATE_SIZE. Add 'update_size -s|--size size_in_bytes' sub-command on ublk utility for supporting this feature, then verify the feature via generic_10. Cc: Omri Mann <omri@nvidia.com> Cc: Jared Holzman <jholzman@nvidia.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23traceevent/block: Add REQ_ATOMIC flag to block trace eventsRitesh Harjani (IBM)2-1/+3
Filesystems like XFS can implement atomic write I/O using either REQ_ATOMIC flag set in the bio or via CoW operation. It will be useful if we have a flag in trace events to distinguish between the two. This patch adds char 'U' (Untorn writes) to rwbs field of the trace events if REQ_ATOMIC flag is set in the bio. <W/ REQ_ATOMIC> ================= xfs_io-4238 [009] ..... 4148.126843: block_rq_issue: 259,0 WFSU 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [009] d.h1. 4148.129864: block_rq_complete: 259,0 WFSU () 768 + 32 none,0,0 [0] <W/O REQ_ATOMIC> =============== xfs_io-4237 [010] ..... 4143.325616: block_rq_issue: 259,0 WS 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [010] d.H1. 4143.329138: block_rq_complete: 259,0 WS () 768 + 32 none,0,0 [0] Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/44317cb2ec4588f6a2c1501a96684e6a1196e8ba.1747921498.git.ritesh.list@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/cmd: warn on reg buf imports by ineligible cmdsPavel Begunkov1-0/+6
For IORING_URING_CMD_FIXED-less commands io_uring doesn't pull buf_index from the sqe, so imports might succeed if the index coincide, e.g. when it's 0, but otherwise it's error prone. Warn if someone tries to import without the flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/a1c2c88e53c3fe96978f23d50c6bc66c2c79c337.1747991070.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23statmount: update STATMOUNT_SUPPORTED macroDmitry V. Levin1-1/+3
According to commit 8f6116b5b77b ("statmount: add a new supported_mask field"), STATMOUNT_SUPPORTED macro shall be updated whenever a new flag is added. Fixes: 7a54947e727b ("Merge patch series "fs: allow changing idmappings"") Signed-off-by: "Dmitry V. Levin" <ldv@strace.io> Link: https://lore.kernel.org/20250511224953.GA17849@strace.io Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23fs: convert mount flags to enumStephen Brennan1-42/+45
In prior kernel versions (5.8-6.8), commit 9f6c61f96f2d9 ("proc/mounts: add cursor") introduced MNT_CURSOR, a flag used by readers from /proc/mounts to keep their place while reading the file. Later, commit 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree") removed this flag and its value has since been repurposed. For debuggers iterating over the list of mounts, cursors should be skipped as they are irrelevant. Detecting whether an element is a cursor can be difficult. Since the MNT_CURSOR flag is a preprocessor constant, it's not present in debuginfo, and since its value is repurposed, we cannot hard-code it. For this specific issue, cursors are possible to detect in other ways, but ideally, we would be able to read the mount flag definitions out of the debuginfo. For that reason, convert the mount flags to an enum. Link: https://github.com/osandov/drgn/pull/496 Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Link: https://lore.kernel.org/20250507223402.2795029-1-stephen.s.brennan@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23->mnt_devname is never NULLAl Viro2-14/+11
Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()" back in 2018. Get rid of the dead checks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250421033509.GV2023217@ZenIV Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23mount: add a comment about concurrent changes with statmount()/listmount()Christian Brauner1-3/+23
Add some comments in there highlighting a few non-obvious assumptions. Link: https://lore.kernel.org/20250416-zerknirschen-aluminium-14a55639076f@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23io_uring/io-wq: only create a new worker if it can make progressJens Axboe1-0/+28
Hashed work is serialized by io-wq, intended to be used for cases like serializing buffered writes to a regular file, where the file system will serialize the workers anyway with a mutex or similar. Since they would be forcibly serialized and blocked, it's more efficient for io-wq to handle these individually rather than issue them in parallel. If a worker is currently handling a hashed work item and gets blocked, don't create a new worker if the next work item is also hashed and mapped to the same bucket. That new worker would not be able to make any progress anyway. Reported-by: Fengnan Chang <changfengnan@bytedance.com> Reported-by: Diangang Li <lidiangang@bytedance.com> Link: https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: ignore non-busy worker going to sleepJens Axboe1-0/+2
When an io-wq worker goes to sleep, it checks if there's work to do. If there is, it'll create a new worker. But if this worker is currently idle, it'll either get woken right back up immediately, or someone else has already created the necessary worker to handle this work. Only go through the worker creation logic if the current worker is currently handling a work item. That means it's being scheduled out as part of handling that work, not just going to sleep on its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: move hash helpers to the topJens Axboe1-10/+10
Just in preparation for using them higher up in the function, move these generic helpers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI IDTodd Brandt1-1/+2
The ARL requires that the GMA and NPU devices both be in D3Hot in order for PC10 and S0iX to be achieved in S2idle. The original ARL-H/U addition to the intel_pmc_core driver attempted to do this by switching them to D3 in the init and resume calls of the intel_pmc_core driver. The problem is the ARL-H/U have a different NPU device and thus are not being properly set and thus S0iX does not work properly in ARL-H/U. This patch creates a new ARL-H specific device id that is correct and also adds the D3 fixup to the suspend callback. This way if the PCI devies drop from D3 to D0 after resume they can be corrected for the next suspend. Thus there is no dropout in S0iX. Fixes: bd820906ea9d ("platform/x86/intel/pmc: Add Arrow Lake U/H support to intel_pmc_core driver") Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Link: https://lore.kernel.org/r/a61f78be45c13f39e122dcc684b636f4b21e79a0.1747737446.git.todd.e.brandt@intel.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-23mips, net: ensure that SOCK_COREDUMP is definedChristian Brauner2-11/+1
For historical reasons mips has to override the socket enum values but the defines are all the same. So simply move the ARCH_HAS_SOCKET_TYPES scope. Fixes: a9194f88782a ("coredump: add coredump socket") Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-22drm/xe/ptl: Update the PTL pci id tableMatt Atwood1-0/+4
Update to current bspec table. Bspec: 72574 Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://lore.kernel.org/r/20250520195749.371748-1-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 49c6dc74b5968885f421f9f1b45eb4890b955870) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-22drm/xe: Use xe_mmio_read32() to read mtcfg registerShuicheng Lin1-5/+5
The mtcfg register is a 32-bit register and should therefore be accessed using xe_mmio_read32(). Other 3 changes per codestyle suggestion: " xe_mmio.c:83: CHECK: Alignment should match open parenthesis xe_mmio.c:131: CHECK: Comparison to NULL could be written "!xe->mmio.regs" xe_mmio.c:315: CHECK: line length of 103 exceeds 100 columns " Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250513153010.3464767-1-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit d2662cf8f44a68deb6c76ad9f1d9f29dbf7ba601) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-22drm/xe/mocs: Check if all domains awakeTejas Upadhyay1-5/+6
Check if all domains are awake specially for LNCF regs Fixes: 298661cd9cea ("drm/xe: Fix MOCS debugfs LNCF readout") Improvements-suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250506142300.1865783-1-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit a383cf218ef8bb35d4c03958bd956573b65cf778) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-05-22Revert "drm/amd: Keep display off while going into S4"Mario Limonciello1-5/+0
commit 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") attempted to keep displays off during the S4 sequence by not resuming display IP. This however leads to hangs because DRM clients such as the console can try to access registers and cause a hang. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4155 Fixes: 68bfdc8dc0a1a ("drm/amd: Keep display off while going into S4") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250522141328.115095-1-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e485502c37b097b0bd773baa7e2741bf7bd2909a) Cc: stable@vger.kernel.org
2025-05-22trace/io_uring: fix io_uring_local_work_run ctx documentationCaleb Sander Mateos1-1/+1
The comment for the tracepoint io_uring_local_work_run refers to a field "tctx" and a type "io_uring_ctx", neither of which exist. "tctx" looks to mean "ctx" and "io_uring_ctx" should be "io_ring_ctx". Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250522150451.2385652-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22ublk: run auto buf unregisgering in same io_ring_ctx with registeringMing Lei2-4/+21
UBLK_F_AUTO_BUF_REG requires that the buffer registered automatically is unregistered in same `io_ring_ctx`, so check it explicitly. Document this requirement for UBLK_F_AUTO_BUF_REG. Drop WARN_ON_ONCE() which is triggered from userspace code path. Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG") Reported-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522152043.399824-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22io_uring: add helper io_uring_cmd_ctx_handle()Ming Lei1-0/+9
Add helper io_uring_cmd_ctx_handle() for driver to track per-context resource, such as registered kernel io buffer. Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Link: https://lore.kernel.org/r/20250522152043.399824-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22spi: spi-fsl-dspi: Reset SR flags before sending a new messageLarisa Grigore1-0/+2
If, in a previous transfer, the controller sends more data than expected by the DSPI target, SR.RFDF (RX FIFO is not empty) will remain asserted. When flushing the FIFOs at the beginning of a new transfer (writing 1 into MCR.CLR_TXF and MCR.CLR_RXF), SR.RFDF should also be cleared. Otherwise, when running in target mode with DMA, if SR.RFDF remains asserted, the DMA callback will be fired before the controller sends any data. Take this opportunity to reset all Status Register fields. Fixes: 5ce3cc567471 ("spi: spi-fsl-dspi: Provide support for DSPI slave mode operation (Vybryd vf610)") Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-3-bea884630cfb@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22spi: spi-fsl-dspi: Halt the module after a new message transferBogdan-Gabriel Roman1-0/+24
The XSPI mode implementation in this driver still uses the EOQ flag to signal the last word in a transmission and deassert the PCS signal. However, at speeds lower than ~200kHZ, the PCS signal seems to remain asserted even when SR[EOQF] = 1 indicates the end of a transmission. This is a problem for target devices which require the deassertation of the PCS signal between transfers. Hence, this commit 'forces' the deassertation of the PCS by stopping the module through MCR[HALT] after completing a new transfer. According to the reference manual, the module stops or transitions from the Running state to the Stopped state after the current frame, when any one of the following conditions exist: - The value of SR[EOQF] = 1. - The chip is in Debug mode and the value of MCR[FRZ] = 1. - The value of MCR[HALT] = 1. This shouldn't be done if the last transfer in the message has cs_change set. Fixes: ea93ed4c181b ("spi: spi-fsl-dspi: Use EOQ for last word in buffer even for XSPI mode") Signed-off-by: Bogdan-Gabriel Roman <bogdan-gabriel.roman@nxp.com> Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-2-bea884630cfb@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22spi: spi-fsl-dspi: restrict register range for regmap accessLarisa Grigore1-1/+19
DSPI registers are NOT continuous, some registers are reserved and accessing them from userspace will trigger external abort, add regmap register access table to avoid below abort. For example on S32G: # cat /sys/kernel/debug/regmap/401d8000.spi/registers Internal error: synchronous external abort: 96000210 1 PREEMPT SMP ... Call trace: regmap_mmio_read32le+0x24/0x48 regmap_mmio_read+0x48/0x70 _regmap_bus_reg_read+0x38/0x48 _regmap_read+0x68/0x1b0 regmap_read+0x50/0x78 regmap_read_debugfs+0x120/0x338 Fixes: 1acbdeb92c87 ("spi/fsl-dspi: Convert to use regmap and add big-endian support") Co-developed-by: Xulin Sun <xulin.sun@windriver.com> Signed-off-by: Xulin Sun <xulin.sun@windriver.com> Signed-off-by: Larisa Grigore <larisa.grigore@nxp.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://patch.msgid.link/20250522-james-nxp-spi-v2-1-bea884630cfb@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22spi: use container_of_cont() for to_spi_device()Greg Kroah-Hartman1-4/+1
Some places in the spi core pass in a const pointer to a device and the default container_of() casts that away, which is not a good idea. Preserve the proper const attribute by using container_of_const() for to_spi_device() instead, which is what it was designed for. Note, this removes the NULL check for a device pointer in the call, but no one was ever checking for that return value, and a device pointer should never be NULL overall anyway, so this should be a safe change. Cc: Mark Brown <broonie@kernel.org> Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/2025052230-fidgeting-stooge-66f5@gregkh Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-22ublk: remove io argument from ublk_auto_buf_reg_fallback()Caleb Sander Mateos1-2/+2
The argument has been unused since the function was added, so remove it. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250521160720.1893326-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-22octeontx2-af: Fix APR entry mapping based on APR_LMT_CFGGeetha sowjanya2-6/+14
The current implementation maps the APR table using a fixed size, which can lead to incorrect mapping when the number of PFs and VFs varies. This patch corrects the mapping by calculating the APR table size dynamically based on the values configured in the APR_LMT_CFG register, ensuring accurate representation of APR entries in debugfs. Fixes: 0daa55d033b0 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table"). Signed-off-by: Geetha sowjanya <gakula@marvell.com> Link: https://patch.msgid.link/20250521060834.19780-3-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22octeontx2-af: Set LMT_ENA bit for APR table entriesSubbaraya Sundeep1-2/+13
This patch enables the LMT line for a PF/VF by setting the LMT_ENA bit in the APR_LMT_MAP_ENTRY_S structure. Additionally, it simplifies the logic for calculating the LMTST table index by consistently using the maximum number of hw supported VFs (i.e., 256). Fixes: 873a1e3d207a ("octeontx2-af: cn10k: Setting up lmtst map table"). Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Geetha sowjanya <gakula@marvell.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250521060834.19780-2-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_doneWang Liang1-0/+5
Syzbot reported a slab-use-after-free with the following call trace: ================================================================== BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840 Read of size 8 at addr ffff88807a733000 by task kworker/1:0/25 Call Trace: kasan_report+0xd9/0x110 mm/kasan/report.c:601 tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840 crypto_request_complete include/crypto/algapi.h:266 aead_request_complete include/crypto/internal/aead.h:85 cryptd_aead_crypt+0x3b8/0x750 crypto/cryptd.c:772 crypto_request_complete include/crypto/algapi.h:266 cryptd_queue_worker+0x131/0x200 crypto/cryptd.c:181 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 Allocated by task 8355: kzalloc_noprof include/linux/slab.h:778 tipc_crypto_start+0xcc/0x9e0 net/tipc/crypto.c:1466 tipc_init_net+0x2dd/0x430 net/tipc/core.c:72 ops_init+0xb9/0x650 net/core/net_namespace.c:139 setup_net+0x435/0xb40 net/core/net_namespace.c:343 copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508 create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228 ksys_unshare+0x419/0x970 kernel/fork.c:3323 __do_sys_unshare kernel/fork.c:3394 Freed by task 63: kfree+0x12a/0x3b0 mm/slub.c:4557 tipc_crypto_stop+0x23c/0x500 net/tipc/crypto.c:1539 tipc_exit_net+0x8c/0x110 net/tipc/core.c:119 ops_exit_list+0xb0/0x180 net/core/net_namespace.c:173 cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done may still visit it in cryptd_queue_worker workqueue. I reproduce this issue by: ip netns add ns1 ip link add veth1 type veth peer name veth2 ip link set veth1 netns ns1 ip netns exec ns1 tipc bearer enable media eth dev veth1 ip netns exec ns1 tipc node set key this_is_a_master_key master ip netns exec ns1 tipc bearer disable media eth dev veth1 ip netns del ns1 The key of reproduction is that, simd_aead_encrypt is interrupted, leading to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is triggered, and the tipc_crypto tx will be visited. tipc_disc_timeout tipc_bearer_xmit_skb tipc_crypto_xmit tipc_aead_encrypt crypto_aead_encrypt // encrypt() simd_aead_encrypt // crypto_simd_usable() is false child = &ctx->cryptd_tfm->base; simd_aead_encrypt crypto_aead_encrypt // encrypt() cryptd_aead_encrypt_enqueue cryptd_aead_enqueue cryptd_enqueue_request // trigger cryptd_queue_worker queue_work_on(smp_processor_id(), cryptd_wq, &cpu_queue->work) Fix this by holding net reference count before encrypt. Reported-by: syzbot+55c12726619ff85ce1f6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=55c12726619ff85ce1f6 Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250520101404.1341730-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22octeontx2-pf: Avoid adding dcbnl_ops for LBK and SDP vfSuman Ghosh1-3/+6
Priority flow control is not supported for LBK and SDP vf. This patch adds support to not add dcbnl_ops for LBK and SDP vf. Fixes: 8e67558177f8 ("octeontx2-pf: PFC config support with DCBx") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250519072658.2960851-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22selftests/tc-testing: Add an HFSC qlen accounting testCong Wang1-0/+27
This test reproduces a scenario where HFSC queue length and backlog accounting can become inconsistent when a peek operation triggers a dequeue and possible drop before the parent qdisc updates its counters. The test sets up a DRR root qdisc with an HFSC class, netem, and blackhole children, and uses Scapy to inject a packet. It helps to verify that HFSC correctly tracks qlen and backlog even when packets are dropped during peek-induced dequeue. Cc: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250518222038.58538-3-xiyou.wangcong@gmail.com Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()Cong Wang1-3/+3
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the child qdisc's peek() operation before incrementing sch->q.qlen and sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may trigger an immediate dequeue and potential packet drop. In such cases, qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog have not yet been updated, leading to inconsistent queue accounting. This can leave an empty HFSC class in the active list, causing further consequences like use-after-free. This patch fixes the bug by moving the increment of sch->q.qlen and sch->qstats.backlog before the call to the child qdisc's peek() operation. This ensures that queue length and backlog are always accurate when packet drops or dequeues are triggered during the peek. Fixes: 12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline") Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22iommu: Skip PASID validation for devices without PASID capabilityTushar Dave1-15/+28
Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly. pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing. However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices. This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID. Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored. Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()") Cc: stable@vger.kernel.org Signed-off-by: Tushar Dave <tdave@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250520011937.3230557-1-tdave@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-21idpf: fix idpf_vport_splitq_napi_poll()Eric Dumazet1-9/+9
idpf_vport_splitq_napi_poll() can incorrectly return @budget after napi_complete_done() has been called. This violates NAPI rules, because after napi_complete_done(), current thread lost napi ownership. Move the test against POLL_MODE before the napi_complete_done(). Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Reported-by: Peter Newman <peternewman@google.com> Closes: https://lore.kernel.org/netdev/20250520121908.1805732-1-edumazet@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joshua Hay <joshua.a.hay@intel.com> Cc: Alan Brady <alan.brady@intel.com> Cc: Madhu Chittim <madhu.chittim@intel.com> Cc: Phani Burra <phani.r.burra@intel.com> Cc: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Link: https://patch.msgid.link/20250520124030.1983936-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21ksmbd: use list_first_entry_or_null for opinfo_get_list()Namjae Jeon1-5/+2
The list_first_entry() macro never returns NULL. If the list is empty then it returns an invalid pointer. Use list_first_entry_or_null() to check if the list is empty. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202505080231.7OXwq4Te-lkp@intel.com/ Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-21ksmbd: fix rename failureNamjae Jeon1-1/+1
I found that rename fails after cifs mount due to update of lookup_one_qstr_excl(). mv a/c b/ mv: cannot move 'a/c' to 'b/c': No such file or directory In order to rename to a new name regardless of whether the dentry is negative, we need to get the dentry through lookup_one_qstr_excl(). So It will not return error if the name doesn't exist. Fixes: 204a575e91f3 ("VFS: add common error checks to lookup_one_qstr_excl()") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-05-21io_uring/net: only retry recv bundle for a full transferJens Axboe1-4/+10
If a shorter than assumed transfer was seen, a partial buffer will have been filled. For that case it isn't sane to attempt to fill more into the bundle before posting a completion, as that will cause a gap in the received data. Check if the iterator has hit zero and only allow to continue a bundle operation if that is the case. Also ensure that for putting finished buffers, only the current transfer is accounted. Otherwise too many buffers may be put for a short transfer. Link: https://github.com/axboe/liburing/issues/1409 Cc: stable@vger.kernel.org Fixes: 7c71a0af81ba ("io_uring/net: improve recv bundles") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-21bcachefs: Check for casefolded dirents in non casefolded dirsKent Overstreet2-1/+42
Check for mismatches between casefold dirents and casefold directories. A mismatch will cause lookups to fail, as we'll be doing the lookup with the casefolded name, which won't match the non-casefolded dirent, and vice versa. Reported-by: Christopher Snowhill <chris@kode54.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Fix bch2_dirent_create_snapshot() for casefoldingKent Overstreet4-21/+18
bch2_dirent_create_snapshot(), used in fsck, neglected to create a casefolded dirent. Just move this into dirent_create_key(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Fix casefold opt via xattr interfaceKent Overstreet4-26/+46
Changing the casefold option requires extra checks/work - factor out a helper from bch2_fileattr_set() for the xattr code to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>