aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/scripts/gdb/linux/modules.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-27Make 'cc-option' work correctly for the -Wno-xyzzy patternLinus Torvalds1-2/+2
This is the follow-up to commit a79be02bba5c ("Fix mis-uses of 'cc-option' for warning disablement") where I mentioned that the best fix would be to just make 'cc-option' a bit smarter, and work for all compiler options, including the '-Wno-xyzzy' pattern that it used to accept unknown options for. It turns out that fixing cc-option is pretty straightforward: just rewrite any '-Wno-xyzzy' option pattern to use '-Wxyzzy' instead for testing. That makes the whole artificial distinction between 'cc-option' and 'cc-disable-warning' go away, and we can happily forget about the odd build rule that you have to treat compiler options that disable warnings specially. The 'cc-disable-warning' helper remains as a backwards compatibility syntax for now, but is implemented in terms of the new and improved cc-option. Acked-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-27btrfs: don't drop a reference if btrfs_check_write_meta_pointer() failsJosef Bacik1-1/+0
In the zoned mode there's a bug in the extent buffer tree conversion to xarray. The reference for eb is dropped and code continues but the references get dropped by releasing the batch. Reported-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/linux-btrfs/202505191521.435b97ac-lkp@intel.com/ Fixes: 19d7f65f032f ("btrfs: convert the buffer_radix to an xarray") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-05-27crypto: shash - Fix buffer overrun in import functionHerbert Xu1-4/+5
Only set the partial block length to zero if the algorithm is block-only. Otherwise the descriptor context could be empty, e.g., for digest_null. Reported-by: syzbot+4851c19615d35f0e4d68@syzkaller.appspotmail.com Fixes: 7650f826f7b2 ("crypto: shash - Handle partial blocks in API") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-26exfat: do not clear volume dirty flag during syncYuezhang Mo1-23/+7
xfstests generic/482 tests the file system consistency after each FUA operation. It fails when run on exfat. exFAT clears the volume dirty flag with a FUA operation during sync. Since s_lock is not held when data is being written to a file, sync can be executed at the same time. When data is being written to a file, the FAT chain is updated first, and then the file size is updated. If sync is executed between updating them, the length of the FAT chain may be inconsistent with the file size. To avoid the situation where the file system is inconsistent but the volume dirty flag is cleared, this commit moves the clearing of the volume dirty flag from exfat_fs_sync() to exfat_put_super(), so that the volume dirty flag is not cleared until unmounting. After the move, there is no additional action during sync, so exfat_fs_sync() can be deleted. Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-05-26exfat: fix double free in delayed_freeNamjae Jeon1-0/+1
The double free could happen in the following path. exfat_create_upcase_table() exfat_create_upcase_table() : return error exfat_free_upcase_table() : free ->vol_utbl exfat_load_default_upcase_table : return error exfat_kill_sb() delayed_free() exfat_free_upcase_table() <--------- double free This patch set ->vol_util as NULL after freeing it. Reported-by: Jianzhou Zhao <xnxc22xnxc22@qq.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-05-26x86/fpu: Fix irq_fpu_usable() to return false during CPU onliningEric Biggers4-13/+31
irq_fpu_usable() incorrectly returned true before the FPU is initialized. The x86 CPU onlining code can call sha256() to checksum AMD microcode images, before the FPU is initialized. Since sha256() recently gained a kernel-mode FPU optimized code path, a crash occurred in kernel_fpu_begin_mask() during hotplug CPU onlining. (The crash did not occur during boot-time CPU onlining, since the optimized sha256() code is not enabled until subsys_initcalls run.) Fix this by making irq_fpu_usable() return false before fpu__init_cpu() has run. To do this without adding any additional overhead to irq_fpu_usable(), replace the existing per-CPU bool in_kernel_fpu with kernel_fpu_allowed which tracks both initialization and usage rather than just usage. The initial state is false; FPU initialization sets it to true; kernel-mode FPU sections toggle it to false and then back to true; and CPU offlining restores it to the initial state of false. Fixes: 11d7956d526f ("crypto: x86/sha256 - implement library instead of shash") Reported-by: Ayush Jain <Ayush.Jain3@amd.com> Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Ayush Jain <Ayush.Jain3@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-25Linux 6.15Linus Torvalds1-1/+1
2025-05-25Disable FOP_DONTCACHE for now due to bugsLinus Torvalds1-1/+1
This is kind of last-minute, but Al Viro reported that the new FOP_DONTCACHE flag causes memory corruption due to use-after-free issues. This was triggered by commit 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE"), but that is not the underlying bug - it is just the first user of the flag. Vlastimil Babka suspects the underlying problem stems from the folio_end_writeback() logic introduced in commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached pages when writeback completes"). The most straightforward fix would be to just revert the commit that exposed this, but Matthew Wilcox points out that other filesystems are also starting to enable the FOP_DONTCACHE logic, so this instead disables that bit globally for now. The fix will hopefully end up being trivial and we can just re-enable this logic after more testing, but until such a time we'll have to disable the new FOP_DONTCACHE flag. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20250525083209.GS2023217@ZenIV/ Triggered-by: 974c5e6139db ("xfs: flag as supporting FOP_DONTCACHE") Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-05-25mailmap: add Jarkko's employer email addressJarkko Sakkinen1-0/+1
Add the current employer email address to mailmap. Link: https://lkml.kernel.org/r/20250523121105.15850-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com> Cc: Antonio Quartulli <antonio@openvpn.net> Cc: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: fix copy_vma() error handling for hugetlb mappingsRicardo Cañuelo Navarro4-3/+22
If, during a mremap() operation for a hugetlb-backed memory mapping, copy_vma() fails after the source vma has been duplicated and opened (ie. vma_link() fails), the error is handled by closing the new vma. This updates the hugetlbfs reservation counter of the reservation map which at this point is referenced by both the source vma and the new copy. As a result, once the new vma has been freed and copy_vma() returns, the reservation counter for the source vma will be incorrect. This patch addresses this corner case by clearing the hugetlb private page reservation reference for the new vma and decrementing the reference before closing the vma, so that vma_close() won't update the reservation counter. This is also what copy_vma_and_data() does with the source vma if copy_vma() succeeds, so a helper function has been added to do the fixup in both functions. The issue was reported by a private syzbot instance and can be reproduced using the C reproducer in [1]. It's also a possible duplicate of public syzbot report [2]. The WARNING report is: ============================================================ page_counter underflow: -1024 nr_pages=1024 WARNING: CPU: 0 PID: 3287 at mm/page_counter.c:61 page_counter_cancel+0xf6/0x120 Modules linked in: CPU: 0 UID: 0 PID: 3287 Comm: repro__WARNING_ Not tainted 6.15.0-rc7+ #54 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014 RIP: 0010:page_counter_cancel+0xf6/0x120 Code: ff 5b 41 5e 41 5f 5d c3 cc cc cc cc e8 f3 4f 8f ff c6 05 64 01 27 06 01 48 c7 c7 60 15 f8 85 48 89 de 4c 89 fa e8 2a a7 51 ff <0f> 0b e9 66 ff ff ff 44 89 f9 80 e1 07 38 c1 7c 9d 4c 81 RSP: 0018:ffffc900025df6a0 EFLAGS: 00010246 RAX: 2edfc409ebb44e00 RBX: fffffffffffffc00 RCX: ffff8880155f0000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: dffffc0000000000 R08: ffffffff81c4a23c R09: 1ffff1100330482a R10: dffffc0000000000 R11: ffffed100330482b R12: 0000000000000000 R13: ffff888058a882c0 R14: ffff888058a882c0 R15: 0000000000000400 FS: 0000000000000000(0000) GS:ffff88808fc53000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004b33e0 CR3: 00000000076d6000 CR4: 00000000000006f0 Call Trace: <TASK> page_counter_uncharge+0x33/0x80 hugetlb_cgroup_uncharge_counter+0xcb/0x120 hugetlb_vm_op_close+0x579/0x960 ? __pfx_hugetlb_vm_op_close+0x10/0x10 remove_vma+0x88/0x130 exit_mmap+0x71e/0xe00 ? __pfx_exit_mmap+0x10/0x10 ? __mutex_unlock_slowpath+0x22e/0x7f0 ? __pfx_exit_aio+0x10/0x10 ? __up_read+0x256/0x690 ? uprobe_clear_state+0x274/0x290 ? mm_update_next_owner+0xa9/0x810 __mmput+0xc9/0x370 exit_mm+0x203/0x2f0 ? __pfx_exit_mm+0x10/0x10 ? taskstats_exit+0x32b/0xa60 do_exit+0x921/0x2740 ? do_raw_spin_lock+0x155/0x3b0 ? __pfx_do_exit+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 ? _raw_spin_lock_irq+0xc5/0x100 do_group_exit+0x20c/0x2c0 get_signal+0x168c/0x1720 ? __pfx_get_signal+0x10/0x10 ? schedule+0x165/0x360 arch_do_signal_or_restart+0x8e/0x7d0 ? __pfx_arch_do_signal_or_restart+0x10/0x10 ? __pfx___se_sys_futex+0x10/0x10 syscall_exit_to_user_mode+0xb8/0x2c0 do_syscall_64+0x75/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x422dcd Code: Unable to access opcode bytes at 0x422da3. RSP: 002b:00007ff266cdb208 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: 0000000000000001 RBX: 00007ff266cdbcdc RCX: 0000000000422dcd RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004c7bec RBP: 00007ff266cdb220 R08: 203a6362696c6720 R09: 203a6362696c6720 R10: 0000200000c00000 R11: 0000000000000246 R12: ffffffffffffffd0 R13: 0000000000000002 R14: 00007ffe1cb5f520 R15: 00007ff266cbb000 </TASK> ============================================================ Link: https://lkml.kernel.org/r/20250523-warning_in_page_counter_cancel-v2-1-b6df1a8cfefd@igalia.com Link: https://people.igalia.com/rcn/kernel_logs/20250422__WARNING_in_page_counter_cancel__repro.c [1] Link: https://lore.kernel.org/all/67000a50.050a0220.49194.048d.GAE@google.com/ [2] Signed-off-by: Ricardo Cañuelo Navarro <rcn@igalia.com> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Florent Revest <revest@google.com> Cc: Jann Horn <jannh@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25memcg: always call cond_resched() after fn()Breno Leitao1-4/+2
I am seeing soft lockup on certain machine types when a cgroup OOMs. This is happening because killing the process in certain machine might be very slow, which causes the soft lockup and RCU stalls. This happens usually when the cgroup has MANY processes and memory.oom.group is set. Example I am seeing in real production: [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) .... .... [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) .... [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982] .... Quick look at why this is so slow, it seems to be related to serial flush for certain machine types. For all the crashes I saw, the target CPU was at console_flush_all(). In the case above, there are thousands of processes in the cgroup, and it is soft locking up before it reaches the 1024 limit in the code (which would call the cond_resched()). So, cond_resched() in 1024 blocks is not sufficient. Remove the counter-based conditional rescheduling logic and call cond_resched() unconditionally after each task iteration, after fn() is called. This avoids the lockup independently of how slow fn() is. Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org Fixes: ade81479c7dd ("memcg: fix soft lockup in the OOM process") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Rik van Riel <riel@surriel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Michael van der Westhuizen <rmikey@meta.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/hugetlb: fix kernel NULL pointer dereference when replacing free hugetlb foliosGe Yang1-0/+8
A kernel crash was observed when replacing free hugetlb folios: BUG: kernel NULL pointer dereference, address: 0000000000000028 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 28 UID: 0 PID: 29639 Comm: test_cma.sh Tainted 6.15.0-rc6-zp #41 PREEMPT(voluntary) RIP: 0010:alloc_and_dissolve_hugetlb_folio+0x1d/0x1f0 RSP: 0018:ffffc9000b30fa90 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000342cca RCX: ffffea0043000000 RDX: ffffc9000b30fb08 RSI: ffffea0043000000 RDI: 0000000000000000 RBP: ffffc9000b30fb20 R08: 0000000000001000 R09: 0000000000000000 R10: ffff88886f92eb00 R11: 0000000000000000 R12: ffffea0043000000 R13: 0000000000000000 R14: 00000000010c0200 R15: 0000000000000004 FS: 00007fcda5f14740(0000) GS:ffff8888ec1d8000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000028 CR3: 0000000391402000 CR4: 0000000000350ef0 Call Trace: <TASK> replace_free_hugepage_folios+0xb6/0x100 alloc_contig_range_noprof+0x18a/0x590 ? srso_return_thunk+0x5/0x5f ? down_read+0x12/0xa0 ? srso_return_thunk+0x5/0x5f cma_range_alloc.constprop.0+0x131/0x290 __cma_alloc+0xcf/0x2c0 cma_alloc_write+0x43/0xb0 simple_attr_write_xsigned.constprop.0.isra.0+0xb2/0x110 debugfs_attr_write+0x46/0x70 full_proxy_write+0x62/0xa0 vfs_write+0xf8/0x420 ? srso_return_thunk+0x5/0x5f ? filp_flush+0x86/0xa0 ? srso_return_thunk+0x5/0x5f ? filp_close+0x1f/0x30 ? srso_return_thunk+0x5/0x5f ? do_dup2+0xaf/0x160 ? srso_return_thunk+0x5/0x5f ksys_write+0x65/0xe0 do_syscall_64+0x64/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e There is a potential race between __update_and_free_hugetlb_folio() and replace_free_hugepage_folios(): CPU1 CPU2 __update_and_free_hugetlb_folio replace_free_hugepage_folios folio_test_hugetlb(folio) -- It's still hugetlb folio. __folio_clear_hugetlb(folio) hugetlb_free_folio(folio) h = folio_hstate(folio) -- Here, h is NULL pointer When the above race condition occurs, folio_hstate(folio) returns NULL, and subsequent access to this NULL pointer will cause the system to crash. To resolve this issue, execute folio_hstate(folio) under the protection of the hugetlb_lock lock, ensuring that folio_hstate(folio) does not return NULL. Link: https://lkml.kernel.org/r/1747884137-26685-1-git-send-email-yangge1116@126.com Fixes: 04f13d241b8b ("mm: replace free hugepage folios after migration") Signed-off-by: Ge Yang <yangge1116@126.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: only zero-init on vrealloc shrinkKees Cook1-5/+7
The common case is to grow reallocations, and since init_on_alloc will have already zeroed the whole allocation, we only need to zero when shrinking the allocation. Link: https://lkml.kernel.org/r/20250515214217.619685-2-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: Shung-Hsi Yu <shung-hsi.yu@suse.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm: vmalloc: actually use the in-place vrealloc regionKees Cook1-0/+1
Patch series "mm: vmalloc: Actually use the in-place vrealloc region". This fixes a performance regression[1] with vrealloc()[1]. The refactoring to not build a new vmalloc region only actually worked when shrinking. Actually return the resized area when it grows. Ugh. Link: https://lkml.kernel.org/r/20250515214217.619685-1-kees@kernel.org Fixes: a0309faf1cb0 ("mm: vmalloc: support more granular vrealloc() sizing") Signed-off-by: Kees Cook <kees@kernel.org> Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Closes: https://lore.kernel.org/all/20250515-bpf-verifier-slowdown-vwo2meju4cgp2su5ckj@6gi6ssxbnfqg [1] Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Reviewed-by: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Cc: "Erhard F." <erhard_f@mailbox.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25alloc_tag: allocate percpu counters for module tags dynamicallySuren Baghdasaryan5-28/+88
When a module gets unloaded it checks whether any of its tags are still in use and if so, we keep the memory containing module's allocation tags alive until all tags are unused. However percpu counters referenced by the tags are freed by free_module(). This will lead to UAF if the memory allocated by a module is accessed after module was unloaded. To fix this we allocate percpu counters for module allocation tags dynamically and we keep it alive for tags which are still in use after module unloading. This also removes the requirement of a larger PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because percpu memory for counters does not need to be reserved anymore. Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Tested-by: David Wang <00107082@163.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25module: release codetag section when module load failsDavid Wang1-0/+1
When module load fails after memory for codetag section is ready, codetag section memory will not be properly released. This causes memory leak, and if next module load happens to get the same module address, codetag may pick the uninitialized section when manipulating tags during module unload, and leads to "unable to handle page fault" BUG. Link: https://lkml.kernel.org/r/20250519163823.7540-1-00107082@163.com Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory") Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/ Signed-off-by: David Wang <00107082@163.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25mm/cma: make detection of highmem_start more robustMike Rapoport (Microsoft)1-1/+4
Pratyush Yadav reports the following crash: ------------[ cut here ]------------ kernel BUG at arch/x86/mm/physaddr.c:23! ception 0x06 IP 10:ffffffff812ebbf8 error 0 cr2 0xffff88903ffff000 CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc6+ #231 PREEMPT(undef) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 RIP: 0010:__phys_addr+0x58/0x60 Code: 01 48 89 c2 48 d3 ea 48 85 d2 75 05 e9 91 52 cf 00 0f 0b 48 3d ff ff ff 1f 77 0f 48 8b 05 20 54 55 01 48 01 d0 e9 78 52 cf 00 <0f> 0b 90 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 0000:ffffffff82803dd8 EFLAGS: 00010006 ORIG_RAX: 0000000000000000 RAX: 000000007fffffff RBX: 00000000ffffffff RCX: 0000000000000000 RDX: 000000007fffffff RSI: 0000000280000000 RDI: ffffffffffffffff RBP: ffffffff82803e68 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff83153180 R11: ffffffff82803e48 R12: ffffffff83c9aed0 R13: 0000000000000000 R14: 0000001040000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88903ffff000 CR3: 0000000002838000 CR4: 00000000000000b0 Call Trace: <TASK> ? __cma_declare_contiguous_nid+0x6e/0x340 ? cma_declare_contiguous_nid+0x33/0x70 ? dma_contiguous_reserve_area+0x2f/0x70 ? setup_arch+0x6f1/0x870 ? start_kernel+0x52/0x4b0 ? x86_64_start_reservations+0x29/0x30 ? x86_64_start_kernel+0x7c/0x80 ? common_startup_64+0x13e/0x141 The reason is that __cma_declare_contiguous_nid() does: highmem_start = __pa(high_memory - 1) + 1; If dma_contiguous_reserve_area() (or any other CMA declaration) is called before free_area_init(), high_memory is uninitialized. Without CONFIG_DEBUG_VIRTUAL, it will likely work but use the wrong value for highmem_start. The issue occurs because commit e120d1bc12da ("arch, mm: set high_memory in free_area_init()") moved initialization of high_memory after the call to dma_contiguous_reserve() -> __cma_declare_contiguous_nid() on several architectures. In the case CONFIG_HIGHMEM is enabled, some architectures that actually support HIGHMEM (arm, powerpc and x86) have initialization of high_memory before a possible call to __cma_declare_contiguous_nid() and some initialized high_memory late anyway (arc, csky, microblase, mips, sparc, xtensa) even before the commit e120d1bc12da so they are fine with using uninitialized value of high_memory. And in the case CONFIG_HIGHMEM is disabled high_memory essentially becomes the first address after memory end, so instead of relying on high_memory to calculate highmem_start use memblock_end_of_DRAM() and eliminate the dependency of CMA area creation on high_memory in majority of configurations. Link: https://lkml.kernel.org/r/20250519171805.1288393-1-rppt@kernel.org Fixes: e120d1bc12da ("arch, mm: set high_memory in free_area_init()") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Pratyush Yadav <ptyadav@amazon.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-25perf/headers: Clean up <linux/perf_event.h> a bitIngo Molnar1-127/+155
Do a bit of readability spring cleaning: - Fix misaligned structure member in perf_addr_filter: the new struct perf_addr_filter::action member was too long, but when it was added it was not aligned properly. Align all fields to the customary column 41 alignment of most of the rest of the header. - Adjust the vertical alignment of the definition of other structures and definitions as well, so that the 'most of' in the previous paragraph changes to 'all of'. ;-) - Prettify the assignments in perf_clear_branch_entry_bitfields() - Move comments from CPP definitions to outside the macro - Move perf_guest_info_callbacks and related defines from the front of the header closer to where it's used within the header. - And more #endif markers for larger CPP blocks and standardize #if/#else/#endif blocks to the following nomenclature: #ifdef CONFIG_FOO ... #else /* !CONFIG_FOO: */ ... #endif /* !CONFIG_FOO */ - Standardize on consistently using the 'extern' storage class where appropriate, we had cases where method prototypes sometimes omitted the storage class: extern void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu); int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running); extern u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running); Which is obviously a bit confusing and adds unnecessary noise. - s/__u64/u64 and similar cleanups: there's no point in using __u64 in non-UAPI headers, and doing so only adds unnecessary visual noise. - Harmonize all multi-parameter function prototypes along the following style: extern struct perf_event * perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu, struct task_struct *task, perf_overflow_handler_t callback, void *context); - etc. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-25erofs: support DEFLATE decompression by using Intel QATBo Liu8-5/+265
This patch introduces the use of the Intel QAT to offload EROFS data decompression, aiming to improve the decompression performance. A 285MiB dataset is used with the following command to create EROFS images with different cluster sizes: $ mkfs.erofs -zdeflate,level=9 -C{4096,16384,65536,131072,262144} Fio is used to test the following read patterns: $ fio -filename=testfile -bs=4k -rw=read -name=job1 $ fio -filename=testfile -bs=4k -rw=randread -name=job1 $ fio -filename=testfile -bs=4k -rw=randread --io_size=14m -name=job1 Here are some performance numbers for reference: Processors: Intel(R) Xeon(R) 6766E (144 cores) Memory: 512 GiB |-----------------------------------------------------------------------------| | | Cluster size | sequential read | randread | small randread(5%) | |-----------|--------------|-----------------|-----------|--------------------| | Intel QAT | 4096 | 538 MiB/s | 112 MiB/s | 20.76 MiB/s | | Intel QAT | 16384 | 699 MiB/s | 158 MiB/s | 21.02 MiB/s | | Intel QAT | 65536 | 917 MiB/s | 278 MiB/s | 20.90 MiB/s | | Intel QAT | 131072 | 1056 MiB/s | 351 MiB/s | 23.36 MiB/s | | Intel QAT | 262144 | 1145 MiB/s | 431 MiB/s | 26.66 MiB/s | | deflate | 4096 | 499 MiB/s | 108 MiB/s | 21.50 MiB/s | | deflate | 16384 | 422 MiB/s | 125 MiB/s | 18.94 MiB/s | | deflate | 65536 | 452 MiB/s | 159 MiB/s | 13.02 MiB/s | | deflate | 131072 | 452 MiB/s | 177 MiB/s | 11.44 MiB/s | | deflate | 262144 | 466 MiB/s | 194 MiB/s | 10.60 MiB/s | Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20250522094931.28956-1-liubo03@inspur.com [ Gao Xiang: refine the commit message. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2025-05-24drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr()Dan Carpenter1-1/+1
The nvkm_device_tegra_resource() function returns a mix of error pointers and NULL. The callers only expect it to return NULL on error. Change it to only return NULL. Fixes: 76b8f81a5b92 ("drm/nouveau: improve handling of 64-bit BARs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/dri-devel/334404bdf60765cb5a8e855a74c688bc537531ee.camel@nvidia.com/T/#t
2025-05-23bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGEKent Overstreet1-0/+7
Large folios aren't supported without TRANSPARENT_HUGEPAGE Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix btree_iter_next_node() for new locking assertsKent Overstreet1-2/+2
We can't unlock a should_be_locked path unless we're in a transaction restart. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Ensure we don't use a blacklisted journal seqKent Overstreet3-1/+27
Different versions differ on the size of the blacklist range; it is theoretically possible that we could end up with blacklisted journal sequence numbers newer than the newest seq we find in the journal, and pick a new start seq that's blacklisted. Explicitly check for this in bch2_fs_journal_start(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Small check_fix_ptr fixesKent Overstreet1-8/+9
We don't want to change the bucket gen, on gen mismatch: it's possible to have multiple btree nodes with different gens in the same bucket that we want to keep, if we have to recover from btree node scan. It's also not necessary to set g->gen_valid; add a comment to that effect. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix opts.recovery_pass_lastKent Overstreet1-0/+3
This was lost in the giant recovery pass rework - but it's used heavily by bcachefs subcommand utilities. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix allocate -> self healing pathKent Overstreet1-0/+2
When we go to allocate and find taht a bucket in the freespace btree is actually allocated, we're supposed to return nonzero to tell the allocator to skip it. This fixes an emergency read only due to a bucket/ptr gen mismatch - we also don't return the correct bucket gen when this happens. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23bcachefs: Fix endianness in casefold check/repairKent Overstreet2-4/+4
Fixes: 010c89468134 ("bcachefs: Check for casefolded dirents in non casefolded dirs") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-23ACPI: MRRM: Fix default max memory regionAnil S Keshavamurthy2-2/+3
Per the spec, the default max memory region must be 1 covering all system memory. When platform does not provide ACPI MRRM table or when CONFIG_ACPI is opted out, the acpi_mrrm_max_mem_region() function defaults to returning 1 region complying to RDT spec. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250523172001.1761634-1-anil.s.keshavamurthy@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23tpm_crb: ffa_tpm: fix/update comments describing the CRB over FFA ABIStuart Yoder1-4/+7
-Fix the comment describing the 'start' function, which was a cut/paste mistake for a different function. -The comment for DIRECT_REQ and DIRECT_RESP only mentioned AArch32 and listed 32-bit function IDs. Update to include 64-bit. Signed-off-by: Stuart Yoder <stuart.yoder@arm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23tpm_crb_ffa: use dev_xx() macro to print logYeoreum Yun1-4/+4
Instead of pr_xxx() macro, use dev_xxx() to print log. This patch changes some error log level to warn log level when the tpm_crb_ffa secure partition doesn't support properly but system can run without it. (i.e) unsupport of direct message ABI or unsupported ABI version Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23tpm_ffa_crb: access tpm service over FF-A direct message request v2Yeoreum Yun1-15/+40
For secure partition with multi service, tpm_ffa_crb can access tpm service with direct message request v2 interface according to chapter 3.3, TPM Service Command Response Buffer Interface Over FF-A specificationi v1.0 BET. This patch reflects this spec to access tpm service over FF-A direct message request v2 ABI. Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23tpm: remove kmalloc failure error messageColin Ian King1-5/+2
The kmalloc failure message is just noise. Remove it and replace -EFAULT with -ENOMEM as standard for out of memory allocation error returns. Link: https://lore.kernel.org/linux-integrity/20250430083435.860146-1-colin.i.king@gmail.com/ Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2025-05-23spi: spi-qpic-snand: return early on error from qcom_spi_io_op()Gabor Juhos1-2/+4
When submitting of the descriptors fails, it is quite likely that the register read buffer contains no valid data. Even if the data is valid the function returns with an error code anyway. Change the code to return early if qcom_submit_descs() fails to avoid superfluously copying possibly invalid data. Also change the return statement at the end of the function to use zero value to indicate success obviusly. Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Link: https://patch.msgid.link/20250515-qpic-snand-early-error-v1-1-681c87611213@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-23selftests: ublk: add test for UBLK_F_QUIESCEMing Lei6-5/+115
Add test generic_11 for covering new control command of UBLK_U_CMD_QUIESCE_DEV. Add 'quiesce -n dev_id' sub-command on ublk utility for transitioning device state to quiesce states, then verify the feature via generic_10 by doing quiesce and recovery. Cc: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23ublk: add feature UBLK_F_QUIESCEMing Lei2-1/+142
Add feature UBLK_F_QUIESCE, which adds control command `UBLK_U_CMD_QUIESCE_DEV` for quiescing device, then device state can become `UBLK_S_DEV_QUIESCED` or `UBLK_S_DEV_FAIL_IO` finally from ublk_ch_release() with ublk server cooperation. This feature can help to support to upgrade ublk server application by shutting down ublk server gracefully, meantime keep ublk block device persistent during the upgrading period. The feature is only available for UBLK_F_USER_RECOVERY. Suggested-by: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZEMing Lei5-1/+93
Add test generic_10 for covering new control command of UBLK_U_CMD_UPDATE_SIZE. Add 'update_size -s|--size size_in_bytes' sub-command on ublk utility for supporting this feature, then verify the feature via generic_10. Cc: Omri Mann <omri@nvidia.com> Cc: Jared Holzman <jholzman@nvidia.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23traceevent/block: Add REQ_ATOMIC flag to block trace eventsRitesh Harjani (IBM)2-1/+3
Filesystems like XFS can implement atomic write I/O using either REQ_ATOMIC flag set in the bio or via CoW operation. It will be useful if we have a flag in trace events to distinguish between the two. This patch adds char 'U' (Untorn writes) to rwbs field of the trace events if REQ_ATOMIC flag is set in the bio. <W/ REQ_ATOMIC> ================= xfs_io-4238 [009] ..... 4148.126843: block_rq_issue: 259,0 WFSU 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [009] d.h1. 4148.129864: block_rq_complete: 259,0 WFSU () 768 + 32 none,0,0 [0] <W/O REQ_ATOMIC> =============== xfs_io-4237 [010] ..... 4143.325616: block_rq_issue: 259,0 WS 16384 () 768 + 32 none,0,0 [xfs_io] <idle>-0 [010] d.H1. 4143.329138: block_rq_complete: 259,0 WS () 768 + 32 none,0,0 [0] Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/44317cb2ec4588f6a2c1501a96684e6a1196e8ba.1747921498.git.ritesh.list@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23thermal: qcom: ipq5018: make ops_ipq5018 struct staticGeorge Moussalem1-1/+1
Fix a sparse warning by making the ops_ipq5018 struct static. Fixes: 04b31cc53fe0 ("thermal/drivers/qcom/tsens: Add support for IPQ5018 tsens") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505202356.S21Sc7bk-lkp@intel.com/ Signed-off-by: George Moussalem <george.moussalem@outlook.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://patch.msgid.link/20250522-ipq5018-tsens-sparse-v1-1-97edaaaef27c@outlook.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23thermal/drivers/airoha: Fix spelling mistake "calibrarion" -> "calibration"Colin Ian King1-1/+1
There is a spelling mistake in a dev_info message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20250522083157.1957240-1-colin.i.king@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23ACPI: bus: Bail out if acpi_kobj registration failsArmin Wolf1-2/+4
The ACPI sysfs code will fail to initialize if acpi_kobj is NULL, together with some ACPI drivers. Follow the other firmware subsystems and bail out if the kobject cannot be registered. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20250518185111.3560-2-W_Armin@gmx.de Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23ACPI: platform_profile: Avoid initializing on non-ACPI platformsAlexandre Ghiti1-0/+3
The platform profile driver is loaded even on platforms that do not have ACPI enabled. The initialization of the sysfs entries was recently moved from platform_profile_register() to the module init call, and those entries need acpi_kobj to be initialized which is not the case when ACPI is disabled. This results in the following warning: WARNING: CPU: 5 PID: 1 at fs/sysfs/group.c:131 internal_create_group+0xa22/0xdd8 Modules linked in: CPU: 5 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.15.0-rc7-dirty #6 PREEMPT Tainted: [W]=WARN Hardware name: riscv-virtio,qemu (DT) epc : internal_create_group+0xa22/0xdd8 ra : internal_create_group+0xa22/0xdd8 Call Trace: internal_create_group+0xa22/0xdd8 sysfs_create_group+0x22/0x2e platform_profile_init+0x74/0xb2 do_one_initcall+0x198/0xa9e kernel_init_freeable+0x6d8/0x780 kernel_init+0x28/0x24c ret_from_fork+0xe/0x18 Fix this by checking if ACPI is enabled before trying to create sysfs entries. Fixes: 77be5cacb2c2 ("ACPI: platform_profile: Create class for ACPI platform profile") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://patch.msgid.link/20250522141410.31315-1-alexghiti@rivosinc.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-23firmware: cs_dsp: Fix OOB memory read access in KUnit testJaroslav Kysela1-1/+2
KASAN reported out of bounds access - cs_dsp_mock_bin_add_name_or_info(), because the source string length was rounded up to the allocation size. Cc: Simon Trimmer <simont@opensource.cirrus.com> Cc: Charles Keepax <ckeepax@opensource.cirrus.com> Cc: Richard Fitzgerald <rf@opensource.cirrus.com> Cc: patches@opensource.cirrus.com Cc: stable@vger.kernel.org Signed-off-by: Jaroslav Kysela <perex@perex.cz> Link: https://patch.msgid.link/20250523102102.1177151-1-perex@perex.cz Reviewed-by: Richard Fitzgerald <rf@opensource.cirrus.com> Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-23io_uring/cmd: warn on reg buf imports by ineligible cmdsPavel Begunkov1-0/+6
For IORING_URING_CMD_FIXED-less commands io_uring doesn't pull buf_index from the sqe, so imports might succeed if the index coincide, e.g. when it's 0, but otherwise it's error prone. Warn if someone tries to import without the flag. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/a1c2c88e53c3fe96978f23d50c6bc66c2c79c337.1747991070.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23statmount: update STATMOUNT_SUPPORTED macroDmitry V. Levin1-1/+3
According to commit 8f6116b5b77b ("statmount: add a new supported_mask field"), STATMOUNT_SUPPORTED macro shall be updated whenever a new flag is added. Fixes: 7a54947e727b ("Merge patch series "fs: allow changing idmappings"") Signed-off-by: "Dmitry V. Levin" <ldv@strace.io> Link: https://lore.kernel.org/20250511224953.GA17849@strace.io Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23fs: convert mount flags to enumStephen Brennan1-42/+45
In prior kernel versions (5.8-6.8), commit 9f6c61f96f2d9 ("proc/mounts: add cursor") introduced MNT_CURSOR, a flag used by readers from /proc/mounts to keep their place while reading the file. Later, commit 2eea9ce4310d8 ("mounts: keep list of mounts in an rbtree") removed this flag and its value has since been repurposed. For debuggers iterating over the list of mounts, cursors should be skipped as they are irrelevant. Detecting whether an element is a cursor can be difficult. Since the MNT_CURSOR flag is a preprocessor constant, it's not present in debuginfo, and since its value is repurposed, we cannot hard-code it. For this specific issue, cursors are possible to detect in other ways, but ideally, we would be able to read the mount flag definitions out of the debuginfo. For that reason, convert the mount flags to an enum. Link: https://github.com/osandov/drgn/pull/496 Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Link: https://lore.kernel.org/20250507223402.2795029-1-stephen.s.brennan@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23->mnt_devname is never NULLAl Viro2-14/+11
Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()" back in 2018. Get rid of the dead checks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/20250421033509.GV2023217@ZenIV Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23mount: add a comment about concurrent changes with statmount()/listmount()Christian Brauner1-3/+23
Add some comments in there highlighting a few non-obvious assumptions. Link: https://lore.kernel.org/20250416-zerknirschen-aluminium-14a55639076f@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-23io_uring/io-wq: only create a new worker if it can make progressJens Axboe1-0/+28
Hashed work is serialized by io-wq, intended to be used for cases like serializing buffered writes to a regular file, where the file system will serialize the workers anyway with a mutex or similar. Since they would be forcibly serialized and blocked, it's more efficient for io-wq to handle these individually rather than issue them in parallel. If a worker is currently handling a hashed work item and gets blocked, don't create a new worker if the next work item is also hashed and mapped to the same bucket. That new worker would not be able to make any progress anyway. Reported-by: Fengnan Chang <changfengnan@bytedance.com> Reported-by: Diangang Li <lidiangang@bytedance.com> Link: https://lore.kernel.org/io-uring/20250522090909.73212-1-changfengnan@bytedance.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: ignore non-busy worker going to sleepJens Axboe1-0/+2
When an io-wq worker goes to sleep, it checks if there's work to do. If there is, it'll create a new worker. But if this worker is currently idle, it'll either get woken right back up immediately, or someone else has already created the necessary worker to handle this work. Only go through the worker creation logic if the current worker is currently handling a work item. That means it's being scheduled out as part of handling that work, not just going to sleep on its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-23io_uring/io-wq: move hash helpers to the topJens Axboe1-10/+10
Just in preparation for using them higher up in the function, move these generic helpers. Signed-off-by: Jens Axboe <axboe@kernel.dk>