aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/exported-sql-viewer.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-04-11mm: fix apply_to_existing_page_range()Kirill A. Shutemov1-2/+2
In the case of apply_to_existing_page_range(), apply_to_pte_range() is reached with 'create' set to false. When !create, the loop over the PTE page table is broken. apply_to_pte_range() will only move to the next PTE entry if 'create' is true or if the current entry is not pte_none(). This means that the user of apply_to_existing_page_range() will not have 'fn' called for any entries after the first pte_none() in the PTE page table. Fix the loop logic in apply_to_pte_range(). There are no known runtime issues from this, but the fix is trivial enough for stable@ even without a known buggy user. Link: https://lkml.kernel.org/r/20250409094043.1629234-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Fixes: be1db4753ee6 ("mm/memory.c: add apply_to_existing_page_range() helper") Cc: Daniel Axtens <dja@axtens.net> Cc: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11selftests/mm: fix compiler -Wmaybe-uninitialized warningAnshuman Khandual1-1/+1
Following build warning comes up for cow test as 'transferred' variable has not been initialized. Fix the warning via zero init for the variable. CC cow cow.c: In function `do_test_vmsplice_in_parent': cow.c:365:61: warning: `transferred' may be used uninitialized [-Wmaybe-uninitialized] 365 | cur = read(fds[0], new + total, transferred - total); | ~~~~~~~~~~~~^~~~~~~ cow.c:296:29: note: `transferred' was declared here 296 | ssize_t cur, total, transferred; | ^~~~~~~~~~~ CC compaction_test CC gup_longterm Link: https://lkml.kernel.org/r/20250409095006.1422620-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11alloc_tag: handle incomplete bulk allocations in vm_module_tags_populateT.J. Mercier1-3/+12
alloc_pages_bulk_node() may partially succeed and allocate fewer than the requested nr_pages. There are several conditions under which this can occur, but we have encountered the case where CONFIG_PAGE_OWNER is enabled causing all bulk allocations to always fallback to single page allocations due to commit 187ad460b841 ("mm/page_alloc: avoid page allocator recursion with pagesets.lock held"). Currently vm_module_tags_populate() immediately fails when alloc_pages_bulk_node() returns fewer than the requested number of pages. When this happens memory allocation profiling gets disabled, for example [ 14.297583] [9: modprobe: 465] Failed to allocate memory for allocation tags in the module scsc_wlan. Memory allocation profiling is disabled! [ 14.299339] [9: modprobe: 465] modprobe: Failed to insmod '/vendor/lib/modules/scsc_wlan.ko' with args '': Out of memory This patch causes vm_module_tags_populate() to retry bulk allocations for the remaining memory instead of failing immediately which will avoid the disablement of memory allocation profiling. Link: https://lkml.kernel.org/r/20250409225111.3770347-1-tjmercier@google.com Fixes: 0f9b685626da ("alloc_tag: populate memory for module tags as needed") Signed-off-by: T.J. Mercier <tjmercier@google.com> Reported-by: Janghyuck Kim <janghyuck.kim@samsung.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mailmap: add entry for Jean-Michel HautboisJean-Michel Hautbois1-0/+1
As recent contributions where made with the @ideasonboard.com email, any reply would fail. Add the proper address to map this old one. Link: https://lkml.kernel.org/r/20250328-mailmap-v2-v2-1-bdc69d2193ca@yoseli.org Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm: (un)track_pfn_copy() fix + doc improvementsDavid Hildenbrand2-4/+7
We got a late smatch warning and some additional review feedback. smatch warnings: mm/memory.c:1428 copy_page_range() error: uninitialized symbol 'pfn'. We actually use the pfn only when it is properly initialized; however, we may pass an uninitialized value to a function -- although it will not use it that likely still is UB in C. So let's just fix it by always initializing pfn in the caller of track_pfn_copy(), and improving the documentation of track_pfn_copy(). While at it, clarify the doc of untrack_pfn_copy(), that internal checks make sure if we actually have to untrack anything. Link: https://lkml.kernel.org/r/20250408085950.976103-1-david@redhat.com Fixes: dc84bc2aba85 ("x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202503270941.IFILyNCX-lkp@intel.com/ Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm: fix filemap_get_folios_contig returning batches of identical foliosVishal Moola (Oracle)1-0/+1
filemap_get_folios_contig() is supposed to return distinct folios found within [start, end]. Large folios in the Xarray become multi-index entries. xas_next() can iterate through the sub-indexes before finding a sibling entry and breaking out of the loop. This can result in a returned folio_batch containing an indeterminate number of duplicate folios, which forces the callers to skeptically handle the returned batch. This is inefficient and incurs a large maintenance overhead. We can fix this by calling xas_advance() after we have successfully adding a folio to the batch to ensure our Xarray is positioned such that it will correctly find the next folio - similar to filemap_get_read_batch(). Link: https://lkml.kernel.org/r/Z-8s1-kiIDkzgRbc@fedora Fixes: 35b471467f88 ("filemap: add filemap_get_folios_contig()") Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reported-by: Qu Wenruo <quwenruo.btrfs@gmx.com> Closes: https://lkml.kernel.org/r/b714e4de-2583-4035-b829-72cfb5eb6fc6@gmx.com Tested-by: Qu Wenruo <quwenruo.btrfs@gmx.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vivek Kasireddy <vivek.kasireddy@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/hugetlb: add a line break at the end of the format stringwangxuewen1-1/+1
Missing line break at the end of the format string. Link: https://lkml.kernel.org/r/20250407103017.2979821-1-18810879172@163.com Signed-off-by: wangxuewen <wangxuewen@kylinos.cn> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11selftests: mincore: fix tmpfs mincore test failureBaolin Wang1-14/+2
When running mincore test cases, I encountered the following failures: " mincore_selftest.c:359:check_tmpfs_mmap:Expected ra_pages (511) == 0 (0) mincore_selftest.c:360:check_tmpfs_mmap:Read-ahead pages found in memory check_tmpfs_mmap: Test terminated by assertion FAIL global.check_tmpfs_mmap not ok 5 global.check_tmpfs_mmap FAILED: 4 / 5 tests passed " The reason for the test case failure is that my system automatically enabled tmpfs large folio allocation by adding the 'transparent_hugepage_tmpfs=always' cmdline. However, the test case still expects the tmpfs mounted on /dev/shm to allocate small folios, which leads to assertion failures when verifying readahead pages. As discussed with David, there's no reason to continue checking the readahead logic for tmpfs. Drop it to fix this issue. Link: https://lkml.kernel.org/r/9a00856cc6a8b4e46f4ab8b1af11ce5fc1a31851.1744025467.git.baolin.wang@linux.alibaba.com Fixes: d635ccdb435c ("mm: shmem: add a kernel command line to change the default huge policy for tmpfs") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Barry Song <21cnbao@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/hugetlb: fix set_max_huge_pages() when there are surplus pagesJinjiang Tu1-1/+18
In set_max_huge_pages(), min_count is computed taking into account surplus huge pages, which might lead in some cases to not be able to free huge pages and end up accounting them as surplus instead. One way to solve it is to subtract surplus_huge_pages directly, but we cannot do it blindly because there might be surplus pages that are also free pages, which might happen when we fail to restore the vmemmap for optimized hvo pages. So we could be subtracting the same page twice. In order to work this around, let us first compute the number of free persistent pages, and use that along with surplus pages to compute min_count. Steps to reproduce: 1) create 5 hugetlb folios in Node0 2) run a program to use all the hugetlb folios 3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the 5 hugetlb folios in Node0 are accounted as surplus. 4) create 5 hugetlb folios in Node1 5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios The result: Node0 Node1 Total 5 5 Free 0 5 Surp 5 5 The result with this patch: Node0 Node1 Total 5 0 Free 0 0 Surp 5 0 Link: https://lkml.kernel.org/r/20250409055957.3774471-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20250407124706.2688092-1-tujinjiang@huawei.com Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/cma: report base address of single range correctlyFrank van der Linden1-8/+11
The cma_declare_contiguous_nid code was refactored by commit c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested"), so that it could use an internal function to attempt a single range area first, and then try a multi-range one. However, that meant that the actual base address used for the !fixed case (base == 0) wasn't available one level up to be printed in the informational message, and it would always end up printing a base address of 0 in the boot message. Make the internal function take a phys_addr_t pointer to the base address, so that the value is available to the caller. [fvdl@google.com: v2] Link: https://lkml.kernel.org/r/20250408164000.3215690-1-fvdl@google.com Link: https://lkml.kernel.org/r/20250407165435.2567898-1-fvdl@google.com Fixes: c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested") Signed-off-by: Frank van der Linden <fvdl@google.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/linux-mm/CAMuHMdVWviQ7O9yBFE3f=ev0eVb1CnsQvR6SKtEROBbM6z7g3w@mail.gmail.com/ Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm: page_alloc: speed up fallbacks in rmqueue_bulk()Johannes Weiner1-34/+79
The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") as the root cause of a 56.4% regression in vm-scalability::lru-file-mmap-read. Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion"), as the root cause for a regression in worst-case zone->lock+irqoff hold times. Both of these patches modify the page allocator's fallback path to be less greedy in an effort to stave off fragmentation. The flip side of this is that fallbacks are also less productive each time around, which means the fallback search can run much more frequently. Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the percpu cache by allocating a large batch of pages in a loop. It highlights how once the native freelists are exhausted, the fallback code first scans orders top-down for whole blocks to claim, then falls back to a bottom-up search for the smallest buddy to steal. For the next batch page, it goes through the same thing again. This can be made more efficient. Since rmqueue_bulk() holds the zone->lock over the entire batch, the freelists are not subject to outside changes; when the search for a block to claim has already failed, there is no point in trying again for the next page. Modify __rmqueue() to remember the last successful fallback mode, and restart directly from there on the next rmqueue_bulk() iteration. Oliver confirms that this improves beyond the regression that the test robot reported against c2f6ea38fc1b: commit: f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap") c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy") acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5 ---------------- --------------------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput Carlos confirms that worst-case times are almost fully recovered compared to before the earlier culprit patch: 2dd482ba627d (before freelist hygiene): 1ms c0cd6f557b90 (after freelist hygiene): 90ms next-20250319 (steal smallest buddy): 280ms this patch : 8ms [jackmanb@google.com: comment updates] Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com [hannes@cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin] Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Brendan Jackman <jackmanb@google.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Carlos Song <carlos.song@nxp.com> Tested-by: Carlos Song <carlos.song@nxp.com> Tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com Reviewed-by: Brendan Jackman <jackmanb@google.com> Tested-by: Shivank Garg <shivankg@amd.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [6.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11kunit: slub: add module descriptionArnd Bergmann1-0/+1
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/tests/slub_kunit.o Link: https://lkml.kernel.org/r/20250324173242.1501003-10-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Guenetr Roeck <linux@roeck-us.net> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Pei Xiao <xiaopei01@kylinos.cn> Cc: Rae Moar <rmoar@google.com> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/kasan: add module decriptionArnd Bergmann1-0/+1
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in mm/kasan/kasan_test.o [akpm@linux-foundation.org: update description text, per Andrey] Link: https://lkml.kernel.org/r/20250324173242.1501003-9-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Macro Elver <elver@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nihar Chaithanya <niharchaithanya@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11ucs2_string: add module descriptionArnd Bergmann1-0/+1
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ucs2_string.o Link: https://lkml.kernel.org/r/20250324173242.1501003-7-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11zlib: add module descriptionArnd Bergmann1-0/+1
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/zlib_inflate/zlib_inflate.o Link: https://lkml.kernel.org/r/20250324173242.1501003-6-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11fpga: tests: add module descriptionsArnd Bergmann3-0/+3
Modules without a description now cause a warning: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fpga/tests/fpga-bridge-test.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fpga/tests/fpga-mgr-test.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/fpga/tests/fpga-region-test.o Link: https://lkml.kernel.org/r/20250324173242.1501003-4-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Hao Wu <hao.wu@intel.com> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Marco Pagani <marpagan@redhat.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Moritz Fischer <mdf@kernel.org> Cc: Russ Weight <russ.weight@linux.dev> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Cc: Tom Rix <trix@redhat.com> Cc: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11samples/livepatch: add module descriptionsArnd Bergmann6-0/+6
Every module should have a description, so add one for each of these modules. [akpm@linux-foundation.org: match the livepatch-callbacks-mod.c description, per Petr] Link: https://lkml.kernel.org/r/20250324173242.1501003-3-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Easwar Hariharan <eahariha@linux.microsoft.com> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11ASN.1: add module descriptionArnd Bergmann1-0/+1
This is needed to avoid a build warning: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/asn1_decoder.o Link: https://lkml.kernel.org/r/20250324173242.1501003-2-arnd@kernel.org Fixes: 6c6c1fc09de3 ("modpost: require a MODULE_DESCRIPTION()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Stehen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/vma: add give_up_on_oom option on modify/merge, use in uffd releaseLorenzo Stoakes3-7/+66
Currently, if a VMA merge fails due to an OOM condition arising on commit merge or a failure to duplicate anon_vma's, we report this so the caller can handle it. However there are cases where the caller is only ostensibly trying a merge, and doesn't mind if it fails due to this condition. Since we do not want to introduce an implicit assumption that we only actually modify VMAs after OOM conditions might arise, add a 'give up on oom' option and make an explicit contract that, should this flag be set, we absolutely will not modify any VMAs should OOM arise and just bail out. Since it'd be very unusual for a user to try to vma_modify() with this flag set but be specifying a range within a VMA which ends up being split (which can fail due to rlimit issues, not only OOM), we add a debug warning for this condition. The motivating reason for this is uffd release - syzkaller (and Pedro Falcato's VERY astute analysis) found a way in which an injected fault on allocation, triggering an OOM condition on commit merge, would result in uffd code becoming confused and treating an error value as if it were a VMA pointer. To avoid this, we make use of this new VMG flag to ensure that this never occurs, utilising the fact that, should we be clearing entire VMAs, we do not wish an OOM event to be reported to us. Many thanks to Pedro Falcato for his excellent analysis and Jann Horn for his insightful and intelligent analysis of the situation, both of whom were instrumental in this fix. Link: https://lkml.kernel.org/r/20250321100937.46634-1-lorenzo.stoakes@oracle.com Reported-by: syzbot+20ed41006cf9d842c2b5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001e.GAE@google.com/ Fixes: 47b16d0462a4 ("mm: abort vma_modify() on merge out of memory failure") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Pedro Falcato <pfalcato@suse.de> Suggested-by: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11selftests/mm: generate a temporary mountpoint for cgroup filesystemMark Brown2-3/+3
Currently if the filesystem for the cgroups version it wants to use is not mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh tests will attempt to mount it on the hard coded path /dev/cgroup/memory, deleting that directory when the test finishes. This will fail if there is not a preexisting directory at that path, and since the directory is deleted subsequent runs of the test will fail. Instead of relying on this hard coded directory name use mktemp to generate a temporary directory to use as a mountpoint, fixing both the assumption and the disruption caused by deleting a preexisting directory. This means that if the relevant cgroup filesystem is not already mounted then we rely on having coreutils (which provides mktemp) installed. I suspect that many current users are relying on having things automounted by default, and given that the script relies on bash it's probably not an unreasonable requirement. Link: https://lkml.kernel.org/r/20250404-kselftest-mm-cgroup2-detection-v1-1-3dba6d32ba8c@kernel.org Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11MAINTAINERS: add Andrew and Baoquan as kexec maintainersBaoquan He1-0/+3
Add Andrew as kexec/kdump maintainer because he has been helping review and merge ready kexec/kdump patches. And I would like to nominate myself as kexec maintainer because I always try to review generic kexec codes. Link: https://lkml.kernel.org/r/20250328104402.16826-1-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Simon Horman <horms@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/hugetlb: fix nid mismatch in alloc_surplus_hugetlb_folio()Liu Shixin1-1/+1
It's wrong to use nid directly since the nid may be changed in allocation. Use folio_nid() to obtain the nid of folio instead. Fix: 2273dea6b1e1 ("mm/hugetlb: update nr_huge_pages and surplus_huge_pages together") Link: https://lkml.kernel.org/r/20250403064138.2867929-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/page_alloc: avoid second trylock of zone->lockAlexei Starovoitov1-6/+9
spin_trylock followed by spin_lock will cause extra write cache access. If the lock is contended it may cause unnecessary cache line bouncing and will execute redundant irq restore/save pair. Therefore, check alloc/fpi_flags first and use spin_trylock or spin_lock. Link: https://lkml.kernel.org/r/20250331002809.94758-1-alexei.starovoitov@gmail.com Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mm/compaction: fix bug in hugetlb handling pathwayVishal Moola (Oracle)1-3/+3
The compaction code doesn't take references on pages until we're certain we should attempt to handle it. In the hugetlb case, isolate_or_dissolve_huge_page() may return -EBUSY without taking a reference to the folio associated with our pfn. If our folio's refcount drops to 0, compound_nr() becomes unpredictable, making low_pfn and nr_scanned unreliable. The user-visible effect is minimal - this should rarely happen (if ever). Fix this by storing the folio statistics earlier on the stack (just like the THP and Buddy cases). Also revert commit 66fe1cf7f581 ("mm: compaction: use helper compound_nr in isolate_migratepages_block") to make backporting easier. Link: https://lkml.kernel.org/r/20250401021025.637333-1-vishal.moola@gmail.com Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages") Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11lib/iov_iter: fix to increase non slab folio refcountSheng Yong1-1/+1
When testing EROFS file-backed mount over v9fs on qemu, I encountered a folio UAF issue. The page sanity check reports the following call trace. The root cause is that pages in bvec are coalesced across a folio bounary. The refcount of all non-slab folios should be increased to ensure p9_releas_pages can put them correctly. BUG: Bad page state in process md5sum pfn:18300 page: refcount:0 mapcount:0 mapping:00000000d5ad8e4e index:0x60 pfn:0x18300 head: order:0 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 aops:z_erofs_aops ino:30b0f dentry name(?):"GoogleExtServicesCn.apk" flags: 0x100000000000041(locked|head|node=0|zone=1) raw: 0100000000000041 dead000000000100 dead000000000122 ffff888014b13bd0 raw: 0000000000000060 0000000000000020 00000000ffffffff 0000000000000000 head: 0100000000000041 dead000000000100 dead000000000122 ffff888014b13bd0 head: 0000000000000060 0000000000000020 00000000ffffffff 0000000000000000 head: 0100000000000000 0000000000000000 ffffffffffffffff 0000000000000000 head: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set Call Trace: dump_stack_lvl+0x53/0x70 bad_page+0xd4/0x220 __free_pages_ok+0x76d/0xf30 __folio_put+0x230/0x320 p9_release_pages+0x179/0x1f0 p9_virtio_zc_request+0xa2a/0x1230 p9_client_zc_rpc.constprop.0+0x247/0x700 p9_client_read_once+0x34d/0x810 p9_client_read+0xf3/0x150 v9fs_issue_read+0x111/0x360 netfs_unbuffered_read_iter_locked+0x927/0x1390 netfs_unbuffered_read_iter+0xa2/0xe0 vfs_iocb_iter_read+0x2c7/0x460 erofs_fileio_rq_submit+0x46b/0x5b0 z_erofs_runqueue+0x1203/0x21e0 z_erofs_readahead+0x579/0x8b0 read_pages+0x19f/0xa70 page_cache_ra_order+0x4ad/0xb80 filemap_readahead.isra.0+0xe7/0x150 filemap_get_pages+0x7aa/0x1890 filemap_read+0x320/0xc80 vfs_read+0x6c6/0xa30 ksys_read+0xf9/0x1c0 do_syscall_64+0x9e/0x1a0 entry_SYSCALL_64_after_hwframe+0x71/0x79 Link: https://lkml.kernel.org/r/20250401144712.1377719-1-shengyong1@xiaomi.com Fixes: b9c0e49abfca ("mm: decline to manipulate the refcount on a slab page") Signed-off-by: Sheng Yong <shengyong1@xiaomi.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11mseal: fix typo and style in documentationTakuma Watanabe1-1/+1
Correct a typo in the mseal documentation. Link: https://lkml.kernel.org/r/20250318115521.11654-1-takumaw1990@gmail.com Signed-off-by: Takuma Watanabe <takumaw1990@gmail.com> Cc: Jeff Xu <jeffxu@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11locking/local_lock, mm: replace localtry_ helpers with local_trylock_t typeAlexei Starovoitov3-190/+114
Partially revert commit 0aaddfb06882 ("locking/local_lock: Introduce localtry_lock_t"). Remove localtry_*() helpers, since localtry_lock() name might be misinterpreted as "try lock". Introduce local_trylock[_irqsave]() helpers that only work with newly introduced local_trylock_t type. Note that attempt to use local_trylock[_irqsave]() with local_lock_t will cause compilation failure. Usage and behavior in !PREEMPT_RT: local_lock_t lock; // sizeof(lock) == 0 local_lock(&lock); // preempt disable local_lock_irqsave(&lock, ...); // irq save if (local_trylock_irqsave(&lock, ...)) // compilation error local_trylock_t lock; // sizeof(lock) == 4 local_lock(&lock); // preempt disable, acquired = 1 local_lock_irqsave(&lock, ...); // irq save, acquired = 1 if (local_trylock(&lock)) // if (!acquired) preempt disable, acquired = 1 if (local_trylock_irqsave(&lock, ...)) // if (!acquired) irq save, acquired = 1 The existing local_lock_*() macros can be used either with local_lock_t or local_trylock_t. With local_trylock_t they set acquired = 1 while local_unlock_*() clears it. In !PREEMPT_RT local_lock_irqsave(local_lock_t *) disables interrupts to protect critical section, but it doesn't prevent NMI, so the fully reentrant code cannot use local_lock_irqsave(local_lock_t *) for exclusive access. The local_lock_irqsave(local_trylock_t *) helper disables interrupts and sets acquired=1, so local_trylock_irqsave(local_trylock_t *) from NMI attempting to acquire the same lock will return false. In PREEMPT_RT local_lock_irqsave() maps to preemptible spin_lock(). Map local_trylock_irqsave() to preemptible spin_trylock(). When in hard IRQ or NMI return false right away, since spin_trylock() is not safe due to explicit locking in the underneath rt_spin_trylock() implementation. Removing this explicit locking and attempting only "trylock" is undesired due to PI implications. The local_trylock() without _irqsave can be used to avoid the cost of disabling/enabling interrupts by only disabling preemption, so local_trylock() in an interrupt attempting to acquire the same lock will return false. Note there is no need to use local_inc for acquired variable, since it's a percpu variable with strict nesting scopes. Note that guard(local_lock)(&lock) works only for "local_lock_t lock". The patch also makes sure that local_lock_release(l) is called before WRITE_ONCE(l->acquired, 0). Though IRQs are disabled at this point the local_trylock() from NMI will succeed and local_lock_acquire(l) will warn. Link: https://lkml.kernel.org/r/20250403025514.41186-1-alexei.starovoitov@gmail.com Fixes: 0aaddfb06882 ("locking/local_lock: Introduce localtry_lock_t") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Martin KaFai Lau <martin.lau@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-11test suite: use %zu to print size_tMatthew Wilcox (Oracle)1-2/+2
On 32-bit, we can't use %lu to print a size_t variable and gcc warns us about it. Shame it doesn't warn about it on 64-bit. Link: https://lkml.kernel.org/r/20250403003311.359917-1-Liam.Howlett@oracle.com Fixes: cc86e0c2f306 ("radix tree test suite: add support for slab bulk APIs") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>