wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2024-07-11	fs/ntfs3: Keep runs for $MFT::$ATTR_DATA and $MFT::$ATTR_BITMAP	Konstantin Komarov	1	-1/+2
	We skip the run_truncate_head call also for $MFT::$ATTR_BITMAP. Otherwise wnd_map()/run_lookup_entry will not find the disk position for the bitmap parts. Fixes: 0e5b044cbf3a ("fs/ntfs3: Refactoring attr_set_size to restore after errors") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-07-11	fs/ntfs3: Missed error return	Konstantin Komarov	1	-1/+1
	Fixes: 3f3b442b5ad2 ("fs/ntfs3: Add bitmap") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-07-11	fs/ntfs3: Fix the format of the "nocase" mount option	Konstantin Komarov	1	-1/+1
	The 'nocase' option was mistakenly added as fsparam_flag_no with the 'no' prefix, causing the case-insensitive mode to require the 'nonocase' option to be enabled. Fixes: a3a956c78efa ("fs/ntfs3: Add option "nocase"") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Fix field-spanning write in INDEX_HDR	Konstantin Komarov	2	-6/+7
	Fields flags and res[3] replaced with one 4 byte flags. Fixes: 4534a70b7056 ("fs/ntfs3: Add headers and misc files") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert attr_wof_frame_info() to use a folio	Matthew Wilcox (Oracle)	3	-20/+21
	This involves converting all users of offs_page to offs_folio, but it's worth it because we get rid of a lot of hidden calls to compound_head(). We continue to use order-0 folios here, and convert back to a struct page to call ntfs_bio_pages(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert ni_readpage_cmpr() to take a folio	Matthew Wilcox (Oracle)	3	-8/+9
	We still use an array of pages for the decompression, but this removes a few calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert ntfs_get_frame_pages() to use a folio	Matthew Wilcox (Oracle)	1	-8/+9
	The function still takes an array of pages, but use a folio internally. This function would deadlock against itself if used with large folios (as it locks each page), so we can be a little sloppy with the conversion back from folio to page for now. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Remove calls to set/clear the error flag	Matthew Wilcox (Oracle)	1	-4/+0
	Nobody checks the error flag on ntfs3 folios, so stop setting and clearing it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert attr_make_nonresident to use a folio	Matthew Wilcox (Oracle)	1	-14/+13
	Fetch a folio from the page cache instead of a page and operate on it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> [almaz.alexandrovich@paragon-software.com: skip using folio_end_read] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert attr_data_write_resident to use a folio	Matthew Wilcox (Oracle)	3	-11/+7
	Now that both callers of attr_data_write_resident() have a folio, pass it in and use memcpy_from_folio() to handle all the gnarly highmem multi-page problems. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert ntfs_write_end() to work on a folio	Matthew Wilcox (Oracle)	1	-7/+8
	Convert the passed page back into a folio and use the folio APIs, saving a few hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert attr_data_read_resident() to take a folio	Matthew Wilcox (Oracle)	3	-22/+13
	Now that all three callers have a folio, pass it in and use folio_fill_tail() to do the hard work of filling the folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert ntfs_write_begin to use a folio	Matthew Wilcox (Oracle)	1	-8/+9
	Retrieve a folio from the page cache instead of a precise page. This function is now large folio safe, but its called function is not. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	ntfs3: Convert ntfs_read_folio to use a folio	Matthew Wilcox (Oracle)	1	-5/+4
	Remove the struct page conversion, and use a folio throughout. We still convert back to a struct page for calling some internal functions, but those will change soon. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Drop stray '\' (backslash) in formatting string	Andy Shevchenko	1	-1/+1
	CHECK /home/andy/prj/linux-topic-uart/fs/ntfs3/super.c fs/ntfs3/super.c:471:23: warning: unknown escape sequence: '\%' Drop stray '\' (backslash) in formatting string. Fixes: d27e202b9ac4 ("fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Add some comments	Konstantin Komarov	3	-13/+10
	Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Rename variables	Konstantin Komarov	1	-5/+5
	New names make it easier to read code. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Add a check for attr_names and oatbl	Konstantin Komarov	1	-6/+32
	Added out-of-bound checking for *ane (ATTR_NAME_ENTRY). Reported-by: lei lu <llfamsec@gmail.com> Fixes: 865e7a7700d93 ("fs/ntfs3: Reduce stack usage") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Validate ff offset	lei lu	1	-1/+5
	This adds sanity checks for ff offset. There is a check on rt->first_free at first, but walking through by ff without any check. If the second ff is a large offset. We may encounter an out-of-bound read. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Correct undo if ntfs_create_inode failed	Konstantin Komarov	1	-1/+9
	Clusters allocated for Extended Attributes, must be freed when rolling back inode creation. Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: One more reason to mark inode bad	Konstantin Komarov	1	-1/+3
	In addition to returning an error, mark the node as bad. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Replace inode_trylock with inode_lock	Konstantin Komarov	1	-4/+1
	The issue was detected due to xfstest 465 failing. Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-26	fs/ntfs3: Fix attr_insert_range at end of file	Konstantin Komarov	1	-2/+7
	If the offset is equal to or greater than the end of file, an error is returned. For such operations (i.e., inserting a hole at the end of file), ftruncate(2) should be used. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Add missing .dirty_folio in address_space_operations	Konstantin Komarov	1	-0/+1
	After switching from pages to folio [1], it became evident that the initialization of .dirty_folio for page cache operations was missed for compressed files. [1] https://lore.kernel.org/ntfs3/20240422193203.3534108-1-willy@infradead.org Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Remove sync_blockdev_nowait()	Konstantin Komarov	1	-1/+1
	Flush the file mapping directly. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Fix getting file type	Konstantin Komarov	1	-1/+2
	An additional condition causes the mft record to be read from disk and get the file type dt_type. Fixes: 22457c047ed97 ("fs/ntfs3: Modified fix directory element type detection") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Missed NI_FLAG_UPDATE_PARENT setting	Konstantin Komarov	1	-0/+1
	Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Deny getting attr data block in compressed frame	Konstantin Komarov	1	-0/+13
	Attempting to retrieve an attribute data block in a compressed frame is ignored. Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Fix transform resident to nonresident for compressed files	Konstantin Komarov	1	-5/+8
	Сorrected calculation of required space len (in clusters) for attribute data storage in case of compression. Fixes: be71b5cba2e64 ("fs/ntfs3: Add attrib operations") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Remove unused macros MAXIMUM_REPARSE_DATA_BUFFER_SIZE	Konstantin Komarov	1	-3/+0
	MAXIMUM_REPARSE_DATA_BUFFER_SIZE is not used in the code. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Use macros NTFS_LABEL_MAX_LENGTH instead of hardcoded value	Konstantin Komarov	1	-5/+6
	To check the length of the volume label, the existing constant NTFS_LABEL_MAX_LENGTH could be used. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Simplify initialization of $AttrDef and $UpCase	Konstantin Komarov	3	-53/+52
	Replaced the two loops reading $AttrDef and $UpCase with the inode_read_data() function. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Merge synonym COMPRESSION_UNIT and NTFS_LZNT_CUNIT	Konstantin Komarov	5	-7/+4
	COMPRESSION_UNIT and NTFS_LZNT_CUNIT mean the same thing (1u<<NTFS_LZNT_CUNIT) determines the size for compression (in clusters). COMPRESS_MAX_CLUSTER is not used in the code. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-06-07	fs/ntfs3: Remove unused function	Konstantin Komarov	2	-31/+0
	At the moment, the function turned out to be unused, so I removed it. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2024-05-26	Linux 6.10-rc1	Linus Torvalds	1	-3/+3

2024-05-26	mm: percpu: Include smp.h in alloc_tag.h	Kent Overstreet	1	-0/+1
	percpu.h depends on smp.h, but doesn't include it directly because of circular header dependency issues; percpu.h is needed in a bunch of low level headers. This fixes a randconfig build error on mips: include/linux/alloc_tag.h: In function '__alloc_tag_ref_set': include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration] Reported-by: kernel test robot <lkp@intel.com> Fixes: 24e44cc22aa3 ("mm: percpu: enable per-cpu allocation tagging") Closes: https://lore.kernel.org/oe-kbuild-all/202405210052.DIrMXJNz-lkp@intel.com/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-05-26	Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"	Arnaldo Carvalho de Melo	4	-103/+68
	This reverts commit 617824a7f0f73e4de325cf8add58e55b28c12493. This made a simple 'perf record -e cycles:pp make -j199' stop working on the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed at length in the threads in the Link tags below. The fix provided by Ian wasn't acceptable and work to fix this will take time we don't have at this point, so lets revert this and work on it on the next devel cycle. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-24	cifs: Fix missing set of remote_i_size	David Howells	2	-3/+4
	Occasionally, the generic/001 xfstest will fail indicating corruption in one of the copy chains when run on cifs against a server that supports FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs). The problem is that the remote_i_size value isn't updated by cifs_setsize() when called by smb2_duplicate_extents(), but i_size is. This may cause cifs_remap_file_range() to then skip the bit after calling ->duplicate_extents() that sets sizes. Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before calling cifs_setsize() to set i_size. This means we don't then need to call netfs_resize_file() upon return from ->duplicate_extents(), but we also fix the test to compare against the pre-dup inode size. [Note that this goes back before the addition of remote_i_size with the netfs_inode struct. It should probably have been setting cifsi->server_eof previously.] Fixes: cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)") Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-24	cifs: Fix smb3_insert_range() to move the zero_point	David Howells	1	-0/+1
	Fix smb3_insert_range() to move the zero_point over to the new EOF. Without this, generic/147 fails as reads of data beyond the old EOF point return zeroes. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells <dhowells@redhat.com> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>
2024-05-24	mm/ksm: fix possible UAF of stable_node	Chengming Zhou	1	-1/+2
	The commit 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit") introduced a possible failure case in the stable_tree_insert(), where we may free the new allocated stable_node_dup if we fail to prepare the missing chain node. Then that kfolio return and unlock with a freed stable_node set... And any MM activities can come in to access kfolio->mapping, so UAF. Fix it by moving folio_set_stable_node() to the end after stable_node is inserted successfully. Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev Fixes: 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit") Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	mm/memory-failure: fix handling of dissolved but not taken off from buddy pages	Miaohe Lin	1	-2/+2
	When I did memory failure tests recently, below panic occurs: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1\|zone=2\|lastcpupid=0x7fff) raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page)) ------------[ cut here ]------------ kernel BUG at include/linux/page-flags.h:1009! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:__del_page_from_free_list+0x151/0x180 RSP: 0018:ffffa49c90437998 EFLAGS: 00000046 RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0 RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69 R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80 R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009 FS: 00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0 Call Trace: <TASK> __rmqueue_pcplist+0x23b/0x520 get_page_from_freelist+0x26b/0xe40 __alloc_pages_noprof+0x113/0x1120 __folio_alloc_noprof+0x11/0xb0 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130 __alloc_fresh_hugetlb_folio+0xe7/0x140 alloc_pool_huge_folio+0x68/0x100 set_max_huge_pages+0x13d/0x340 hugetlb_sysctl_handler_common+0xe8/0x110 proc_sys_call_handler+0x194/0x280 vfs_write+0x387/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7ff916114887 RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887 RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003 RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0 R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004 R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00 </TASK> Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- And before the panic, there had an warning about bad page state: BUG: Bad page state in process page-types pfn:8cee00 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00 flags: 0x6fffe0000000000(node=1\|zone=2\|lastcpupid=0x7fff) page_type: 0xffffff7f(buddy) raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000 raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000 page dumped because: nonzero mapcount Modules linked in: mce_inject hwpoison_inject CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22 Call Trace: <TASK> dump_stack_lvl+0x83/0xa0 bad_page+0x63/0xf0 free_unref_page+0x36e/0x5c0 unpoison_memory+0x50b/0x630 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110 debugfs_attr_write+0x42/0x60 full_proxy_write+0x5b/0x80 vfs_write+0xcd/0x550 ksys_write+0x64/0xe0 do_syscall_64+0xc2/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f189a514887 RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887 RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003 RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8 R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040 </TASK> The root cause should be the below race: memory_failure try_memory_failure_hugetlb me_huge_page __page_handle_poison dissolve_free_hugetlb_folio drain_all_pages -- Buddy page can be isolated e.g. for compaction. take_page_off_buddy -- Failed as page is not in the buddy list. -- Page can be putback into buddy after compaction. page_ref_inc -- Leads to buddy page with refcnt = 1. Then unpoison_memory() can unpoison the page and send the buddy page back into buddy list again leading to the above bad page state warning. And bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to allocate this page. Fix this issue by only treating __page_handle_poison() as successful when it returns 1. Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again	Yuanyuan Zhong	1	-2/+7
	After switching smaps_rollup to use VMA iterator, searching for next entry is part of the condition expression of the do-while loop. So the current VMA needs to be addressed before the continue statement. Otherwise, with some VMAs skipped, userspace observed memory consumption from /proc/pid/smaps_rollup will be smaller than the sum of the corresponding fields from /proc/pid/smaps. Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end") Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	nilfs2: fix potential hang in nilfs_detach_log_writer()	Ryusuke Konishi	1	-3/+18
	Syzbot has reported a potential hang in nilfs_detach_log_writer() called during nilfs2 unmount. Analysis revealed that this is because nilfs_segctor_sync(), which synchronizes with the log writer thread, can be called after nilfs_segctor_destroy() terminates that thread, as shown in the call trace below: nilfs_detach_log_writer nilfs_segctor_destroy nilfs_segctor_kill_thread --> Shut down log writer thread flush_work nilfs_iput_work_func nilfs_dispose_list iput nilfs_evict_inode nilfs_transaction_commit nilfs_construct_segment (if inode needs sync) nilfs_segctor_sync --> Attempt to synchronize with log writer thread * DEADLOCK * Fix this issue by changing nilfs_segctor_sync() so that the log writer thread returns normally without synchronizing after it terminates, and by forcing tasks that are already waiting to complete once after the thread terminates. The skipped inode metadata flushout will then be processed together in the subsequent cleanup work in nilfs_segctor_destroy(). Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Cc: "Bai, Shuangpeng" <sjb7183@psu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	nilfs2: fix unexpected freezing of nilfs_segctor_sync()	Ryusuke Konishi	1	-4/+13
	A potential and reproducible race issue has been identified where nilfs_segctor_sync() would block even after the log writer thread writes a checkpoint, unless there is an interrupt or other trigger to resume log writing. This turned out to be because, depending on the execution timing of the log writer thread running in parallel, the log writer thread may skip responding to nilfs_segctor_sync(), which causes a call to schedule() waiting for completion within nilfs_segctor_sync() to lose the opportunity to wake up. The reason why waking up the task waiting in nilfs_segctor_sync() may be skipped is that updating the request generation issued using a shared sequence counter and adding an wait queue entry to the request wait queue to the log writer, are not done atomically. There is a possibility that log writing and request completion notification by nilfs_segctor_wakeup() may occur between the two operations, and in that case, the wait queue entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of nilfs_segctor_sync() will be carried over until the next request occurs. Fix this issue by performing these two operations simultaneously within the lock section of sc_state_lock. Also, following the memory barrier guidelines for event waiting loops, move the call to set_current_state() in the same location into the event waiting loop to ensure that a memory barrier is inserted just before the event condition determination. Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Cc: "Bai, Shuangpeng" <sjb7183@psu.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	nilfs2: fix use-after-free of timer for log writer thread	Ryusuke Konishi	1	-6/+19
	Patch series "nilfs2: fix log writer related issues". This bug fix series covers three nilfs2 log writer-related issues, including a timer use-after-free issue and potential deadlock issue on unmount, and a potential freeze issue in event synchronization found during their analysis. Details are described in each commit log. This patch (of 3): A use-after-free issue has been reported regarding the timer sc_timer on the nilfs_sc_info structure. The problem is that even though it is used to wake up a sleeping log writer thread, sc_timer is not shut down until the nilfs_sc_info structure is about to be freed, and is used regardless of the thread's lifetime. Fix this issue by limiting the use of sc_timer only while the log writer thread is alive. Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu> Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	selftests/mm: fix build warnings on ppc64	Michael Ellerman	2	-0/+2
	Fix warnings like: In file included from uffd-unit-tests.c:8: uffd-unit-tests.c: In function `uffd_poison_handle_fault': uffd-common.h:45:33: warning: format `%llu' expects argument of type `long long unsigned int', but argument 3 has type `__u64' {aka `long unsigned int'} [-Wformat=] By switching to unsigned long long for u64 for ppc64 builds. Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	arm64: patching: fix handling of execmem addresses	Will Deacon	1	-1/+1
	Klara Modin reported warnings for a kernel configured with BPF_JIT but without MODULES: [ 44.131296] Trying to vfree() bad address (000000004a17c299) [ 44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G D W 6.9.0-01786-g2c9e5d4a0082 #25 [ 44.158229] Hardware name: Raspberry Pi 3 Model B (DT) [ 44.164433] Workqueue: events bpf_prog_free_deferred [ 44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.188772] sp : ffff800082a13c70 [ 44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000 [ 44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000 [ 44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880 [ 44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006 [ 44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002 [ 44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000 [ 44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370 [ 44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370 [ 44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240 [ 44.275703] Call trace: [ 44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) [ 44.283858] vfree (mm/vmalloc.c:3322) [ 44.287835] execmem_free (mm/execmem.c:70) [ 44.292347] bpf_jit_free_exec+0x10/0x1c [ 44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006) [ 44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195) [ 44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474) [ 44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785) [ 44.317785] process_one_work (kernel/workqueue.c:3273) [ 44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2)) [ 44.327292] kthread (kernel/kthread.c:388) [ 44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861) The problem is because bpf_arch_text_copy() silently fails to write to the read-only area as a result of patch_map() faulting and the resulting -EFAULT being chucked away. Update patch_map() to use CONFIG_EXECMEM instead of CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses. Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reported-by: Klara Modin <klarasmodin@gmail.com> Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com Tested-by: Klara Modin <klarasmodin@gmail.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation	Dev Jain	1	-22/+49
	Reset nr_hugepages to zero before the start of the test. If a non-zero number of hugepages is already set before the start of the test, the following problems arise: - The probability of the test getting OOM-killed increases. Proof: The test wants to run on 80% of available memory to prevent OOM-killing (see original code comments). Let the value of mem_free at the start of the test, when nr_hugepages = 0, be x. In the other case, when nr_hugepages > 0, let the memory consumed by hugepages be y. In the former case, the test operates on 0.8 * x of memory. In the latter, the test operates on 0.8 * (x - y) of memory, with y already filled, hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 * x. Q.E.D - The probability of a bogus test success increases. Proof: Let the memory consumed by hugepages be greater than 25% of x, with x and y defined as above. The definition of compaction_index is c_index = (x - y)/z where z is the memory consumed by hugepages after trying to increase them again. In check_compaction(), we set the number of hugepages to zero, and then increase them back; the probability that they will be set back to consume at least y amount of memory again is very high (since there is not much delay between the two attempts of changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption). Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3 hence, c_index can always be forced to be less than 3, thereby the test succeeding always. Q.E.D Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages	Dev Jain	1	-0/+2
	Currently, the test tries to set nr_hugepages to zero, but that is not actually done because the file offset is not reset after read(). Fix that using lseek(). Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-24	selftests/mm: compaction_test: fix bogus test success on Aarch64	Dev Jain	1	-7/+13
	Patch series "Fixes for compaction_test", v2. The compaction_test memory selftest introduces fragmentation in memory and then tries to allocate as many hugepages as possible. This series addresses some problems. On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since compaction_index becomes 0, which is less than 3, due to no division by zero exception being raised. We fix that by checking for division by zero. Secondly, correctly set the number of hugepages to zero before trying to set a large number of them. Now, consider a situation in which, at the start of the test, a non-zero number of hugepages have been already set (while running the entire selftests/mm suite, or manually by the admin). The test operates on 80% of memory to avoid OOM-killer invocation, and because some memory is already blocked by hugepages, it would increase the chance of OOM-killing. Also, since mem_free used in check_compaction() is the value before we set nr_hugepages to zero, the chance that the compaction_index will be small is very high if the preset nr_hugepages was high, leading to a bogus test success. This patch (of 3): Currently, if at runtime we are not able to allocate a huge page, the test will trivially pass on Aarch64 due to no exception being raised on division by zero while computing compaction_index. Fix that by checking for nr_hugepages == 0. Anyways, in general, avoid a division by zero by exiting the program beforehand. While at it, fix a typo, and handle the case where the number of hugepages may overflow an integer. Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>