aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-02-06bcachefs: fix incorrect pointer check in __bch2_subvolume_delete()Jeongjun Park1-1/+6
For some unknown reason, checks on struct bkey_s_c_snapshot and struct bkey_s_c_snapshot_tree pointers are missing. Therefore, I think it would be appropriate to fix the incorrect pointer checking through this patch. Fixes: 4bd06f07bcb5 ("bcachefs: Fixes for snapshot_tree.master_subvol") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs docs: SubmittingPatches.rstKent Overstreet3-0/+100
Add an (initial?) patch submission checklist, focusing mainly on testing. Yes, all patches must be tested, and that starts (but does not end) with the patch author. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-02Linux 6.14-rc1Linus Torvalds1-2/+2
2025-02-02tools/power turbostat: version 2025.02.02Len Brown1-1/+1
Summary of Changes since 2024.11.30: Fix regression in 2023.11.07 that affinitized forked child in one-shot mode. Harden one-shot mode against hotplug online/offline Enable RAPL SysWatt column by default. Add initial PTL, CWF platform support. Harden initial PMT code in response to early use. Enable first built-in PMT counter: CWF c1e residency Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results. Signed-off-by: Len Brown <len.brown@intel.com>
2025-02-01MAINTAINERS: include linux-mm for xarray maintenanceAndrew Morton1-0/+1
MM developers have an interest in the xarray code. Cc: David Gow <davidgow@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01revert "xarray: port tests to kunit"Andrew Morton16-410/+294
Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code. Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com Cc: David Gow <davidgow@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tamir Duberstein <tamird@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01MAINTAINERS: add lib/test_xarray.cTamir Duberstein1-0/+1
Ensure test-only changes are sent to the relevant maintainer. Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com Signed-off-by: Tamir Duberstein <tamird@gmail.com> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mailmap, MAINTAINERS, docs: update Carlos's email addressCarlos Bilbao3-6/+8
Update .mailmap to reflect my new (and final) primary email address, carlos.bilbao@kernel.org. Also update contact information in files Documentation/translations/sp_SP/index.rst and MAINTAINERS. Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Carlos Bilbao <bilbao@vt.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm/hugetlb: fix hugepage allocation for interleaved memory nodesRitesh Harjani (IBM)1-1/+1
gather_bootmem_prealloc() assumes the start nid as 0 and size as num_node_state(N_MEMORY). That means in case if memory attached numa nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail to scan few of these nodes. Since memory attached numa nodes can be interleaved in any fashion, hence ensure that the current code checks for all numa node ids (.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that it can distributes all nr_node_ids among the these many no. threads. e.g. qemu cmdline ======================== numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20" mem_cmd="-object memory-backend-ram,id=mem1,size=16G" w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): ========================== ~ # cat /proc/meminfo |grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 0 kB with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): =========================== ~ # cat /proc/meminfo |grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 2 HugePages_Free: 2 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 2097152 kB Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com> Suggested-by: Muchun Song <muchun.song@linux.dev> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Gang Li <gang.li@linux.dev> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm: gup: fix infinite loop within __get_longterm_lockedZhaoyang Huang1-10/+4
We can run into an infinite loop in __get_longterm_locked() when collect_longterm_unpinnable_folios() finds only folios that are isolated from the LRU or were never added to the LRU. This can happen when all folios to be pinned are never added to the LRU, for example when vm_ops->fault allocated pages using cma_alloc() and never added them to the LRU. Fix it by simply taking a look at the list in the single caller, to see if anything was added. [zhaoyang.huang@unisoc.com: move definition of local] Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aijun Sun <aijun.sun@unisoc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm, swap: fix reclaim offset calculation error during allocationKairui Song1-1/+1
There is a code error that will cause the swap entry allocator to reclaim and check the whole cluster with an unexpected tail offset instead of the part that needs to be reclaimed. This may cause corruption of the swap map, so fix it. Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chris Li <chrisl@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01.mailmap: update email address for Christopher ObbardChristopher Obbard1-0/+1
Update my email address. Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01kfence: skip __GFP_THISNODE allocations on NUMA systemsMarco Elver1-0/+2
On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on a particular node, and failure to allocate on the desired node will result in a failed allocation. Skip __GFP_THISNODE allocations if we are running on a NUMA system, since KFENCE can't guarantee which node its pool pages are allocated on. Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chistoph Lameter <cl@linux.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01nilfs2: fix possible int overflows in nilfs_fiemap()Nikita Zhandarovich1-3/+3
Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff0a89 ("nilfs2: fiemap support") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm: compaction: use the proper flag to determine watermarksyangge1-4/+25
There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour. Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB of no-CMA memory on a NUMA node can be used as virtual machine memory. There is 16GB of free CMA memory on a NUMA node, which is sufficient to pass the order-0 watermark check, causing the __compaction_suitable() function to consistently return true. For costly allocations, if the __compaction_suitable() function always returns true, it causes the __alloc_pages_slowpath() function to fail to exit at the appropriate point. This prevents timely fallback to allocating memory on other nodes, ultimately resulting in excessively long virtual machine startup times. Call trace: __alloc_pages_slowpath if (compact_result == COMPACT_SKIPPED || compact_result == COMPACT_DEFERRED) goto nopage; // should exit __alloc_pages_slowpath() from here We could use the real unmovable allocation context to have __zone_watermark_unusable_free() subtract CMA pages, and thus we won't pass the order-0 check anymore once the non-CMA part is exhausted. There is some risk that in some different scenario the compaction could in fact migrate pages from the exhausted non-CMA part of the zone to the CMA part and succeed, and we'll skip it instead. But only __GFP_NORETRY allocations should be affected in the immediate "goto nopage" when compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY anyway and won't fail without trying to compact-migrate the non-CMA pageblocks into CMA pageblocks first, so it should be fine. After this fix, it only takes a few tens of seconds to start a 32GB virtual machine with device passthrough functionality. Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/ Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com Signed-off-by: yangge <yangge1116@126.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01kernel: be more careful about dup_mmap() failures and uprobe registeringLiam R. Howlett2-3/+18
If a memory allocation fails during dup_mmap(), the maple tree can be left in an unsafe state for other iterators besides the exit path. All the locks are dropped before the exit_mmap() call (in mm/mmap.c), but the incomplete mm_struct can be reached through (at least) the rmap finding the vmas which have a pointer back to the mm_struct. Up to this point, there have been no issues with being able to find an mm_struct that was only partially initialised. Syzbot was able to make the incomplete mm_struct fail with recent forking changes, so it has been proven unsafe to use the mm_struct that hasn't been initialised, as referenced in the link below. Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to invalid mm") fixed the uprobe access, it does not completely remove the race. This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the oom side (even though this is extremely unlikely to be selected as an oom victim in the race window), and sets MMF_UNSTABLE to avoid other potential users from using a partially initialised mm_struct. When registering vmas for uprobe, skip the vmas in an mm that is marked unstable. Modifying a vma in an unstable mm may cause issues if the mm isn't fully initialised. Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/ Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm/fake-numa: handle cases with no SRAT infoBruno Faccini1-1/+10
Handle more gracefully cases where no SRAT information is available, like in VMs with no Numa support, and allow fake-numa configuration to complete successfully in these cases Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”) Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm: kmemleak: fix upper boundary check for physical address objectsCatalin Marinas1-1/+1
Memblock allocations are registered by kmemleak separately, based on their physical address. During the scanning stage, it checks whether an object is within the min_low_pfn and max_low_pfn boundaries and ignores it otherwise. With the recent addition of __percpu pointer leak detection (commit 6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak started reporting leaks in setup_zone_pageset() and setup_per_cpu_pageset(). These were caused by the node_data[0] object (initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn) boundary. The non-strict upper boundary check introduced by commit 84c326299191 ("mm: kmemleak: check physical address when scan") causes the pg_data_t object to be ignored (not scanned) and the __percpu pointers it contains to be reported as leaks. Make the max_low_pfn upper boundary check strict when deciding whether to ignore a physical address object and not scan it. Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Patrick Wang <patrick.wang.shcn@gmail.com> Cc: <stable@vger.kernel.org> [6.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mailmap: add an entry for Hamza MahfoozHamza Mahfooz1-0/+1
Map my previous work email to my current one. Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hans verkuil <hverkuil@xs4all.nl> Cc: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01MAINTAINERS: mailmap: update Yosry Ahmed's email addressYosry Ahmed2-1/+2
Moving to a linux.dev email address. Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01scripts/gdb: fix aarch64 userspace detection in get_current_taskJan Kiszka1-1/+1
At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long which lets the right-shift always return 0. Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Barry Song <baohua@kernel.org> Cc: Kieran Bingham <kbingham@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm/vmscan: accumulate nr_demoted for accurate demotion statisticsLi Zhijian1-3/+4
In shrink_folio_list(), demote_folio_list() can be called 2 times. Currently stat->nr_demoted will only store the last nr_demoted( the later nr_demoted is always zero, the former nr_demoted will get lost), as a result number of demoted pages is not accurate. Accumulate the nr_demoted count across multiple calls to demote_folio_list(), ensuring accurate reporting of demotion statistics. [lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting] Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Tested-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01ocfs2: fix incorrect CPU endianness conversion causing mount failureHeming Zhao1-1/+1
Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") introduced a regression bug. The blksz_bits value is already converted to CPU endian in the previous code; therefore, the code shouldn't use le32_to_cpu() anymore. Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()Hyeonggon Yoo1-1/+1
Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function. However, the function is only used when CONFIG_DEBUG_VM=y. When building with LLVM=1 and W=1 option, the following warning is generated: $ make -j12 W=1 LLVM=1 mm/zsmalloc.o mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] 455 | static inline bool is_first_zpdesc(struct zpdesc *zpdesc) | ^~~~~~~~~~~~~~~ 1 error generated. Fix the warning by adding __maybe_unused attribute to the function. No functional change intended. Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/ Cc: Alex Shi <alexs@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01mm/vmscan: fix hard LOCKUP in function isolate_lru_foliosliuye2-1/+6
This fixes the following hard lockup in isolate_lru_folios() during memory reclaim. If the LRU mostly contains ineligible folios this may trigger watchdog. watchdog: Watchdog detected hard LOCKUP on cpu 173 RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0 Call Trace: _raw_spin_lock_irqsave+0x31/0x40 folio_lruvec_lock_irqsave+0x5f/0x90 folio_batch_move_lru+0x91/0x150 lru_add_drain_per_cpu+0x1c/0x40 process_one_work+0x17d/0x350 worker_thread+0x27b/0x3a0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 lruvec->lru_lock owner: PID: 2865 TASK: ffff888139214d40 CPU: 40 COMMAND: "kswapd0" #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde [exception RIP: isolate_lru_folios+403] RIP: ffffffffa597df53 RSP: ffffc90006fb7c28 RFLAGS: 00000002 RAX: 0000000000000001 RBX: ffffc90006fb7c60 RCX: ffffea04a2196f88 RDX: ffffc90006fb7c60 RSI: ffffc90006fb7c60 RDI: ffffea04a2197048 RBP: ffff88812cbd3010 R8: ffffea04a2197008 R9: 0000000000000001 R10: 0000000000000000 R11: 0000000000000001 R12: ffffea04a2197008 R13: ffffea04a2197048 R14: ffffc90006fb7de8 R15: 0000000003e3e937 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 <NMI exception stack> #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238 crash> Scenario: User processe are requesting a large amount of memory and keep page active. Then a module continuously requests memory from ZONE_DMA32 area. Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached. However pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. Reproduce: Terminal 1: Construct to continuously increase pages active(anon). mkdir /tmp/memory mount -t tmpfs -o size=1024000M tmpfs /tmp/memory dd if=/dev/zero of=/tmp/memory/block bs=4M tail /tmp/memory/block Terminal 2: vmstat -a 1 active will increase. procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ... r b swpd free inact active si so bi bo 1 0 0 1445623076 45898836 83646008 0 0 0 1 0 0 1445623076 43450228 86094616 0 0 0 1 0 0 1445623076 41003480 88541364 0 0 0 1 0 0 1445623076 38557088 90987756 0 0 0 1 0 0 1445623076 36109688 93435156 0 0 0 1 0 0 1445619552 33663256 95881632 0 0 0 1 0 0 1445619804 31217140 98327792 0 0 0 1 0 0 1445619804 28769988 100774944 0 0 0 1 0 0 1445619804 26322348 103222584 0 0 0 1 0 0 1445619804 23875592 105669340 0 0 0 cat /proc/meminfo | head Active(anon) increase. MemTotal: 1579941036 kB MemFree: 1445618500 kB MemAvailable: 1453013224 kB Buffers: 6516 kB Cached: 128653956 kB SwapCached: 0 kB Active: 118110812 kB Inactive: 11436620 kB Active(anon): 115345744 kB Inactive(anon): 945292 kB When the Active(anon) is 115345744 kB, insmod module triggers the ZONE_DMA32 watermark. perf record -e vmscan:mm_vmscan_lru_isolate -aR perf script isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2 nr_skipped=2 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844 nr_skipped=28835844 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29 nr_skipped=29 nr_taken=0 lru=active_anon isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0 nr_skipped=0 nr_taken=0 lru=active_anon See nr_scanned=28835844. 28835844 * 4k = 115343376KB approximately equal to 115345744 kB. If increase Active(anon) to 1000G then insmod module triggers the ZONE_DMA32 watermark. hard lockup will occur. In my device nr_scanned = 0000000003e3e937 when hard lockup. Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB. [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53 ffffc90006fb7c30: 0000000000000020 0000000000000000 ffffc90006fb7c40: ffffc90006fb7d40 ffff88812cbd3000 ffffc90006fb7c50: ffffc90006fb7d30 0000000106fb7de8 ffffc90006fb7c60: ffffea04a2197008 ffffea0006ed4a48 ffffc90006fb7c70: 0000000000000000 0000000000000000 ffffc90006fb7c80: 0000000000000000 0000000000000000 ffffc90006fb7c90: 0000000000000000 0000000000000000 ffffc90006fb7ca0: 0000000000000000 0000000003e3e937 ffffc90006fb7cb0: 0000000000000000 0000000000000000 ffffc90006fb7cc0: 8d7c0b56b7874b00 ffff88812cbd3000 About the Fixes: Why did it take eight years to be discovered? The problem requires the following conditions to occur: 1. The device memory should be large enough. 2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area. 3. The memory in ZONE_DMA32 needs to reach the watermark. If the memory is not large enough, or if the usage design of ZONE_DMA32 area memory is reasonable, this problem is difficult to detect. notes: The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL, but other suitable scenarios may also trigger the problem. Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis") Signed-off-by: liuye <liuye@kylinos.cn> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01sh: boards: Use imply to enable hardware with complex dependenciesGeert Uytterhoeven1-2/+2
If CONFIG_I2C=n: WARNING: unmet direct dependencies detected for SND_SOC_AK4642 Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && I2C [=n] Selected by [y]: - SH_7724_SOLUTION_ENGINE [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y] WARNING: unmet direct dependencies detected for SND_SOC_DA7210 Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && SND_SOC_I2C_AND_SPI [=n] Selected by [y]: - SH_ECOVEC [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y] Fix this by replacing select by imply, instead of adding a dependency on I2C. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240836.OvXqmANX-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-02-01sh: Migrate to the generic rule for built-in DTBMasahiro Yamada4-7/+7
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot DTBs") introduced generic support for built-in DTBs. Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled. To keep consistency across architectures, this commit also renames CONFIG_USE_BUILTIN_DTB to CONFIG_BUILTIN_DTB, and CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-02-01sh: irq: Use seq_put_decimal_ull_width() for decimal valuesDavid Wang1-2/+2
On a system with n CPUs and m interrupts, there will be n*m decimal values yielded via seq_printf(.."%10u "..) which has significant costs parsing format string and is less efficient than seq_put_decimal_ull_width(). Stress reading /proc/interrupts indicates ~30% performance improvement with this patch. Signed-off-by: David Wang <00107082@163.com> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-01-31Remove stale generated 'genheaders' fileLinus Torvalds1-0/+0
This bogus stale file was added in commit 101971298be2 ("riscv: add a warning when physical memory address overflows"). It's the old location for what is now 'security/selinux/genheaders'. It looks like it got incorrectly committed back when that file was in the old location, and then rebasing kept the bogus file alive. Reported-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/linux-riscv/20250201020003.GA77370@sol.localdomain/ Fixes: 101971298be2 ("riscv: add a warning when physical memory address overflows") Cc: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-31Revert "media: uvcvideo: Require entities to have a non-zero unique ID"Thadeu Lima de Souza Cascardo1-43/+27
This reverts commit 3dd075fe8ebbc6fcbf998f81a75b8c4b159a6195. Tomasz has reported that his device, Generalplus Technology Inc. 808 Camera, with ID 1b3f:2002, stopped being detected: $ ls -l /dev/video* zsh: no matches found: /dev/video* [ 7.230599] usb 3-2: Found multiple Units with ID 5 This particular device is non-compliant, having both the Output Terminal and Processing Unit with ID 5. uvc_scan_fallback, though, is able to build a chain. However, when media elements are added and uvc_mc_create_links call uvc_entity_by_id, it will get the incorrect entity, media_create_pad_link will WARN, and it will fail to register the entities. In order to reinstate support for such devices in a timely fashion, reverting the fix for these warnings is appropriate. A proper fix that considers the existence of such non-compliant devices will be submitted in a later development cycle. Reported-by: Tomasz Sikora <sikora.tomus@gmail.com> Fixes: 3dd075fe8ebb ("media: uvcvideo: Require entities to have a non-zero unique ID") Cc: stable@vger.kernel.org Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ricardo Ribalda <ribalda@chromium.org> Link: https://lore.kernel.org/r/20250114200045.1401644-1-cascardo@igalia.com Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2025-02-01kbuild: fix Clang LTO with CONFIG_OBJTOOL=nMasahiro Yamada2-4/+8
Since commit bede169618c6 ("kbuild: enable objtool for *.mod.o and additional kernel objects"), Clang LTO builds do not perform any optimizations when CONFIG_OBJTOOL is disabled (e.g., for ARCH=arm64). This is because every LLVM bitcode file is immediately converted to ELF format before the object files are linked together. This commit fixes the breakage. Fixes: bede169618c6 ("kbuild: enable objtool for *.mod.o and additional kernel objects") Reported-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Yonghong Song <yonghong.song@linux.dev>
2025-02-01kbuild: Strip runtime const RELA sections correctlyArd Biesheuvel4-16/+7
Due to the fact that runtime const ELF sections are named without a leading period or double underscore, the RSTRIP logic that removes the static RELA sections from vmlinux fails to identify them. This results in a situation like below, where some sections that were supposed to get removed are left behind. [Nr] Name Type Address Off Size ES Flg Lk Inf Al [58] runtime_shift_d_hash_shift PROGBITS ffffffff83500f50 2900f50 000014 00 A 0 0 1 [59] .relaruntime_shift_d_hash_shift RELA 0000000000000000 55b6f00 000078 18 I 70 58 8 [60] runtime_ptr_dentry_hashtable PROGBITS ffffffff83500f68 2900f68 000014 00 A 0 0 1 [61] .relaruntime_ptr_dentry_hashtable RELA 0000000000000000 55b6f78 000078 18 I 70 60 8 [62] runtime_ptr_USER_PTR_MAX PROGBITS ffffffff83500f80 2900f80 000238 00 A 0 0 1 [63] .relaruntime_ptr_USER_PTR_MAX RELA 0000000000000000 55b6ff0 000d50 18 I 70 62 8 So tweak the match expression to strip all sections starting with .rel. While at it, consolidate the logic used by RISC-V, s390 and x86 into a single shared Makefile library command. Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-31cifs: Fix parsing native symlinks directory/file typePali Rohár4-0/+62
As SMB protocol distinguish between symlink to directory and symlink to file, add some mechanism to disallow resolving incompatible types. When SMB symlink is of the directory type, ensure that its target path ends with slash. This forces Linux to not allow resolving such symlink to file. And when SMB symlink is of the file type and its target path ends with slash then returns an error as such symlink is unresolvable. Such symlink always points to invalid location as file cannot end with slash. As POSIX server does not distinguish between symlinks to file and symlink directory, do not apply this change for symlinks from POSIX SMB server. For POSIX SMB servers, this change does nothing. This mimics Windows behavior of native SMB symlinks. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31cifs: update internal version numberSteve French1-2/+2
To 2.53 Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31cifs: Add support for creating WSL-style symlinksPali Rohár1-12/+53
This change implements support for creating new symlink in WSL-style by Linux cifs client when -o reparse=wsl mount option is specified. WSL-style symlink uses reparse point with tag IO_REPARSE_TAG_LX_SYMLINK and symlink target location is stored in reparse buffer in UTF-8 encoding prefixed by 32-bit flags. Flags bits are unknown, but it was observed that WSL always sets flags to value 0x02000000. Do same in Linux cifs client. New symlinks would be created in WSL-style only in case the mount option -o reparse=wsl is specified, which is not by default. So default CIFS mounts are not affected by this change. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31smb3: add support for IAKerbSteve French5-3/+12
There are now more servers which advertise support for IAKerb (passthrough Kerberos authentication via proxy). IAKerb is a public extension industry standard Kerberos protocol that allows a client without line-of-sight to a Domain Controller to authenticate. There can be cases where we would fail to mount if the server only advertises the OID for IAKerb in SPNEGO/GSSAPI. Add code to allow us to still upcall to userspace in these cases to obtain the Kerberos ticket. Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31cifs: Fix struct FILE_ALL_INFOPali Rohár2-11/+7
struct FILE_ALL_INFO for level 263 (0x107) used by QPathInfo does not have any IndexNumber, AccessFlags, IndexNumber1, CurrentByteOffset, Mode or AlignmentRequirement members. So remove all of them. Also adjust code in move_cifs_info_to_smb2() function which converts struct FILE_ALL_INFO to struct smb2_file_all_info. Fixed content of struct FILE_ALL_INFO was verified that is correct against: * [MS-CIFS] section 2.2.8.3.10 SMB_QUERY_FILE_ALL_INFO * Samba server implementation of trans2 query file/path for level 263 * Packet structure tests against Windows SMB servers This change fixes CIFSSMBQFileInfo() and CIFSSMBQPathInfo() functions which directly copy received FILE_ALL_INFO network buffers into kernel structures of FILE_ALL_INFO type. struct FILE_ALL_INFO is the response structure returned by the SMB server. So the incorrect definition of this structure can lead to returning bogus information in stat() call. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31cifs: Add support for creating NFS-style symlinksPali Rohár1-8/+39
CIFS client is currently able to parse NFS-style symlinks, but is not able to create them. This functionality is useful when the mounted SMB share is used also by Windows NFS server (on Windows Server 2012 or new). It allows interop of symlinks between SMB share mounted by Linux CIFS client and same export from Windows NFS server mounted by some NFS client. New symlinks would be created in NFS-style only in case the mount option -o reparse=nfs is specified, which is not by default. So default CIFS mounts are not affected by this change. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31cifs: Add support for creating native Windows socketsPali Rohár5-0/+45
Native Windows sockets created by WinSock on Windows 10 April 2018 Update (version 1803) or Windows Server 2019 (version 1809) or later versions is reparse point with IO_REPARSE_TAG_AF_UNIX tag, with empty reparse point data buffer and without any EAs. Create AF_UNIX sockets in this native format if -o nonativesocket was not specified. This change makes AF_UNIX sockets created by Linux CIFS client compatible with AF_UNIX sockets created by Windows applications on NTFS volumes. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-31block: force noio scope in blk_mq_freeze_queueChristoph Hellwig26-84/+136
When block drivers or the core block code perform allocations with a frozen queue, this could try to recurse into the block device to reclaim memory and deadlock. Thus all allocations done by a process that froze a queue need to be done without __GFP_IO and __GFP_FS. Instead of tying to track all of them down, force a noio scope as part of freezing the queue. Note that nvme is a bit of a mess here due to the non-owner freezes, and they will be addressed separately. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-31Revert "mips: fix shmctl/semctl/msgctl syscall for o32"Thomas Bogendoerfer1-3/+3
This reverts commit bc7584e009c39375294794f7ca751a6b2622c425. The split IPC system calls for o32 have been introduced with modern version only. Changing this breaks ABI. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2025-01-30MAINTAINERS: Update my email addressBrian Cain2-1/+3
Qualcomm is migrating away from quicinc.com email addresses towards ones with *.qualcomm.com. Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com> Signed-off-by: Brian Cain <bcain@quicinc.com>
2025-01-30hexagon: Fix unbalanced spinlock in die()Lin Yujun1-1/+3
die executes holding the spinlock of &die.lock and unlock it after printing the oops message. However in the code if the notify_die() returns NOTIFY_STOP , die() exit with returning 1 but never unlocked the spinlock. Fix this by adding spin_unlock_irq(&die.lock) before returning. Fixes: cf9750bae262 ("Hexagon: Provide basic debugging and system trap support.") Signed-off-by: Lin Yujun <linyujun809@huawei.com> Link: https://lore.kernel.org/r/20230522025608.2515558-1-linyujun809@huawei.com Signed-off-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
2025-01-30hexagon: Fix warning comparing pointer to 0Yang Li1-1/+1
./arch/hexagon/kernel/traps.c:138:6-7: WARNING comparing pointer to 0 Avoid pointer type value compared with 0 to make code clear. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3978 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20230208011105.80219-1-yang.lee@linux.alibaba.com Signed-off-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
2025-01-30hexagon: Move kernel prototypes out of uapi/asm/setup.h headerThomas Huth2-12/+22
The kernel function prototypes are of no use for userspace and shouldn't get exposed in an uapi header, so let's move them into an internal header instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/r/20240502173818.58152-1-thuth@redhat.com Signed-off-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
2025-01-30hexagon: time: Remove redundant null check for resourceHardevsinh Palaniya1-2/+1
Null check for 'resource' before assignment is unnecessary because the variable 'resource' is initialized to NULL at the beginning of the function. Signed-off-by: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io> Link: https://lore.kernel.org/r/20241111142458.67854-1-hardevsinh.palaniya@siliconsignals.io Signed-off-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
2025-01-30hexagon: fix using plain integer as NULL pointer warning in cmpxchgWillem de Bruijn1-1/+1
Sparse reports net/ipv4/inet_diag.c:1511:17: sparse: sparse: Using plain integer as NULL pointer Due to this code calling cmpxchg on a non-integer type struct inet_diag_handler * return !cmpxchg((const struct inet_diag_handler**)&inet_diag_table[type], NULL, h) ? 0 : -EEXIST; While hexagon's cmpxchg assigns an integer value to a variable of this type. __typeof__(*(ptr)) __oldval = 0; Update this assignment to cast 0 to the correct type. The original issue is easily reproduced at head with the below block, and is absent after this change. make LLVM=1 ARCH=hexagon defconfig make C=1 LLVM=1 ARCH=hexagon net/ipv4/inet_diag.o Fixes: 99a70aa051d2 ("Hexagon: Add processor and system headers") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411091538.PGSTqUBi-lkp@intel.com/ Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Christian Gmeiner <cgmeiner@igalia.com> Link: https://lore.kernel.org/r/20241203221736.282020-1-willemdebruijn.kernel@gmail.com Signed-off-by: Brian Cain <bcain@quicinc.com> Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
2025-01-30MAINTAINERS: add Neal to TCP maintainersJakub Kicinski1-0/+1
Neal Cardwell has been indispensable in TCP reviews and investigations, especially protocol-related. Neal is also the author of packetdrill. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250129191332.2526140-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30net: revert RTNL changes in unregister_netdevice_many_notify()Eric Dumazet1-30/+3
This patch reverts following changes: 83419b61d187 net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2) ae646f1a0bb9 net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1) cfa579f66656 net: no longer hold RTNL while calling flush_all_backlogs() This caused issues in layers holding a private mutex: cleanup_net() rtnl_lock(); mutex_lock(subsystem_mutex); unregister_netdevice(); rtnl_unlock(); // LOCKDEP violation rtnl_lock(); I will revisit this in next cycle, opt-in for the new behavior from safe contexts only. Fixes: cfa579f66656 ("net: no longer hold RTNL while calling flush_all_backlogs()") Fixes: ae646f1a0bb9 ("net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1)") Fixes: 83419b61d187 ("net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2)") Reported-by: syzbot+5b9196ecf74447172a9a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6789d55f.050a0220.20d369.004e.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250129142726.747726-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30net: hsr: fix fill_frame_info() regression vs VLAN packetsEric Dumazet1-2/+5
Stephan Wurm reported that my recent patch broke VLAN support. Apparently skb->mac_len is not correct for VLAN traffic as shown by debug traces [1]. Use instead pskb_may_pull() to make sure the expected header is present in skb->head. Many thanks to Stephan for his help. [1] kernel: skb len=170 headroom=2 headlen=170 tailroom=20 mac=(2,14) mac_len=14 net=(16,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) kernel: dev name=prp0 feat=0x0000000000007000 kernel: sk family=17 type=3 proto=0 kernel: skb headroom: 00000000: 74 00 kernel: skb linear: 00000000: 01 0c cd 01 00 01 00 d0 93 53 9c cb 81 00 80 00 kernel: skb linear: 00000010: 88 b8 00 01 00 98 00 00 00 00 61 81 8d 80 16 52 kernel: skb linear: 00000020: 45 47 44 4e 43 54 52 4c 2f 4c 4c 4e 30 24 47 4f kernel: skb linear: 00000030: 24 47 6f 43 62 81 01 14 82 16 52 45 47 44 4e 43 kernel: skb linear: 00000040: 54 52 4c 2f 4c 4c 4e 30 24 44 73 47 6f 6f 73 65 kernel: skb linear: 00000050: 83 07 47 6f 49 64 65 6e 74 84 08 67 8d f5 93 7e kernel: skb linear: 00000060: 76 c8 00 85 01 01 86 01 00 87 01 00 88 01 01 89 kernel: skb linear: 00000070: 01 00 8a 01 02 ab 33 a2 15 83 01 00 84 03 03 00 kernel: skb linear: 00000080: 00 91 08 67 8d f5 92 77 4b c6 1f 83 01 00 a2 1a kernel: skb linear: 00000090: a2 06 85 01 00 83 01 00 84 03 03 00 00 91 08 67 kernel: skb linear: 000000a0: 8d f5 92 77 4b c6 1f 83 01 00 kernel: skb tailroom: 00000000: 80 18 02 00 fe 4e 00 00 01 01 08 0a 4f fd 5e d1 kernel: skb tailroom: 00000010: 4f fd 5e cd Fixes: b9653d19e556 ("net: hsr: avoid potential out-of-bound access in fill_frame_info()") Reported-by: Stephan Wurm <stephan.wurm@a-eberle.de> Tested-by: Stephan Wurm <stephan.wurm@a-eberle.de> Closes: https://lore.kernel.org/netdev/Z4o_UC0HweBHJ_cw@PC-LX-SteWu/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250129130007.644084-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>