aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-04-17MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING sectionLorenzo Stoakes1-0/+1
Pedro has offered to review memory mapping code. He has good experience in this area and has provided excellent feedback on memory mapping series in the past so I feel he'll be a great addition. Link: https://lkml.kernel.org/r/20250416135301.43513-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilizationDavid Hildenbrand1-2/+2
In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute mapped shared vs. mapped exclusively) to then adjust the entire mapcount. This means that another process might stumble in do_wp_page() over a PTE-mapped PMD folio that is indicated as "exclusively mapped", but still has an entire mapcount (PMD mapping), because it is racing with the process that is unmapping the folio (PMD mapping). Note that do_wp_page() will back off once it detects the remaining folio reference from the process that is in the process of unmapping the folio. This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio)) check in do_wp_page(), that can easily be reproduced by looping a couple of times over allocating a PMD THP, forking a child where we immediately unmap it again, and writing in the parent concurrently to the THP. [ 252.738129][T16470] ------------[ cut here ]------------ [ 252.739267][T16470] WARNING: CPU: 3 PID: 16470 at mm/memory.c:3738 do_wp_page+0x2a75/0x2c00 [ 252.740968][T16470] Modules linked in: [ 252.741958][T16470] CPU: 3 UID: 0 PID: 16470 Comm: ... ... [ 252.765841][T16470] <TASK> [ 252.766419][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.767558][T16470] ? rcu_is_watching+0x12/0x60 [ 252.768525][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.769645][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.770778][T16470] ? lock_acquire+0x33/0x80 [ 252.771697][T16470] ? __handle_mm_fault+0x5e8/0x3e40 [ 252.772735][T16470] ? __handle_mm_fault+0x5e8/0x3e40 [ 252.773781][T16470] __handle_mm_fault+0x1869/0x3e40 [ 252.774839][T16470] handle_mm_fault+0x22a/0x640 [ 252.775808][T16470] do_user_addr_fault+0x618/0x1000 [ 252.776847][T16470] exc_page_fault+0x68/0xd0 [ 252.777775][T16470] asm_exc_page_fault+0x26/0x30 While we could adjust the sequence in __folio_remove_rmap(), let's rater move the mapcount sanity checks after the mapcount vs. refcount stabilization phase. With this fix, a simple reproducer is happy. While at it, convert the two VM_WARN_ON_ONCE() we are moving to VM_WARN_ON_ONCE_FOLIO(). Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+5e8feb543ca8e12e0ede@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/67fab4fe.050a0220.2c5fcf.0011.GAE@google.com Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm, hugetlb: increment the number of pages to be reset on HVOOscar Salvador1-3/+3
commit 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") shifted hugetlb specific stuff, and now mapping overlaps _hugetlb_cgroup field. Upon restoring the vmemmap for HVO, only the first two tail pages are reset, and this causes the check in free_tail_page_prepare() to fail as it finds an unexpected mapping value in some tails. Increment the number of pages to be reset to 4 (head + 3 tail pages) Link: https://lkml.kernel.org/r/20250415111859.376302-1-osalvador@suse.de Fixes: 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17writeback: fix false warning in inode_to_wb()Andreas Gruenbacher1-0/+1
inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17docs: ABI: replace mcroce@microsoft.com with new Meta addressAhmad Fatoum2-6/+6
The Microsoft email address is bouncing: 550 5.4.1 Recipient address rejected: Access denied. So let's replace it with Matteo's current mail address. Link: https://lkml.kernel.org/r/20250414-fix-mcroce-mail-bounce-v3-1-0aed2d71f3d7@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Acked-by: Matteo Croce <teknoraver@meta.com> Link: https://lore.kernel.org/all/BYAPR15MB2504E4B02DFFB1E55871955DA1062@BYAPR15MB2504.namprd15.prod.outlook.com/ Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matteo Croce <teknoraver@meta.com> Cc: Sascha Hauer <kernel@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()Baoquan He1-2/+2
Not like fault_in_readable() or fault_in_writeable(), in fault_in_safe_writeable() local variable 'start' is increased page by page to loop till the whole address range is handled. However, it mistakenly calculates the size of the handled range with 'uaddr - start'. Fix it here. Andreas said: : In gfs2, fault_in_iov_iter_writeable() is used in : gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially : affects buffered as well as direct reads. This bug could cause those : gfs2 functions to spin in a loop. Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Fixes: fe673d3f5bf1 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()") Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Yanjun.Zhu <yanjun.zhu@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add memory advice sectionLorenzo Stoakes1-0/+14
The madvise code straddles both VMA and page table manipulation. As a result, separate it out into its own section and add maintainers/reviewers as appropriate. We additionally include the mman-common.h file as this contains the shared madvise flags and it is important we maintain this alongside madvise.c. Link: https://lkml.kernel.org/r/20250411072724.10841-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jann Horn <jannh@google.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add mmap trace events to MEMORY MAPPINGLiam R. Howlett1-0/+1
MEMORY MAPPING does not list the mmap.h trace point file, but does list the mmap.c file. Couple the trace points with the users and authors of the trace points for notifications of updates. Link: https://lkml.kernel.org/r/20250411173328.8172-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: memcontrol: fix swap counter leak from offline cgroupMuchun Song1-1/+1
commit 73f839b6d2ed addressed an issue regarding the swap counter leak that occurred from an offline cgroup. However, commit 89ce924f0bd4 modified the parameter from @swap_memcg to @memcg (presumably this alteration was introduced while resolving conflicts). Fix this problem by reverting this minor change. Link: https://lkml.kernel.org/r/20250410081812.10073-1-songmuchun@bytedance.com Fixes: 89ce924f0bd4 ("mm: memcontrol: move memsw charge callbacks to v1") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add MM subsection for the page allocatorVlastimil Babka1-0/+15
Add a subsection for the page allocator, including compaction as it's crucial for high-order allocations and works together with the anti-fragmentation features. Add reviewers (including myself) who voluteered. Link: https://lkml.kernel.org/r/20250410090021.72296-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Brendan Jackman <jackmanb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter (Ampere) <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: update SLAB ALLOCATOR maintainersVlastimil Babka2-2/+4
With permission, reduce the number of maintainers. Create a CREDITS entry for Joonsoo (Pekka already has one). Thanks for all the work! Link: https://lkml.kernel.org/r/20250410090021.72296-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Christoph Lameter (Ampere) <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17fs/dax: fix folio splitting issue by resetting old folio order + _nr_pagesDavid Hildenbrand2-0/+18
Alison reports an issue with fsdax when large extends end up using large ZONE_DEVICE folios: [ 417.796271] BUG: kernel NULL pointer dereference, address: 0000000000000b00 [ 417.796982] #PF: supervisor read access in kernel mode [ 417.797540] #PF: error_code(0x0000) - not-present page [ 417.798123] PGD 2a5c5067 P4D 2a5c5067 PUD 2a5c6067 PMD 0 [ 417.798690] Oops: Oops: 0000 [#1] SMP NOPTI [ 417.799178] CPU: 5 UID: 0 PID: 1515 Comm: mmap Tainted: ... [ 417.800150] Tainted: [O]=OOT_MODULE [ 417.800583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 417.801358] RIP: 0010:__lruvec_stat_mod_folio+0x7e/0x250 [ 417.801948] Code: ... [ 417.803662] RSP: 0000:ffffc90002be3a08 EFLAGS: 00010206 [ 417.804234] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000002 [ 417.804984] RDX: ffffffff815652d7 RSI: 0000000000000000 RDI: ffffffff82a2beae [ 417.805689] RBP: ffffc90002be3a28 R08: 0000000000000000 R09: 0000000000000000 [ 417.806384] R10: ffffea0007000040 R11: ffff888376ffe000 R12: 0000000000000001 [ 417.807099] R13: 0000000000000012 R14: ffff88807fe4ab40 R15: ffff888029210580 [ 417.807801] FS: 00007f339fa7a740(0000) GS:ffff8881fa9b9000(0000) knlGS:0000000000000000 [ 417.808570] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 417.809193] CR2: 0000000000000b00 CR3: 000000002a4f0004 CR4: 0000000000370ef0 [ 417.809925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 417.810622] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 417.811353] Call Trace: [ 417.811709] <TASK> [ 417.812038] folio_add_file_rmap_ptes+0x143/0x230 [ 417.812566] insert_page_into_pte_locked+0x1ee/0x3c0 [ 417.813132] insert_page+0x78/0xf0 [ 417.813558] vmf_insert_page_mkwrite+0x55/0xa0 [ 417.814088] dax_fault_iter+0x484/0x7b0 [ 417.814542] dax_iomap_pte_fault+0x1ca/0x620 [ 417.815055] dax_iomap_fault+0x39/0x40 [ 417.815499] __xfs_write_fault+0x139/0x380 [ 417.815995] ? __handle_mm_fault+0x5e5/0x1a60 [ 417.816483] xfs_write_fault+0x41/0x50 [ 417.816966] xfs_filemap_fault+0x3b/0xe0 [ 417.817424] __do_fault+0x31/0x180 [ 417.817859] __handle_mm_fault+0xee1/0x1a60 [ 417.818325] ? debug_smp_processor_id+0x17/0x20 [ 417.818844] handle_mm_fault+0xe1/0x2b0 [...] The issue is that when we split a large ZONE_DEVICE folio to order-0 ones, we don't reset the order/_nr_pages. As folio->_nr_pages overlays page[1]->memcg_data, once page[1] is a folio, it suddenly looks like it has folio->memcg_data set. And we never manually initialize folio->memcg_data in fsdax code, because we never expect it to be set at all. When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries to use folio->memcg_data (because it's non-NULL) but it does not actually point at a memcg, resulting in the problem. Alison also observed that these folios sometimes have "locked" set, which is rather concerning (folios locked from the beginning ...). The reason is that the order for large folios is stored in page[1]->flags, which become the folio->flags of a new small folio. Let's fix it by adding a folio helper to clear order/_nr_pages for splitting purposes. Maybe we should reinitialize other large folio flags / folio members as well when splitting, because they might similarly cause harm once page[1] becomes a folio? At least other flags in PAGE_FLAGS_SECOND should not be set for fsdax, so at least page[1]->flags might be as expected with this fix. From a quick glimpse, initializing ->mapping, ->pgmap and ->share should re-initialize most things from a previous page[1] used by large folios that fsdax cares about. For example folio->private might not get reinitialized, but maybe that's not relevant -- no traces of it's use in fsdax code. Needs a closer look. Another thing that should be considered in the future is performing similar checks as we perform in free_tail_page_prepare() -- checking pincount etc. -- when freeing a large fsdax folio. Link: https://lkml.kernel.org/r/20250410091020.119116-1-david@redhat.com Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page") Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Alison Schofield <alison.schofield@intel.com> Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: "Darrick J. Wong" <djwong@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()Kirill A. Shutemov4-2/+31
When the last page in the zone is accepted, __accept_page() calls static_branch_dec(). This function takes cpu_hotplug_lock, which can lead to a deadlock if the allocation occurs during CPU bringup path as _cpu_up() also takes the lock. To prevent this deadlock, defer static_branch_dec() to a workqueue. Call static_branch_dec() only when the workqueue is not yet initialized. Workqueues are initialized before CPU bring up, so this will not conflict with the first scenario. Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com Fixes: 55ad43e8ba0f ("mm: add a helper to accept page") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17bcachefs: Fix snapshotting a subvolume, then renaming itKent Overstreet1-1/+43
Subvolume roots and the dirents that point to them are special; they don't obey the normal snapshot versioning rules because they cross snapshot boundaries. We don't keep around older versions of subvolume dirents on rename - we don't need to, because subvolume dirents are only visible in the parent subvolume, and we wouldn't be able to match up the different dirent and inode versions due to crossing the snapshot ID boundary. That means that when we rename a subvolume, that's been snapshotted, the older version of the subvolume root will become dangling - it won't have a dirent that points to it. That's expected, we just need to tell fsck that this is ok. Fixes: https://github.com/koverstreet/bcachefs/issues/856 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-17net: ethernet: mtk_eth_soc: revise QDMA packet scheduler settingsBo-Cun Chen1-3/+3
The QDMA packet scheduler suffers from a performance issue. Fix this by picking up changes from MediaTek's SDK which change to use Token Bucket instead of Leaky Bucket and fix the SPEED_1000 configuration. Fixes: 160d3a9b1929 ("net: ethernet: mtk_eth_soc: introduce MTK_NETSYS_V2 support") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/18040f60f9e2f5855036b75b28c4332a2d2ebdd8.1744764277.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17net: ethernet: mtk_eth_soc: correct the max weight of the queue limit for 100MbpsBo-Cun Chen1-2/+2
Without this patch, the maximum weight of the queue limit will be incorrect when linked at 100Mbps due to an apparent typo. Fixes: f63959c7eec31 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/74111ba0bdb13743313999ed467ce564e8189006.1744764277.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17net: ethernet: mtk_eth_soc: reapply mdc divider on resetBo-Cun Chen2-15/+25
In the current method, the MDC divider was reset to the default setting of 2.5MHz after the NETSYS SER. Therefore, we need to reapply the MDC divider configuration function in mtk_hw_init() after reset. Fixes: c0a440031d431 ("net: ethernet: mtk_eth_soc: set MDIO bus clock frequency") Signed-off-by: Bo-Cun Chen <bc-bocun.chen@mediatek.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/8ab7381447e6cdcb317d5b5a6ddd90a1734efcb0.1744764277.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-17spi: spi-imx: Add check for spi_imx_setupxfer()Tamura Dai1-1/+4
Add check for the return value of spi_imx_setupxfer(). spi_imx->rx and spi_imx->tx function pointer can be NULL when spi_imx_setupxfer() return error, and make NULL pointer dereference. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: 0x0 spi_imx_pio_transfer+0x50/0xd8 spi_imx_transfer_one+0x18c/0x858 spi_transfer_one_message+0x43c/0x790 __spi_pump_transfer_message+0x238/0x5d4 __spi_sync+0x2b0/0x454 spi_write_then_read+0x11c/0x200 Signed-off-by: Tamura Dai <kirinode0@gmail.com> Reviewed-by: Carlos Song <carlos.song@nxp.com> Link: https://patch.msgid.link/20250417011700.14436-1-kirinode0@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-17net: ti: icss-iep: Fix possible NULL pointer dereference for perout requestMeghana Malladi1-63/+58
The ICSS IEP driver tracks perout and pps enable state with flags. Currently when disabling pps and perout signals during icss_iep_exit(), results in NULL pointer dereference for perout. To fix the null pointer dereference issue, the icss_iep_perout_enable_hw function can be modified to directly clear the IEP CMP registers when disabling PPS or PEROUT, without referencing the ptp_perout_request structure, as its contents are irrelevant in this case. Fixes: 9b115361248d ("net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/7b1c7c36-363a-4085-b26c-4f210bee1df6@stanley.mountain/ Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250415090543.717991-4-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17net: ti: icssg-prueth: Fix possible NULL pointer dereference inside emac_xmit_xdp_frame()Meghana Malladi1-2/+4
There is an error check inside emac_xmit_xdp_frame() function which is called when the driver wants to transmit XDP frame, to check if the allocated tx descriptor is NULL, if true to exit and return ICSSG_XDP_CONSUMED implying failure in transmission. In this case trying to free a descriptor which is NULL will result in kernel crash due to NULL pointer dereference. Fix this error handling and increase netdev tx_dropped stats in the caller of this function if the function returns ICSSG_XDP_CONSUMED. Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/70d8dd76-0c76-42fc-8611-9884937c82f5@stanley.mountain/ Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250415090543.717991-3-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17net: ti: icssg-prueth: Fix kernel warning while bringing down network interfaceMeghana Malladi1-3/+0
During network interface initialization, the NIC driver needs to register its Rx queue with the XDP, to ensure the incoming XDP buffer carries a pointer reference to this info and is stored inside xdp_rxq_info. While this struct isn't tied to XDP prog, if there are any changes in Rx queue, the NIC driver needs to stop the Rx queue by unregistering with XDP before purging and reallocating memory. Drop page_pool destroy during Rx channel reset as this is already handled by XDP during xdp_rxq_info_unreg (Rx queue unregister), failing to do will cause the following warning: warning logs: https://gist.github.com/MeghanaMalladiTI/eb627e5dc8de24e42d7d46572c13e576 Fixes: 46eeb90f03e0 ("net: ti: icssg-prueth: Use page_pool API for RX buffer allocation") Signed-off-by: Meghana Malladi <m-malladi@ti.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250415090543.717991-2-m-malladi@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-17netfilter: conntrack: fix erronous removal of offload bitFlorian Westphal1-4/+6
The blamed commit exposes a possible issue with flow_offload_teardown(): We might remove the offload bit of a conntrack entry that has been offloaded again. 1. conntrack entry c1 is offloaded via flow f1 (f1->ct == c1). 2. f1 times out and is pushed back to slowpath, c1 offload bit is removed. Due to bug, f1 is not unlinked from rhashtable right away. 3. a new packet arrives for the flow and re-offload is triggered, i.e. f2->ct == c1. This is because lookup in flowtable skip entries with teardown bit set. 4. Next flowtable gc cycle finds f1 again 5. flow_offload_teardown() is called again for f1 and c1 offload bit is removed again, even though we have f2 referencing the same entry. This is harmless, but clearly not correct. Fix the bug that exposes this: set 'teardown = true' to have the gc callback unlink the flowtable entry from the table right away instead of the unintentional defer to the next round. Also prevent flow_offload_teardown() from fixing up the ct state more than once: We could also be called from the data path or a notifier, not only from the flowtable gc callback. NF_FLOW_TEARDOWN can never be unset, so we can use it as synchronization point: if we observe did not see a 0 -> 1 transition, then another CPU is already doing the ct state fixups for us. Fixes: 03428ca5cee9 ("netfilter: conntrack: rework offload nf_conn timeout extension logic") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-17xfs: document zoned rt specifics in admin-guideHans Holmberg1-0/+29
Document the lifetime, nolifetime and max_open_zones mount options added for zoned rt file systems. Also add documentation describing the max_open_zones sysfs attribute exposed in /sys/fs/xfs/<dev>/zoned/ Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-16net: don't try to ops lock uninitialized devsJakub Kicinski2-4/+3
We need to be careful when operating on dev while in rtnl_create_link(). Some devices (vxlan) initialize netdev_ops in ->newlink, so later on. Avoid using netdev_lock_ops(), the device isn't registered so we cannot legally call its ops or generate any notifications for it. netdev_ops_assert_locked_or_invisible() is safe to use, it checks registration status first. Reported-by: syzbot+de1c7d68a10e3f123bdd@syzkaller.appspotmail.com Fixes: 04efcee6ef8d ("net: hold instance lock during NETDEV_CHANGE") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250415151552.768373-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16ptp: ocp: fix start time alignment in ptp_ocp_signal_setSagi Maimon1-0/+1
In ptp_ocp_signal_set, the start time for periodic signals is not aligned to the next period boundary. The current code rounds up the start time and divides by the period but fails to multiply back by the period, causing misaligned signal starts. Fix this by multiplying the rounded-up value by the period to ensure the start time is the closest next period. Fixes: 4bd46bb037f8e ("ptp: ocp: Use DIV64_U64_ROUND_UP for rounding.") Signed-off-by: Sagi Maimon <maimon.sagi@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250415053131.129413-1-maimon.sagi@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: avoid refcount warnings when ds->ops->tag_8021q_vlan_del() failsVladimir Oltean1-1/+1
This is very similar to the problem and solution from commit 232deb3f9567 ("net: dsa: avoid refcount warnings when ->port_{fdb,mdb}_del returns error"), except for the dsa_port_do_tag_8021q_vlan_del() operation. Fixes: c64b9c05045a ("net: dsa: tag_8021q: add proper cross-chip notifier support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414213020.2959021-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: free routing table on probe failureVladimir Oltean1-7/+14
If complete = true in dsa_tree_setup(), it means that we are the last switch of the tree which is successfully probing, and we should be setting up all switches from our probe path. After "complete" becomes true, dsa_tree_setup_cpu_ports() or any subsequent function may fail. If that happens, the entire tree setup is in limbo: the first N-1 switches have successfully finished probing (doing nothing but having allocated persistent memory in the tree's dst->ports, and maybe dst->rtable), and switch N failed to probe, ending the tree setup process before anything is tangible from the user's PoV. If switch N fails to probe, its memory (ports) will be freed and removed from dst->ports. However, the dst->rtable elements pointing to its ports, as created by dsa_link_touch(), will remain there, and will lead to use-after-free if dereferenced. If dsa_tree_setup_switches() returns -EPROBE_DEFER, which is entirely possible because that is where ds->ops->setup() is, we get a kasan report like this: ================================================================== BUG: KASAN: slab-use-after-free in mv88e6xxx_setup_upstream_port+0x240/0x568 Read of size 8 at addr ffff000004f56020 by task kworker/u8:3/42 Call trace: __asan_report_load8_noabort+0x20/0x30 mv88e6xxx_setup_upstream_port+0x240/0x568 mv88e6xxx_setup+0xebc/0x1eb0 dsa_register_switch+0x1af4/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 Allocated by task 42: __kasan_kmalloc+0x84/0xa0 __kmalloc_cache_noprof+0x298/0x490 dsa_switch_touch_ports+0x174/0x3d8 dsa_register_switch+0x800/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 Freed by task 42: __kasan_slab_free+0x48/0x68 kfree+0x138/0x418 dsa_register_switch+0x2694/0x2ae0 mv88e6xxx_register_switch+0x1b8/0x2a8 mv88e6xxx_probe+0xc4c/0xf60 mdio_probe+0x78/0xb8 really_probe+0x2b8/0x5a8 __driver_probe_device+0x164/0x298 driver_probe_device+0x78/0x258 __device_attach_driver+0x274/0x350 The simplest way to fix the bug is to delete the routing table in its entirety. dsa_tree_setup_routing_table() has no problem in regenerating it even if we deleted links between ports other than those of switch N, because dsa_link_touch() first checks whether the port pair already exists in dst->rtable, allocating if not. The deletion of the routing table in its entirety already exists in dsa_tree_teardown(), so refactor that into a function that can also be called from the tree setup error path. In my analysis of the commit to blame, it is the one which added dsa_link elements to dst->rtable. Prior to that, each switch had its own ds->rtable which is freed when the switch fails to probe. But the tree is potentially persistent memory. Fixes: c5f51765a1f6 ("net: dsa: list DSA links in the fabric") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414213001.2957964-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: clean up FDB, MDB, VLAN entries on unbindVladimir Oltean1-3/+35
As explained in many places such as commit b117e1e8a86d ("net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del"), DSA is written given the assumption that higher layers have balanced additions/deletions. As such, it only makes sense to be extremely vocal when those assumptions are violated and the driver unbinds with entries still present. But Ido Schimmel points out a very simple situation where that is wrong: https://lore.kernel.org/netdev/ZDazSM5UsPPjQuKr@shredder/ (also briefly discussed by me in the aforementioned commit). Basically, while the bridge bypass operations are not something that DSA explicitly documents, and for the majority of DSA drivers this API simply causes them to go to promiscuous mode, that isn't the case for all drivers. Some have the necessary requirements for bridge bypass operations to do something useful - see dsa_switch_supports_uc_filtering(). Although in tools/testing/selftests/net/forwarding/local_termination.sh, we made an effort to popularize better mechanisms to manage address filters on DSA interfaces from user space - namely macvlan for unicast, and setsockopt(IP_ADD_MEMBERSHIP) - through mtools - for multicast, the fact is that 'bridge fdb add ... self static local' also exists as kernel UAPI, and might be useful to someone, even if only for a quick hack. It seems counter-productive to block that path by implementing shim .ndo_fdb_add and .ndo_fdb_del operations which just return -EOPNOTSUPP in order to prevent the ndo_dflt_fdb_add() and ndo_dflt_fdb_del() from running, although we could do that. Accepting that cleanup is necessary seems to be the only option. Especially since we appear to be coming back at this from a different angle as well. Russell King is noticing that the WARN_ON() triggers even for VLANs: https://lore.kernel.org/netdev/Z_li8Bj8bD4-BYKQ@shell.armlinux.org.uk/ What happens in the bug report above is that dsa_port_do_vlan_del() fails, then the VLAN entry lingers on, and then we warn on unbind and leak it. This is not a straight revert of the blamed commit, but we now add an informational print to the kernel log (to still have a way to see that bugs exist), and some extra comments gathered from past years' experience, to justify the logic. Fixes: 0832cd9f1f02 ("net: dsa: warn if port lists aren't empty in dsa_port_teardown") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212930.2956310-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: mv88e6xxx: fix -ENOENT when deleting VLANs and MST is unsupportedVladimir Oltean1-1/+12
Russell King reports that on the ZII dev rev B, deleting a bridge VLAN from a user port fails with -ENOENT: https://lore.kernel.org/netdev/Z_lQXNP0s5-IiJzd@shell.armlinux.org.uk/ This comes from mv88e6xxx_port_vlan_leave() -> mv88e6xxx_mst_put(), which tries to find an MST entry in &chip->msts associated with the SID, but fails and returns -ENOENT as such. But we know that this chip does not support MST at all, so that is not surprising. The question is why does the guard in mv88e6xxx_mst_put() not exit early: if (!sid) return 0; And the answer seems to be simple: the sid comes from vlan.sid which supposedly was previously populated by mv88e6xxx_vtu_get(). But some chip->info->ops->vtu_getnext() implementations do not populate vlan.sid, for example see mv88e6185_g1_vtu_getnext(). In that case, later in mv88e6xxx_port_vlan_leave() we are using a garbage sid which is just residual stack memory. Testing for sid == 0 covers all cases of a non-bridge VLAN or a bridge VLAN mapped to the default MSTI. For some chips, SID 0 is valid and installed by mv88e6xxx_stu_setup(). A chip which does not support the STU would implicitly only support mapping all VLANs to the default MSTI, so although SID 0 is not valid, it would be sufficient, if we were to zero-initialize the vlan structure, to fix the bug, due to the coincidence that a test for vlan.sid == 0 already exists and leads to the same (correct) behavior. Another option which would be sufficient would be to add a test for mv88e6xxx_has_stu() inside mv88e6xxx_mst_put(), symmetric to the one which already exists in mv88e6xxx_mst_get(). But that placement means the caller will have to dereference vlan.sid, which means it will access uninitialized memory, which is not nice even if it ignores it later. So we end up making both modifications, in order to not rely just on the sid == 0 coincidence, but also to avoid having uninitialized structure fields which might get temporarily accessed. Fixes: acaf4d2e36b3 ("net: dsa: mv88e6xxx: MST Offloading") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212913.2955253-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: dsa: mv88e6xxx: avoid unregistering devlink regions which were never registeredVladimir Oltean1-1/+2
Russell King reports that a system with mv88e6xxx dereferences a NULL pointer when unbinding this driver: https://lore.kernel.org/netdev/Z_lRkMlTJ1KQ0kVX@shell.armlinux.org.uk/ The crash seems to be in devlink_region_destroy(), which is not NULL tolerant but is given a NULL devlink global region pointer. At least on some chips, some devlink regions are conditionally registered since the blamed commit, see mv88e6xxx_setup_devlink_regions_global(): if (cond && !cond(chip)) continue; These are MV88E6XXX_REGION_STU and MV88E6XXX_REGION_PVT. If the chip does not have an STU or PVT, it should crash like this. To fix the issue, avoid unregistering those regions which are NULL, i.e. were skipped at mv88e6xxx_setup_devlink_regions_global() time. Fixes: 836021a2d0e0 ("net: dsa: mv88e6xxx: Export cross-chip PVT as devlink region") Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250414212850.2953957-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: txgbe: fix memory leak in txgbe_probe() error pathAbdun Nihaal1-1/+2
When txgbe_sw_init() is called, memory is allocated for wx->rss_key in wx_init_rss_key(). However, in txgbe_probe() function, the subsequent error paths after txgbe_sw_init() don't free the rss_key. Fix that by freeing it in error path along with wx->mac_table. Also change the label to which execution jumps when txgbe_sw_init() fails, because otherwise, it could lead to a double free for rss_key, when the mac_table allocation fails in wx_sw_init(). Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number") Reported-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250415032910.13139-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: bridge: switchdev: do not notify new brentries as changedJonas Gorski1-1/+3
When adding a bridge vlan that is pvid or untagged after the vlan has already been added to any other switchdev backed port, the vlan change will be propagated as changed, since the flags change. This causes the vlan to not be added to the hardware for DSA switches, since the DSA handler ignores any vlans for the CPU or DSA ports that are changed. E.g. the following order of operations would work: $ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0 $ ip link set lan1 master swbridge $ bridge vlan add dev swbridge vid 1 pvid untagged self $ bridge vlan add dev lan1 vid 1 pvid untagged but this order would break: $ ip link add swbridge type bridge vlan_filtering 1 vlan_default_pvid 0 $ ip link set lan1 master swbridge $ bridge vlan add dev lan1 vid 1 pvid untagged $ bridge vlan add dev swbridge vid 1 pvid untagged self Additionally, the vlan on the bridge itself would become undeletable: $ bridge vlan port vlan-id lan1 1 PVID Egress Untagged swbridge 1 PVID Egress Untagged $ bridge vlan del dev swbridge vid 1 self $ bridge vlan port vlan-id lan1 1 PVID Egress Untagged swbridge 1 Egress Untagged since the vlan was never added to DSA's vlan list, so deleting it will cause an error, causing the bridge code to not remove it. Fix this by checking if flags changed only for vlans that are already brentry and pass changed as false for those that become brentries, as these are a new vlan (member) from the switchdev point of view. Since *changed is set to true for becomes_brentry = true regardless of would_change's value, this will not change any rtnetlink notification delivery, just the value passed on to switchdev in vlan->changed. Fixes: 8d23a54f5bee ("net: bridge: switchdev: differentiate new VLANs from changed ones") Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20250414200020.192715-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16net: b53: enable BPDU reception for management portJonas Gorski1-0/+10
For STP to work, receiving BPDUs is essential, but the appropriate bit was never set. Without GC_RX_BPDU_EN, the switch chip will filter all BPDUs, even if an appropriate PVID VLAN was setup. Fixes: ff39c2d68679 ("net: dsa: b53: Add bridge support") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Link: https://patch.msgid.link/20250414200434.194422-1-jonas.gorski@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rt-neigh: prefix struct nfmsg members with ndmJakub Kicinski1-6/+6
Attach ndm- to all members of struct nfmsg. We could possibly use name-prefix just for C, but I don't think we have any precedent for using name-prefix on structs, and other rtnetlink sub-specs give full names for fixed header struct members. Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rt-link: adjust mctp attribute namingJakub Kicinski1-1/+2
MCTP attribute naming is inconsistent. In C we have: IFLA_MCTP_NET, IFLA_MCTP_PHYS_BINDING, ^^^^ but in YAML: - mctp-net - phys-binding ^ no "mctp" It's unclear whether the "mctp" part of the name is supposed to be a prefix or part of attribute name. Make it a prefix, seems cleaner, even tho technically phys-binding was added later. Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rtnetlink: attribute naming correctionsJakub Kicinski2-4/+4
Some attribute names diverge in very minor ways from the C names. These are most likely typos, and they prevent the C codegen from working. Fixes: bc515ed06652 ("netlink: specs: Add a spec for neighbor tables in rtnetlink") Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16netlink: specs: rt-link: add an attr layer around alt-ifnameJakub Kicinski1-3/+8
alt-ifname attr is directly placed in requests (as an alternative to ifname) but in responses its wrapped up in IFLA_PROP_LIST and only there is may be multi-attr. See rtnl_fill_prop_list(). Fixes: b2f63d904e72 ("doc/netlink: Add spec for rt link messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: make sure we validate subtype of array-nestJakub Kicinski1-2/+5
ArrayNest AKA indexed-array support currently skips inner type validation. We count the attributes and then we parse them, make sure we call validate, too. Otherwise buggy / unexpected kernel response may lead to crashes. Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: individually free previous values on double setJakub Kicinski1-17/+45
When user calls request_attrA_set() multiple times (for the same attribute), and attrA is of type which allocates memory - we try to free the previously associated values. For array types (including multi-attr) we have only freed the array, but the array may have contained pointers. Refactor the code generation for free attr and reuse the generated lines in setters to flush out the previous state. Since setters are static inlines in the header we need to add forward declarations for the free helpers of pure nested structs. Track which types get used by arrays and include the right forwad declarations. At least ethtool string set and bit set would not be freed without this. Tho, admittedly, overriding already set attribute twice is likely a very very rare thing to do. Fixes: be5bea1cc0bf ("net: add basic C code generators for Netlink") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: move local vars after the opening bracketJakub Kicinski1-1/+1
The "function writing helper" tries to put local variables between prototype and the opening bracket. Clearly wrong, but up until now nothing actually uses it to write local vars so it wasn't noticed. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16tools: ynl-gen: don't declare loop iterator in placeJakub Kicinski1-4/+21
The codegen tries to follow the "old" C style and declare loop iterators at the start of the block / function. Only nested request handling breaks this style, so adjust it. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414211851.602096-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16cxgb4: fix memory leak in cxgb4_init_ethtool_filters() error pathAbdun Nihaal1-0/+1
In the for loop used to allocate the loc_array and bmap for each port, a memory leak is possible when the allocation for loc_array succeeds, but the allocation for bmap fails. This is because when the control flow goes to the label free_eth_finfo, only the allocations starting from (i-1)th iteration are freed. Fix that by freeing the loc_array in the bmap allocation error path. Fixes: d915c299f1da ("cxgb4: add skeleton for ethtool n-tuple filters") Signed-off-by: Abdun Nihaal <abdun.nihaal@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20250414170649.89156-1-abdun.nihaal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-16bcachefs: Add missing READ_ONCE() for metadata replicasKent Overstreet1-1/+1
If we race with the user changing the metadata_replicas setting, this could cause us to get an incorrectly sized disk reservation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-16Bluetooth: vhci: Avoid needless snprintf() callsKees Cook1-5/+5
Avoid double-copying of string literals. Use a "const char *" for each string instead of copying from .rodata into stack and then into the skb. We can go directly from .rodata to the skb. This also works around a Clang bug (that has since been fixed[1]). Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401250927.1poZERd6-lkp@intel.com/ Fixes: ab4e4380d4e1 ("Bluetooth: Add vhci devcoredump support") Link: https://github.com/llvm/llvm-project/commit/ea2e66aa8b6e363b89df66dc44275a0d7ecd70ce [1] Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-04-16Bluetooth: l2cap: Process valid commands in too long frameFrédéric Danis1-1/+17
This is required for passing PTS test cases: - L2CAP/COS/CED/BI-14-C Multiple Signaling Command in one PDU, Data Truncated, BR/EDR, Connection Request - L2CAP/COS/CED/BI-15-C Multiple Signaling Command in one PDU, Data Truncated, BR/EDR, Disconnection Request The test procedure defined in L2CAP.TS.p39 for both tests is: 1. The Lower Tester sends a C-frame to the IUT with PDU Length set to 8 and Channel ID set to the correct signaling channel for the logical link. The Information payload contains one L2CAP_ECHO_REQ packet with Data Length set to 0 with 0 octets of echo data and one command packet and Data Length set as specified in Table 4.6 and the correct command data. 2. The IUT sends an L2CAP_ECHO_RSP PDU to the Lower Tester. 3. Perform alternative 3A, 3B, 3C, or 3D depending on the IUT’s response. Alternative 3A (IUT terminates the link): 3A.1 The IUT terminates the link. 3A.2 The test ends with a Pass verdict. Alternative 3B (IUT discards the frame): 3B.1 The IUT does not send a reply to the Lower Tester. Alternative 3C (IUT rejects PDU): 3C.1 The IUT sends an L2CAP_COMMAND_REJECT_RSP PDU to the Lower Tester. Alternative 3D (Any other IUT response): 3D.1 The Upper Tester issues a warning and the test ends. 4. The Lower Tester sends a C-frame to the IUT with PDU Length set to 4 and Channel ID set to the correct signaling channel for the logical link. The Information payload contains Data Length set to 0 with an L2CAP_ECHO_REQ packet with 0 octets of echo data. 5. The IUT sends an L2CAP_ECHO_RSP PDU to the Lower Tester. With expected outcome: In Steps 2 and 5, the IUT responds with an L2CAP_ECHO_RSP. In Step 3A.1, the IUT terminates the link. In Step 3B.1, the IUT does not send a reply to the Lower Tester. In Step 3C.1, the IUT rejects the PDU. In Step 3D.1, the IUT sends any valid response. Currently PTS fails with the following logs: Failed to receive ECHO RESPONSE. And HCI logs: > ACL Data RX: Handle 11 flags 0x02 dlen 20 L2CAP: Information Response (0x0b) ident 2 len 12 Type: Fixed channels supported (0x0003) Result: Success (0x0000) Channels: 0x000000000000002e L2CAP Signaling (BR/EDR) Connectionless reception AMP Manager Protocol L2CAP Signaling (LE) > ACL Data RX: Handle 11 flags 0x02 dlen 13 frame too long 08 01 00 00 08 02 01 00 aa ......... Cc: stable@vger.kernel.org Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-04-16xfs: fix fsmap for internal zoned devicesDarrick J. Wong1-18/+33
Filesystems with an internal zoned rt section use xfs_rtblock_t values that are relative to the start of the data device. When fsmap reports on internal rt sections, it reports the space used by the data section as "OWN_FS". Unfortunately, the logic for resuming a query isn't quite right, so xfs/273 fails because it stress-tests GETFSMAP with a single-record buffer. If we enter the "report fake space as OWN_FS" block with a nonzero key[0].fmr_length, we should add that to key[0].fmr_physical and recheck if we still need to emit the fake record. We should /not/ just return 0 from the whole function because that prevents all rmap record iteration. If we don't enter that block, the resumption is still wrong. keys[*].fmr_physical is a reflection of what we copied out to userspace on a previous query, which means that it already accounts for rgstart. It is not correct to add rtstart_daddr when computing start_rtb or end_rtb, so stop that. While we're at it, add a xfs_has_zoned to make it clear that this is a zoned filesystem thing. Fixes: e50ec7fac81aa2 ("xfs: enable fsmap reporting for internal RT devices") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-16xfs: Fix spelling mistake "drity" -> "dirty"Zhang Xianwei1-1/+1
There is a spelling mistake in fs/xfs/xfs_log.c. Fix it. Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2025-04-16ata: libata-sata: Save all fields from sense data descriptorNiklas Cassel1-0/+15
When filling the taskfile result for a successful NCQ command, we use the SDB FIS from the FIS Receive Area, see e.g. ahci_qc_ncq_fill_rtf(). However, the SDB FIS only has fields STATUS and ERROR. For a successful NCQ command that has sense data, we will have a successful sense data descriptor, in the Sense Data for Successful NCQ Commands log. Since we have access to additional taskfile result fields, fill in these additional fields in qc->result_tf. This matches how for failing/aborted NCQ commands, we will use e.g. ahci_qc_fill_rtf() to fill in some fields, but then for the command that actually caused the NCQ error, we will use ata_eh_read_log_10h(), which provides additional fields, saving additional fields/overriding the qc->result_tf that was fetched using ahci_qc_fill_rtf(). Fixes: 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL commands using policy 0xD") Signed-off-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2025-04-16platform/x86: msi-wmi-platform: Workaround a ACPI firmware bugArmin Wolf2-32/+63
The ACPI byte code inside the ACPI control method responsible for handling the WMI method calls uses a global buffer for constructing the return value, yet the ACPI control method itself is not marked as "Serialized". This means that calling WMI methods on this WMI device is not thread-safe, as concurrent WMI method calls will corrupt the global buffer. Fix this by serializing the WMI method calls using a mutex. Cc: stable@vger.kernel.org # 6.x.x: 912d614ac99e: platform/x86: msi-wmi-platform: Rename "data" variable Fixes: 9c0beb6b29e7 ("platform/x86: wmi: Add MSI WMI Platform driver") Tested-by: Antheas Kapenekakis <lkml@antheas.dev> Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20250414140453.7691-2-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-04-15batman-adv: Fix double-hold of meshif when getting enabledSven Eckelmann1-1/+0
It was originally meant to replace the dev_hold with netdev_hold. But this was missed in batadv_hardif_enable_interface(). As result, there was an imbalance and a hang when trying to remove the mesh-interface with (previously) active hard-interfaces: unregister_netdevice: waiting for batadv0 to become free. Usage count = 3 Fixes: 00b35530811f ("batman-adv: adopt netdev_hold() / netdev_put()") Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+ff3aa851d46ab82953a3@syzkaller.appspotmail.com Reported-by: syzbot+4036165fc595a74b09b2@syzkaller.appspotmail.com Reported-by: syzbot+c35d73ce910d86c0026e@syzkaller.appspotmail.com Reported-by: syzbot+48c14f61594bdfadb086@syzkaller.appspotmail.com Reported-by: syzbot+f37372d86207b3bb2941@syzkaller.appspotmail.com Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250414-double_hold_fix-v5-1-10e056324cde@narfation.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>