wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2026-02-09	vduse: avoid adding implicit padding	Arnd Bergmann	1	-2/+7
	The vduse_iova_range_v2 and vduse_iotlb_entry_v2 structures are both defined in a way that adds implicit padding and is incompatible between i386 and x86_64 userspace because of the different structure alignment requirements. Building the header with -Wpadded shows these new warnings: vduse.h:305:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] vduse.h:374:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] Change the amount of padding in these two structures to align them to 64 bit words and avoid those problems. Since the v1 vduse_iotlb_entry already has an inconsistent size, do not attempt to reuse the structure but rather list the members indiviudally, with a fixed amount of padding. Fixes: 079212f6877e ("vduse: add vq group asid support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20260202224835.559538-1-arnd@kernel.org>
2026-02-09	Merge tag 'kvmarm-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD	Paolo Bonzini	3	-5/+6
	KVM/arm64 updates for 7.0 - Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception. - Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process. - Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers. - More pKVM fixes for features that are not supposed to be exposed to guests. - Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor. - Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding. - Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest. - Preliminary work for guest GICv5 support. - A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures. - A small set of FPSIMD cleanups. - Selftest fixes addressing the incorrect alignment of page allocation. - Other assorted low-impact fixes and spelling fixes.
2026-02-09	Merge tag 'asoc-v6.20' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus	Takashi Iwai	46	-140/+486
	ASoC: Updates for v7.0 This release is almost all abut driers, there's very little core work here, although some of that driver work is in more generic areas like SDCA and SOF: - Generic SDCA support for reporting jack events. - Continuing platform support, cleanup and feature improements for the AMD, Intel, Qualcomm and SOF code. - Platform description improvements for the Cirrus drivers. - Support for NXP i.MX952, Realtek RT1320 and RT5575, and Sophogo CV1800B. We also pulled in one small SPI API update and some more substantial regmap work (cache description improvements) for use in drivers.
2026-02-09	Merge branch 'for-6.20/sony' into for-linus	Jiri Kosina	318	-1963/+11908
	- Support for Rock band 4 PS4 and PS5 guitars (Rosalie Wanders)
2026-02-09	dma-fence: Fix sparse warnings due __rcu annotations	Tvrtko Ursulin	1	-30/+5
	__rcu annotations on the return types from dma_fence_driver_name() and dma_fence_timeline_name() cause sparse to complain because both the constant signaled strings, and the strings return by the dma_fence_ops are not __rcu annotated. For a simple fix it is easiest to cast them with __rcu added and undo the smarts from the tracpoints side of things. There is no functional change since the rest is left in place. Later we can consider changing the dma_fence_ops return types too, and handle all the individual drivers which define them. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 506aa8b02a8d ("dma-fence: Add safe access helpers and document the rules") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506162214.1eA69hLe-lkp@intel.com/ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250616155952.24259-1-tvrtko.ursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2026-02-09	libceph: add support for CEPH_CRYPTO_AES256KRB5	Ilya Dryomov	1	-2/+3
	This is based on AES256-CTS-HMAC384-192 crypto algorithm per RFC 8009 (i.e. Kerberos 5, hence the name) with custom-defined key usage numbers. The implementation allows a given key to have/be linked to between one and three usage numbers. The existing CEPH_CRYPTO_AES remains in place and unchanged. The usage_slot parameter that needed to be added to ceph_crypt() and its wrappers is simply ignored there. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-02-09	drm/xe/uapi: update used tracking kernel-doc	Matthew Auld	1	-7/+1
	In commit 4d0b035fd6da ("drm/xe/uapi: loosen used tracking restriction") we dropped the CAP_PERMON restriction but missed updating the corresponding kernel-doc. Fix that. v2 (Sanjay): - Don't drop the note around the extra cpu_visible_used expectations. Reported-by: Ulisses Furquim <ulisses.furquim@intel.com> Fixes: 4d0b035fd6da ("drm/xe/uapi: loosen used tracking restriction") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Link: https://patch.msgid.link/20260130125105.451229-2-matthew.auld@intel.com
2026-02-08	RDMA/uverbs: Add DMABUF object type and operations	Yishai Hadas	3	-0/+20
	Expose DMABUF functionality to userspace through the uverbs interface, enabling InfiniBand/RDMA devices to export PCI based memory regions (e.g. device memory) as DMABUF file descriptors. This allows zero-copy sharing of RDMA memory with other subsystems that support the dma-buf framework. A new UVERBS_OBJECT_DMABUF object type and allocation method were introduced. During allocation, uverbs invokes the driver to supply the rdma_user_mmap_entry associated with the given page offset (pgoff). Based on the returned rdma_user_mmap_entry, uverbs requests the driver to provide the corresponding physical-memory details as well as the driver’s PCI provider information. Using this information, dma_buf_export() is called; if it succeeds, uobj->object is set to the underlying file pointer returned by the dma-buf framework. The file descriptor number follows the standard uverbs allocation flow, but the file pointer comes from the dma-buf subsystem, including its own fops and private data. When an mmap entry is removed, uverbs iterates over its associated DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that their importers are notified. The same procedure applies during the disassociate flow; final cleanup occurs when the application closes the file. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260201-dmabuf-export-v3-2-da238b614fe3@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-08	Merge tag 'common_phys_vec_via_vfio' into HEAD	Leon Romanovsky	4	-18/+12
	* Reuse common phys_vec, phase out dma_buf_phys_vec Signed-off-by: Alex Williamson <alex@shazbot.org> * tag 'common_phys_vec_via_vfio': types: reuse common phys_vec type instead of DMABUF open‑coded variant types: move phys_vec definition to common header nvme-pci: Use size_t for length fields to handle larger sizes
2026-02-08	RDMA/core: introduce rdma_restrict_node_type()	Stefan Metzmacher	1	-0/+17
	For smbdirect it required to use different ports depending on the RDMA protocol. E.g. for iWarp 5445 is needed (as tcp port 445 already used by the raw tcp transport for SMB), while InfiniBand, RoCEv1 and RoCEv2 use port 445, as they use an independent port range (even for RoCEv2, which uses udp port 4791 itself). Currently ksmbd is not able to function correctly at all if the system has iWarp (RDMA_NODE_RNIC) interface(s) and any InfiniBand, RoCEv1 and/or RoCEv2 interface(s) at the same time. And cifs.ko uses 5445 with a fallback to 445, which means depending on the available interfaces, it tries 5445 in the RoCE range or may tries iWarp with 445 as a fallback. This leads to strange error messages and strange network captures. To avoid these problems they will be able to use rdma_restrict_node_type(RDMA_NODE_RNIC) before trying port 5445 and rdma_restrict_node_type(RDMA_NODE_IB_CA) before trying port 445. It means we'll get early -ENODEV early from rdma_resolve_addr() without any network traffic and timeouts. This is designed to be called before calling any of rdma_bind_addr(), rdma_resolve_addr() or rdma_listen(). Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: linux-rdma@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Acked-by: Leon Romanovsky <leon@kernel.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-02-08	Merge branch 'for-linus' into for-next	Takashi Iwai	25	-138/+222
	Pull 6.19-devel branch. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-02-08	kcsan, compiler_types: avoid duplicate type issues in BPF Type Format	Alan Maguire	1	-7/+16
	Enabling KCSAN is causing a large number of duplicate types in BTF for core kernel structs like task_struct [1]. This is due to the definition in include/linux/compiler_types.h `#ifdef __SANITIZE_THREAD__ ... `#define __data_racy volatile .. `#else ... `#define __data_racy ... `#endif Because some objects in the kernel are compiled without KCSAN flags (KCSAN_SANITIZE) we sometimes get the empty __data_racy annotation for objects; as a result we get multiple conflicting representations of the associated structs in DWARF, and these lead to multiple instances of core kernel types in BTF since they cannot be deduplicated due to the additional modifier in some instances. Moving the __data_racy definition under CONFIG_KCSAN avoids this problem, since the volatile modifier will be present for both KCSAN and KCSAN_SANITIZE objects in a CONFIG_KCSAN=y kernel. Link: https://lkml.kernel.org/r/20260116091730.324322-1-alan.maguire@oracle.com Fixes: 31f605a308e6 ("kcsan, compiler_types: Introduce __data_racy type qualifier") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Reported-by: Nilay Shroff <nilay@linux.ibm.com> Tested-by: Nilay Shroff <nilay@linux.ibm.com> Suggested-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Bart van Assche <bvanassche@acm.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Naman Jain <namjain@linux.microsoft.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08	tests/liveupdate: add in-kernel liveupdate test	Pasha Tatashin	1	-0/+5
	Introduce an in-kernel test module to validate the core logic of the Live Update Orchestrator's File-Lifecycle-Bound feature. This provides a low-level, controlled environment to test FLB registration and callback invocation without requiring userspace interaction or actual kexec reboots. The test is enabled by the CONFIG_LIVEUPDATE_TEST Kconfig option. Link: https://lkml.kernel.org/r/20251218155752.3045808-6-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08	liveupdate: luo_flb: introduce File-Lifecycle-Bound global state	Pasha Tatashin	2	-0/+223
	Introduce a mechanism for managing global kernel state whose lifecycle is tied to the preservation of one or more files. This is necessary for subsystems where multiple preserved file descriptors depend on a single, shared underlying resource. An example is HugeTLB, where multiple file descriptors such as memfd and guest_memfd may rely on the state of a single HugeTLB subsystem. Preserving this state for each individual file would be redundant and incorrect. The state should be preserved only once when the first file is preserved, and restored/finished only once the last file is handled. This patch introduces File-Lifecycle-Bound (FLB) objects to solve this problem. An FLB is a global, reference-counted object with a defined set of operations: - A file handler (struct liveupdate_file_handler) declares a dependency on one or more FLBs via a new registration function, liveupdate_register_flb(). - When the first file depending on an FLB is preserved, the FLB's .preserve() callback is invoked to save the shared global state. The reference count is then incremented for each subsequent file. - Conversely, when the last file is unpreserved (before reboot) or finished (after reboot), the FLB's .unpreserve() or .finish() callback is invoked to clean up the global resource. The implementation includes: - A new set of ABI definitions (luo_flb_ser, luo_flb_head_ser) and a corresponding FDT node (luo-flb) to serialize the state of all active FLBs and pass them via Kexec Handover. - Core logic in luo_flb.c to manage FLB registration, reference counting, and the invocation of lifecycle callbacks. - An API (liveupdate_flb_get/_incoming/_outgoing) for other kernel subsystems to safely access the live object managed by an FLB, both before and after the live update. This framework provides the necessary infrastructure for more complex subsystems like IOMMU, VFIO, and KVM to integrate with the Live Update Orchestrator. Link: https://lkml.kernel.org/r/20251218155752.3045808-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08	list: add primitives for private list manipulations	Pasha Tatashin	1	-0/+256
	Patch series "list private v2 & luo flb", v9. This series introduces two connected infrastructure improvements: a new API for handling private linked lists, and the "File-Lifecycle-Bound" (FLB) mechanism for the Live Update Orchestrator. 1. Private List Primitives (patches 1-3) Recently, Linux introduced the ability to mark structure members as __private and access them via ACCESS_PRIVATE(). This enforces better encapsulation by ensuring internal details are only accessible by the owning subsystem. However, struct list_head is frequently used as an internal linkage mechanism within these private sections. The standard macros in <linux/list.h> do not support ACCESS_PRIVATE() natively. Consequently, subsystems using private lists are forced to implement ad-hoc workarounds or local iterator macros. This series adds <linux/list_private.h>, providing a set of primitives identical to those in <linux/list.h> but designed for private list heads. It also includes a KUnit test suite to verify that the macros correctly handle pointer offsets and qualifiers. 2. This series adds FLB (patches 4-5) support to Live Update that also internally uses private lists. FLB allows global kernel state (such as IOMMU domains or HugeTLB state) to be preserved once, shared across multiple file descriptors, and restored when needed. This is necessary for subsystems where multiple preserved file descriptors depend on a single, shared underlying resource. Preserving this state for each individual file would be redundant and incorrect. FLB uses reference counting tied to the lifecycle of preserved files. The state is preserved when the first file depending on it is preserved, and restored or cleaned up only when the last file is handled. This patch (of 5): Linux recently added an ability to add private members to structs (i.e. __private) and access them via ACCESS_PRIVATE(). This ensures that those members are only accessible by the subsystem which owns the struct type, and not to the object owner. However, struct list_head often needs to be placed into the private section to be manipulated privately by the subsystem. Add macros to support private list manipulations in <linux/list_private.h>. [akpm@linux-foundation.org: fix kerneldoc] Link: https://lkml.kernel.org/r/20251218155752.3045808-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20251218155752.3045808-2-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Alexander Graf <graf@amazon.com> Cc: David Gow <davidgow@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Pratyush Yadav <pratyush@kernel.org> Cc: Samiullah Khawaja <skhawaja@google.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-08	delayacct: fix uapi timespec64 definition	Arnd Bergmann	1	-18/+9
	The custom definition of 'struct timespec64' is incompatible with both the kernel's internal definition and the glibc type, at least on big-endian targets that have the tv_nsec field in a different place, and the definition clashes with any userspace that also defines a timespec64 structure. Running the header check with -Wpadding enabled produces this output that warns about the incorrect padding: usr/include/linux/taskstats.h:25:1: error: padding struct size to alignment boundary with 4 bytes [-Werror=padded] Remove the hack and instead use the regular __kernel_timespec type that is meant to be used in uapi definitions. Link: https://lkml.kernel.org/r/20260202095906.1344100-1-arnd@kernel.org Fixes: 29b63f6eff0e ("delayacct: add timestamp of delay max") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Jiang Kun <jiang.kun2@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-07	Merge tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds	1	-4/+2
	Pull scheduler fixes from Ingo Molnar: "Miscellaneous MMCID fixes to address bugs and performance regressions in the recent rewrite of the SCHED_MM_CID management code: - Fix livelock triggered by BPF CI testing - Fix hard lockup on weakly ordered systems - Simplify the dropping of CIDs in the exit path by removing an unintended transition phase - Fix performance/scalability regression on a thread-pool benchmark by optimizing transitional CIDs when scheduling out" * tag 'sched-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/mmcid: Optimize transitional CIDs when scheduling out sched/mmcid: Drop per CPU CID immediately when switching to per task mode sched/mmcid: Protect transition on weakly ordered systems sched/mmcid: Prevent live lock on task to CPU mode transition
2026-02-07	Merge tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds	1	-0/+3
	Pull objtool fixes from Ingo Molnar:: - Bump up the Clang minimum version requirements for livepatch builds, due to Clang assembler section handling bugs causing silent miscompilations - Strip livepatching symbol artifacts from non-livepatch modules - Fix livepatch build warnings when certain Clang LTO options are enabled - Fix livepatch build error when CONFIG_MEM_ALLOC_PROFILING_DEBUG=y * tag 'objtool-urgent-2026-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool/klp: Fix unexported static call key access for manually built livepatch modules objtool/klp: Fix symbol correlation for orphaned local symbols livepatch: Free klp_{object,func}_ext data after initialization livepatch: Fix having __klp_objects relics in non-livepatch modules livepatch/klp-build: Require Clang assembler >= 20
2026-02-06	net/ipv6: Remove HBH helpers	Alice Mikityanska	1	-77/+0
	Now that the HBH jumbo helpers are not used by any driver or GSO, remove them altogether. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-13-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net/ipv6: Drop HBH for BIG TCP on TX side	Alice Mikityanska	1	-1/+0
	BIG TCP IPv6 inserts a hop-by-hop extension header to indicate the real IPv6 payload length when it doesn't fit into the 16-bit field in the IPv6 header itself. While it helps tools parse the packet, it also requires every driver that supports TSO and BIG TCP to remove this 8-byte extension header. It might not sound that bad until we try to apply it to tunneled traffic. Currently, the drivers don't attempt to strip HBH if skb->encapsulation = 1. Moreover, trying to do so would require dissecting different tunnel protocols and making corresponding adjustments on case-by-case basis, which would slow down the fastpath (potentially also requiring adjusting checksums in outer headers). At the same time, BIG TCP IPv4 doesn't insert any extra headers and just calculates the payload length from skb->len, significantly simplifying implementing BIG TCP for tunnels. Stop inserting HBH when building BIG TCP GSO SKBs. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-3-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	net/ipv6: Introduce payload_len helpers	Alice Mikityanska	3	-4/+24
	The next commits will transition away from using the hop-by-hop extension header to encode packet length for BIG TCP. Add wrappers around ip6->payload_len that return the actual value if it's non-zero, and calculate it from skb->len if payload_len is set to zero (and a symmetrical setter). The new helpers are used wherever the surrounding code supports the hop-by-hop jumbo header for BIG TCP IPv6, or the corresponding IPv4 code uses skb_ip_totlen (e.g., in include/net/netfilter/nf_tables_ipv6.h). No behavioral change in this commit. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260205133925.526371-2-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	mptcp: fix kdoc warnings	Matthieu Baerts (NGI0)	1	-1/+1
	The following warnings were visible: $ ./scripts/kernel-doc -Wall -none \ net/mptcp/ include/net/mptcp.h include/uapi/linux/mptcp.h \ include/trace/events/mptcp.h Warning: net/mptcp/token.c:108 No description found for return value of 'mptcp_token_new_request' Warning: net/mptcp/token.c:151 No description found for return value of 'mptcp_token_new_connect' Warning: net/mptcp/token.c:246 No description found for return value of 'mptcp_token_get_sock' Warning: net/mptcp/token.c:298 No description found for return value of 'mptcp_token_iter_next' Warning: net/mptcp/protocol.c:4431 No description found for return value of 'mptcp_splice_read' Warning: include/uapi/linux/mptcp_pm.h:13 missing initial short description on line: enum mptcp_event_type Address all of them: either by using the 'Return:' keyword, or by adding a missing initial short description. The MPTCP CI will soon report issues with kdoc to avoid introducing new issues and being flagged by the Netdev CI. Reviewed-by: Geliang Tang <geliang@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260205-net-mptcp-misc-fixes-6-19-rc8-v2-3-c2720ce75c34@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	tcp: inline tcp_filter()	Eric Dumazet	1	-1/+7
	This helper is already (auto)inlined from IPv4 TCP stack. Make it an inline function to benefit IPv6 as well. $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 30/-49 (-19) Function old new delta tcp_v6_rcv 3448 3478 +30 __pfx_tcp_filter 16 - -16 tcp_filter 33 - -33 Total: Before=24891904, After=24891885, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260205164329.3401481-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	netns: optimize netns cleaning by batching unhash_nsid calls	Qiliang Yuan	1	-0/+1
	Currently, unhash_nsid() scans the entire system for each netns being killed, leading to O(L_dying_net * M_alive_net * N_id) complexity, as __peernet2id() also performs a linear search in the IDR. Optimize this to O(M_alive_net * N_id) by batching unhash operations. Move unhash_nsid() out of the per-netns loop in cleanup_net() to perform a single-pass traversal over survivor namespaces. Identify dying peers by an 'is_dying' flag, which is set under net_rwsem write lock after the netns is removed from the global list. This batches the unhashing work and eliminates the O(L_dying_net) multiplier. To minimize the impact on struct net size, 'is_dying' is placed in an existing hole after 'hash_mix' in struct net. Use a restartable idr_get_next() loop for iteration. This avoids the unsafe modification issue inherent to idr_for_each() callbacks and allows dropping the nsid_lock to safely call sleepy rtnl_net_notifyid(). Clean up redundant nsid_lock and simplify the destruction loop now that unhashing is centralized. Signed-off-by: Qiliang Yuan <yuanql9@chinatelecom.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260204074854.3506916-1-realwujing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06	mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config	Qi Zheng	1	-1/+1
	For architectures that define __HAVE_ARCH_TLB_REMOVE_TABLE, the page tables at the pmd/pud level are generally not of struct ptdesc type, and do not have pt_rcu_head member, thus these architectures cannot support PT_RECLAIM. In preparation for enabling PT_RECLAIM on more architectures, convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config, so that we can make conditional judgments in Kconfig. Link: https://lkml.kernel.org/r/5ebfa3d4b56e63c6906bda5eccaa9f7194d3a86b.1769515122.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Andreas Larsson <andreas@gaisler.com> [sparc, UP&SMP] Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-06	Merge branch 'pci/misc'	Bjorn Helgaas	1	-0/+2
	- Fix documentation typos (Shawn Lin) - Add struct p2pdma_provider kernel doc (Leon Romanovsky) - Remove useless devres WARN_ON() (Philipp Stanner) * pci/misc: PCI: Remove useless WARN_ON() from devres PCI/P2PDMA: Add missing struct p2pdma_provider documentation Documentation: PCI: Fix typos in msi-howto.rst
2026-02-06	Merge branch 'pci/controller/dwc'	Bjorn Helgaas	3	-0/+33
	- Extend PCI_FIND_NEXT_CAP() and PCI_FIND_NEXT_EXT_CAP() to return a pointer to the preceding Capability (Qiang Yu) - Add dw_pcie_remove_capability() and dw_pcie_remove_ext_capability() to remove Capabilities that are advertised but not fully implemented (Qiang Yu) - Remove MSI and MSI-X Capabilities for DWC controllers in platforms that can't support them, so we automatically fall back to INTx (Qiang Yu) - Remove MSI-X and DPC Capabilities for Qualcomm platforms that advertise but don't support them (Qiang Yu) - Remove duplicate dw_pcie_ep_hide_ext_capability() function and replace with dw_pcie_remove_ext_capability() (Qiang Yu) - Add ASPM L1.1 and L1.2 Substates context to debugfs ltssm_status for drivers that support this (Shawn Lin) - Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up to avoid an unnecessary timeout (Manivannan Sadhasivam) - Revert dw-rockchip, qcom, and DWC core changes that used link-up IRQs to trigger enumeration instead of waiting for link to be up because the PCI core doesn't allocate bus number space for hierarchies that might be attached (Niklas Cassel) - Make endpoint iATU entry for MSI permanent instead of programming it dynamically, which is slow and racy with respect to other concurrent traffic, e.g., eDMA (Koichiro Den) - Use iMSI-RX MSI target address when possible to fix endpoints using 32-bit MSI (Shawn Lin) - Make dw_pcie_ltssm_status_string() available and use it for logging errors in dw_pcie_wait_for_link() (Manivannan Sadhasivam) - Return -ENODEV when dw_pcie_wait_for_link() finds no devices, -EIO for device present but inactive, -ETIMEDOUT for other failures, so callers can handle these cases differently (Manivannan Sadhasivam) - Allow DWC host controller driver probe to continue if device is not found or found but inactive; only fail when there's an error with the link (Manivannan Sadhasivam) - For controllers like NXP i.MX6QP and i.MX7D, where LTSSM registers are not accessible after PME_Turn_Off, simply wait 10ms instead of polling for L2/L3 Ready (Richard Zhu) - Use multiple iATU entries to map large bridge windows and DMA ranges when necessary instead of failing (Samuel Holland) - Rename struct dw_pcie_rp.has_msi_ctrl to .use_imsi_rx for clarity (Qiang Yu) - Add EPC dynamic_inbound_mapping feature bit for Endpoint Controllers that can update BAR inbound address translation without requiring EPF driver to clear/reset the BAR first, and advertise it for DWC-based Endpoints (Koichiro Den) - Add EPC subrange_mapping feature bit for Endpoint Controllers that can map multiple independent inbound regions in a single BAR, implement subrange mapping, advertise it for DWC-based Endpoints, and add Endpoint selftests for it (Koichiro Den) - Allow overriding default BAR sizes for pci-epf-test (Niklas Cassel) - Make resizable BARs work for Endpoint multi-PF configurations; previously it only worked for PF 0 (Aksh Garg) - Fix Endpoint non-PF 0 support for BAR configuration, ATU mappings, and Address Match Mode (Aksh Garg) - Fix issues with outbound iATU index assignment that caused iATU index to be out of bounds (Niklas Cassel) - Clean up iATU index tracking to be consistent (Niklas Cassel) - Set up iATU when ECAM is enabled; previously IO and MEM outbound windows weren't programmed, and ECAM-related iATU entries weren't restored after suspend/resume, so config accesses failed (Krishna Chaitanya Chundru) * pci/controller/dwc: PCI: dwc: Fix missing iATU setup when ECAM is enabled PCI: dwc: Clean up iATU index usage in dw_pcie_iatu_setup() PCI: dwc: Fix msg_atu_index assignment PCI: dwc: ep: Add comment explaining controller level PTM access in multi PF setup PCI: dwc: ep: Add per-PF BAR and inbound ATU mapping support PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations PCI: endpoint: pci-epf-test: Allow overriding default BAR sizes selftests: pci_endpoint: Add BAR subrange mapping test case misc: pci_endpoint_test: Add BAR subrange mapping test case PCI: endpoint: pci-epf-test: Add BAR subrange mapping test support Documentation: PCI: endpoint: Clarify pci_epc_set_bar() usage PCI: dwc: ep: Support BAR subrange inbound mapping via Address Match Mode iATU PCI: dwc: Advertise dynamic inbound mapping support PCI: endpoint: Add BAR subrange mapping support PCI: endpoint: Add dynamic_inbound_mapping EPC feature PCI: dwc: Rename dw_pcie_rp::has_msi_ctrl to dw_pcie_rp::use_imsi_rx for clarity PCI: dwc: Fix grammar and formatting for comment in dw_pcie_remove_ext_capability() PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ranges PCI: dwc: Remove duplicate dw_pcie_ep_hide_ext_capability() function PCI: dwc: Skip waiting for L2/L3 Ready if dw_pcie_rp::skip_l23_wait is true PCI: dwc: Fail dw_pcie_host_init() if dw_pcie_wait_for_link() returns -ETIMEDOUT PCI: dwc: Rework the error print of dw_pcie_wait_for_link() PCI: dwc: Rename and move ltssm_status_string() to pcie-designware.c PCI: dwc: Return -EIO from dw_pcie_wait_for_link() if device is not active PCI: dwc: Return -ENODEV from dw_pcie_wait_for_link() if device is not found PCI: dwc: Use cfg0_base as iMSI-RX target address to support 32-bit MSI devices PCI: dwc: ep: Cache MSI outbound iATU mapping Revert "PCI: dwc: Don't wait for link up if driver can detect Link Up event" Revert "PCI: qcom: Enumerate endpoints based on Link up event in 'global_irq' interrupt" Revert "PCI: qcom: Enable MSI interrupts together with Link up if 'Global IRQ' is supported" Revert "PCI: qcom: Don't wait for link if we can detect Link Up" Revert "PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ" Revert "PCI: dw-rockchip: Don't wait for link since we can detect Link Up" PCI: dwc: Skip PME_Turn_Off broadcast and L2/L3 transition during suspend if link is not up PCI: dw-rockchip: Change get_ltssm() to provide L1 Substates info PCI: dwc: Add L1 Substates context to ltssm_status of debugfs PCI: qcom: Remove DPC Extended Capability PCI: qcom: Remove MSI-X Capability for Root Ports PCI: dwc: Remove MSI/MSIX capability for Root Port if iMSI-RX is used as MSI controller PCI: dwc: Add new APIs to remove standard and extended Capability PCI: Add preceding capability position support in PCI_FIND_NEXT_*_CAP macros
2026-02-06	Merge branch 'pci/virtualization'	Bjorn Helgaas	1	-0/+1
	- Mark ASM1164 SATA controller to avoid bus reset since it fails to train the Link after reset (Alex Williamson) - Mark Nvidia GB10 Root Ports to avoid bus reset since they may fail to retrain the link after reset (Johnny-CC Chang) - Add lockdep and other lock assertions (Ilpo Järvinen) - Add ACS quirk for Qualcomm Hamoa & Glymur, which provides ACS-like features but doesn't advertise an ACS Capability (Krishna Chaitanya Chundru) - Add ACS quirk for Pericom PI7C9X2G404 switches, which fail under load when P2P Redirect Request is enabled (Nicolas Cavallari) - Remove an incorrect unlock in pci_slot_trylock() error handling (Jinhui Guo) - Lock the bridge device for slot reset (Keith Busch) - Enable ACS after IOMMU configuration on OF platforms so ACS is enabled an all devices; previously the first device enumeration (typically a Root Port) was omitted (Manivannan Sadhasivam) - Disable ACS Source Validation for IDT 0x80b5 and 0x8090 switches to work around hardware erratum; previously ACS SV was temporarily disabled, which worked for enumeration but not after reset (Manivannan Sadhasivam) * pci/virtualization: PCI: Disable ACS SV for IDT 0x8090 switch PCI: Disable ACS SV for IDT 0x80b5 switch PCI: Cache ACS Capabilities register PCI: Enable ACS after configuring IOMMU for OF platforms PCI: Add ACS quirk for Pericom PI7C9X2G404 switches [12d8:b404] PCI: Add ACS quirk for Qualcomm Hamoa & Glymur PCI: Use device_lock_assert() to verify device lock is held PCI: Use lockdep_assert_held(pci_bus_sem) to verify lock is held PCI: Fix pci_slot_lock () device locking PCI: Fix pci_slot_trylock() error handling PCI: Mark Nvidia GB10 to avoid bus reset PCI: Mark ASM1164 SATA controller to avoid bus reset
2026-02-06	Merge branch 'pci/trace'	Bjorn Helgaas	2	-0/+136
	- Add generic RAS tracepoint for hotplug events (Shuai Xue) - Add RAS tracepoint for link speed changes (Shuai Xue) * pci/trace: Documentation: tracing: Add PCI tracepoint documentation PCI: trace: Add RAS tracepoint to monitor link speed changes PCI: trace: Add generic RAS tracepoint for hotplug event
2026-02-06	Merge branch 'pci/resource'	Bjorn Helgaas	3	-2/+11
	- Build zero-sized resources when a BAR is larger than 4G but pci_bus_addr_t or resource_size_t can't represent 64-bit addresses (Ilpo Järvinen) - Fix bridge window alignment with optional resources, where we previously lost the additional alignment requirement (Ilpo Järvinen) - Stop over-estimating bridge window size since we now assign them without any gaps between them (Ilpo Järvinen) - Increase resource MAX_IORES_LEVEL to avoid /proc/iomem flattening for nested bridges and endpoints (Ilpo Järvinen) - Remove old_size limit from bridge window sizing (Ilpo Järvinen) - Push realloc check into pbus_size_mem() to simplify callers (Ilpo Järvinen) - Pass bridge window resource to pbus_size_mem() to avoid looking it up again (Ilpo Järvinen) - Use res_to_dev_res() instead of open-coding the same search (Ilpo Järvinen) - Add pci_resource_is_bridge_win() helper (Ilpo Järvinen) - Add more logging of resource assignment (Ilpo Järvinen) - Add pbus_mem_size_optional() to handle sizes of optional resources (SR-IOV VF BARs, expansion ROMs, bridge windows) (Ilpo Järvinen) - Move CardBus code to setup-cardbus.c and only build it when CONFIG_CARDBUS is set (Ilpo Järvinen) - Use scnprintf() instead of sprintf() (Ilpo Järvinen) - Add pbus_validate_busn() for Bus Number validation (Ilpo Järvinen) - Don't claim disabled bridge windows to avoid spurious claim failures (Ilpo Järvinen) * pci/resource: PCI: Don't claim disabled bridge windows PCI: Move CardBus bridge scanning to setup-cardbus.c PCI: Add pbus_validate_busn() for Bus Number validation PCI: Add dword #defines for Bus Number + Secondary Latency Timer PCI: Use scnprintf() instead of sprintf() PCI: Handle CardBus-specific params in setup-cardbus.c PCI: Separate CardBus setup & build it only with CONFIG_CARDBUS PCI: Add 'pci' prefix to struct pci_dev_resource handling functions PCI: Use resource_assigned() in setup-bus.c algorithm resource: Mark res given to resource_assigned() as const PCI: Add pbus_mem_size_optional() to handle optional sizes PCI: Check invalid align earlier in pbus_size_mem() PCI: Log reset and restore of resources PCI: Add pci_resource_is_bridge_win() PCI: Fetch dev_res to local var in __assign_resources_sorted() PCI: Use res_to_dev_res() in reassign_resources_sorted() PCI: Pass bridge window resource to pbus_size_mem() PCI: Push realloc check into pbus_size_mem() PCI: Remove old_size limit from bridge window sizing resource: Increase MAX_IORES_LEVEL to 8 PCI: Stop over-estimating bridge window size PCI: Rewrite bridge window head alignment function PCI: Fix bridge window alignment with optional resources PCI: Use resource_set_range() that correctly sets ->end
2026-02-06	Merge branch 'pci/pwrctrl'	Bjorn Helgaas	2	-2/+15
	- Rename pwrseq, tc9563, and slot driver structs, variables, and functions for consistency (Bjorn Helgaas) - Add power_on/off callbacks with generic signature to pwrseq, tc9563, and slot drivers so they can be used by pwrctrl core (Manivannan Sadhasivam) - Add interfaces to create and destroy pwrctrl devices (Krishna Chaitanya Chundru) - Add interfaces to power devices on and off (Manivannan Sadhasivam) - Switch to pwrctrl interfaces to create, destroy, and power on/off devices, calling them from host controller drivers instead of the PCI core (Manivannan Sadhasivam) - Drop qcom .assert_perst() callbacks since this is now done by the controller driver instead of the pwrctrl driver (Manivannan Sadhasivam) - Add PCIe M.2 connector support to the slot pwrctrl driver (Manivannan Sadhasivam) - Create pwrctrl devices for devicetree PCIe M.2 connector nodes (Manivannan Sadhasivam) * pci/pwrctrl: PCI/pwrctrl: Create pwrctrl device if graph port is found PCI/pwrctrl: Add PCIe M.2 connector support PCI: Drop the assert_perst() callback PCI: qcom: Drop the assert_perst() callbacks PCI/pwrctrl: Switch to pwrctrl create, power on/off, destroy APIs PCI/pwrctrl: Add APIs to power on/off pwrctrl devices PCI/pwrctrl: Add APIs to create, destroy pwrctrl devices PCI/pwrctrl: Add 'struct pci_pwrctrl::power_{on/off}' callbacks PCI/pwrctrl: pwrseq: Factor out power on/off code to helpers PCI/pwrctrl: slot: Factor out power on/off code to helpers PCI/pwrctrl: tc9563: Rename private struct and pointers for consistency PCI/pwrctrl: tc9563: Add local variables to reduce repetition PCI/pwrctrl: tc9563: Clean up whitespace PCI/pwrctrl: tc9563: Use put_device() instead of i2c_put_adapter() PCI/pwrctrl: slot: Rename private struct and pointers for consistency PCI/pwrctrl: pwrseq: Rename private struct and pointers for consistency # Conflicts: # drivers/pci/bus.c
2026-02-06	Merge branch 'pci/iommu'	Bjorn Helgaas	2	-0/+7
	- Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150, where VGA and USB are behind a PCIe-to-PCI bridge and share the same StreamID (Nirmoy Das) * pci/iommu: PCI: Add PCI_BRIDGE_NO_ALIAS quirk for ASPEED AST1150 PCI: Add ASPEED vendor ID to pci_ids.h
2026-02-06	PCI/bwctrl: Disable BW controller on Intel P45 using a quirk	Ilpo Järvinen	1	-0/+1
	The commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") was found to lead to a boot hang on a Intel P45 system. Testing without setting Link Bandwidth Management Interrupt Enable (LBMIE) and Link Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe r7.0, sec 7.5.3.7) in bwctrl allowed system to come up. P45 is a very old chipset and supports only up to gen2 PCIe, so not having bwctrl does not seem a huge deficiency. Add no_bw_notif in struct pci_dev and quirk Intel P45 Root Port with it. Reported-by: Adam Stylinski <kungfujesus06@gmail.com> Link: https://lore.kernel.org/linux-pci/aUCt1tHhm_-XIVvi@eggsbenedict/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Adam Stylinski <kungfujesus06@gmail.com> Link: https://patch.msgid.link/20260116131513.2359-1-ilpo.jarvinen@linux.intel.com
2026-02-06	PCI: Cache ACS Capabilities register	Manivannan Sadhasivam	1	-0/+1
	The ACS Capability register is read-only. Cache it to allow quirks to override it and to avoid re-reading it. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://patch.msgid.link/20260102-pci_acs-v3-2-72280b94d288@oss.qualcomm.com
2026-02-06	bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}	Amery Hung	1	-2/+2
	Take care of rqspinlock error in bpf_local_storage_{map_free, destroy}() properly by switching to bpf_selem_unlink_nofail(). Both functions iterate their own RCU-protected list of selems and call bpf_selem_unlink_nofail(). In map_free(), to prevent infinite loop when both map_free() and destroy() fail to remove a selem from b->list (extremely unlikely), switch to hlist_for_each_entry_rcu(). In destroy(), also switch to hlist_for_each_entry_rcu() since we no longer iterate local_storage->list under local_storage->lock. bpf_selem_unlink() now becomes dedicated to helpers and syscalls paths so reuse_now should always be false. Remove it from the argument and hardcode it. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-12-ameryhung@gmail.com
2026-02-06	bpf: Support lockless unlink when freeing map or local storage	Amery Hung	1	-1/+8
	Introduce bpf_selem_unlink_nofail() to properly handle errors returned from rqspinlock in bpf_local_storage_map_free() and bpf_local_storage_destroy() where the operation must succeeds. The idea of bpf_selem_unlink_nofail() is to allow an selem to be partially linked and use atomic operation on a bit field, selem->state, to determine when and who can free the selem if any unlink under lock fails. An selem initially is fully linked to a map and a local storage. Under normal circumstances, bpf_selem_unlink_nofail() will be able to grab locks and unlink a selem from map and local storage in sequeunce, just like bpf_selem_unlink(), and then free it after an RCU grace period. However, if any of the lock attempts fails, it will only clear SDATA(selem)->smap or selem->local_storage depending on the caller and set SELEM_MAP_UNLINKED or SELEM_STORAGE_UNLINKED according to the caller. Then, after both map_free() and destroy() see the selem and the state becomes SELEM_UNLINKED, one of two racing caller can succeed in cmpxchg the state from SELEM_UNLINKED to SELEM_TOFREE, ensuring no double free or memory leak. To make sure bpf_obj_free_fields() is done only once and when map is still present, it is called when unlinking an selem from b->list under b->lock. To make sure uncharging memory is done only when the owner is still present in map_free(), block destroy() from returning until there is no pending map_free(). Since smap may not be valid in destroy(), bpf_selem_unlink_nofail() skips bpf_selem_unlink_storage_nolock_misc() when called from destroy(). This is okay as bpf_local_storage_destroy() will return the remaining amount of memory charge tracked by mem_charge to the owner to uncharge. It is also safe to skip clearing local_storage->owner and owner_storage as the owner is being freed and no users or bpf programs should be able to reference the owner and using local_storage. Finally, access of selem, SDATA(selem)->smap and selem->local_storage are racy. Callers will protect these fields with RCU. Acked-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-11-ameryhung@gmail.com
2026-02-06	bpf: Prepare for bpf_selem_unlink_nofail()	Amery Hung	1	-2/+3
	The next patch will introduce bpf_selem_unlink_nofail() to handle rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be partially unlinked from map or local storage. Save memory allocation method in selem so that later an selem can be correctly freed even when SDATA(selem)->smap is init to NULL. In addition, keep track of memory charge to the owner in local storage so that later bpf_selem_unlink_nofail() can return the correct memory charge to the owner. Updating local_storage->mem_charge is protected by local_storage->lock. Finally, extract miscellaneous tasks performed when unlinking an selem from local_storage into bpf_selem_unlink_storage_nolock_misc(). It will be reused by bpf_selem_unlink_nofail(). This patch also takes the chance to remove local_storage->smap, which is no longer used since commit f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage"). Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-10-ameryhung@gmail.com
2026-02-06	bpf: Remove unused percpu counter from bpf_local_storage_map_free	Amery Hung	1	-2/+1
	Percpu locks have been removed from cgroup and task local storage. Now that all local storage no longer use percpu variables as locks preventing recursion, there is no need to pass them to bpf_local_storage_map_free(). Remove the argument from the function. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
2026-02-06	bpf: Change local_storage->lock and b->lock to rqspinlock	Amery Hung	1	-2/+3
	Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock from raw_spin_lock to rqspinlock. Finally, propagate errors from raw_res_spin_lock_irqsave() to syscall return or BPF helper return. In bpf_local_storage_destroy(), ignore return from raw_res_spin_lock_irqsave() for now. A later patch will correctly handle errors correctly in bpf_local_storage_destroy() so that it can unlink selems even when failing to acquire locks. For __bpf_local_storage_map_cache(), instead of handling the error, skip updating the cache. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-6-ameryhung@gmail.com
2026-02-06	bpf: Convert bpf_selem_unlink to failable	Amery Hung	1	-1/+1
	To prepare changing both bpf_local_storage_map_bucket::lock and bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Open code bpf_selem_unlink_storage() in the only caller, bpf_selem_unlink(), since unlink_map and unlink_storage must be done together after all the necessary locks are acquired. For bpf_local_storage_map_free(), ignore the return from bpf_selem_unlink() for now. A later patch will allow it to unlink selems even when failing to acquire locks. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
2026-02-06	bpf: Convert bpf_selem_link_map to failable	Amery Hung	1	-3/+3
	To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_link_map() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
2026-02-06	bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage	Amery Hung	1	-0/+1
	A later bpf_local_storage refactor will acquire all locks before performing any update. To simplified the number of locks needed to take in bpf_local_storage_map_update(), determine the bucket based on the local_storage an selem belongs to instead of the selem pointer. Currently, when a new selem needs to be created to replace the old selem in bpf_local_storage_map_update(), locks of both buckets need to be acquired to prevent racing. This can be simplified if the two selem belongs to the same bucket so that only one bucket needs to be locked. Therefore, instead of hashing selem, hashing the local_storage pointer the selem belongs. Performance wise, this is slightly better as update now requires locking one bucket. It should not change the level of contention on one bucket as the pointers to local storages of selems in a map are just as unique as pointers to selems. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-2-ameryhung@gmail.com
2026-02-06	Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm	Linus Torvalds	1	-0/+3
	Pull hotfixes from Andrew Morton: "A couple of late-breaking MM fixes. One against a new-in-this-cycle patch and the other addresses a locking issue which has been there for over a year" * tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory-failure: reject unsupported non-folio compound page procfs: avoid fetching build ID while holding VMA lock
2026-02-06	Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client	Linus Torvalds	1	-0/+6
	Pull ceph fixes from Ilya Dryomov: "One RBD and two CephFS fixes which address potential oopses. The RBD thing is more of a rare edge case that pops up in our CI, while the two CephFS scenarios are regressions that were reported by users and can be triggered trivially in normal operation. All marked for stable" * tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client: ceph: fix NULL pointer dereference in ceph_mds_auth_match() ceph: fix oops due to invalid pointer for kfree() in parse_longname() rbd: check for EOD after exclusive lock is ensured to be held
2026-02-06	Merge tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux	Linus Torvalds	1	-6/+19
	Pull dma-mapping fixes from Marek Szyprowski: "Two minor fixes for the DMA-mapping subsystem: - check for the rare case of the allocation failure of the global CMA pool (Shanker Donthineni) - avoid perf buffer overflow when tracing large scatter-gather lists (Deepanshu Kartikey)" * tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma: contiguous: Check return value of dma_contiguous_reserve_area() tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow
2026-02-06	landlock: Minor reword of docs for TCP access rights	Matthieu Buffet	1	-8/+9
	- Move ABI requirement next to each access right to prepare adding more access rights; - Mention the possibility to remove the random component of a socket's ephemeral port choice within the netns-wide ephemeral port range, since it allows choosing the "random" ephemeral port. Signed-off-by: Matthieu Buffet <matthieu@buffet.re> Link: https://lore.kernel.org/r/20251212163704.142301-2-matthieu@buffet.re Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-02-06	landlock: Multithreading support for landlock_restrict_self()	Günther Noack	1	-0/+13
	Introduce the LANDLOCK_RESTRICT_SELF_TSYNC flag. With this flag, a given Landlock ruleset is applied to all threads of the calling process, instead of only the current one. Without this flag, multithreaded userspace programs currently resort to using the nptl(7)/libpsx hack for multithreaded policy enforcement, which is also used by libcap and for setuid(2). Using this userspace-based scheme, the threads of a process enforce the same Landlock policy, but the resulting Landlock domains are still separate. The domains being separate causes multiple problems: * When using Landlock's "scoped" access rights, the domain identity is used to determine whether an operation is permitted. As a result, when using LANLDOCK_SCOPE_SIGNAL, signaling between sibling threads stops working. This is a problem for programming languages and frameworks which are inherently multithreaded (e.g. Go). * In audit logging, the domains of separate threads in a process will get logged with different domain IDs, even when they are based on the same ruleset FD, which might confuse users. Cc: Andrew G. Morgan <morgan@kernel.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Paul Moore <paul@paul-moore.com> Suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20251127115136.3064948-2-gnoack@google.com [mic: Fix restrict_self_flags test, clean up Makefile, allign comments, reduce local variable scope, add missing includes] Closes: https://github.com/landlock-lsm/linux/issues/2 Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-02-06	Revert "revocable: Revocable resource management"	Johan Hovold	1	-89/+0
	This reverts commit 62eb557580eb2177cf16c3fd2b6efadff297b29a. The revocable implementation uses two separate abstractions, struct revocable_provider and struct revocable, in order to store the SRCU read lock index which must be passed unaltered to srcu_read_unlock() in the same context when a resource is no longer needed. With the merged revocable API, multiple threads could however share the same struct revocable and therefore potentially overwrite the SRCU index of another thread which can cause the SRCU synchronisation in revocable_provider_revoke() to never complete. [1] An example revocable conversion of the gpiolib code also turned out to be fundamentally flawed and could lead to use-after-free. [2] An attempt to address both issues was quickly put together and merged, but revocable is still fundamentally broken. [3] Specifically, the latest design relies on RCU for storing a pointer to the revocable provider, but since the resource can be shared by value (e.g. as in the now reverted selftests) this does not work at all and can also lead to use-after-free: static void revocable_provider_release(struct kref kref) { struct revocable_provider rp = container_of(kref, struct revocable_provider, kref); cleanup_srcu_struct(&rp->srcu); kfree_rcu(rp, rcu); } void revocable_provider_revoke(struct revocable_provider __rcu *rp_ptr) { struct revocable_provider rp; rp = rcu_replace_pointer(rp_ptr, NULL, 1); ... kref_put(&rp->kref, revocable_provider_release); } int revocable_init(struct revocable_provider __rcu _rp, struct revocable rev) { struct revocable_provider rp; ... scoped_guard(rcu) { rp = rcu_dereference(_rp); if (!rp) return -ENODEV; if (!kref_get_unless_zero(&rp->kref)) return -ENODEV; } ... } producer: priv->rp = revocable_provider_alloc(&priv->res); // pass priv->rp by value to consumer revocable_provider_revoke(&priv->rp); consumer: struct revocable_provider __rcu rp = filp->private_data; struct revocable rev; revocable_init(rp, &rev); as _rp would still be non-NULL in revocable_init() regardless of whether the producer has revoked the resource and set its pointer to NULL. Essentially revocable still relies on having a pointer to reference counted driver data which holds the revocable provider, which makes all the RCU protection unnecessary along with most of the current revocable design and implementation. As the above shows, and as has been pointed out repeatedly elsewhere, these kind of issues are not something that should be addressed incrementally. [4] Revert the revocable implementation until a redesign has been proposed and evaluated properly. Link: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/ [1] Link: https://lore.kernel.org/all/aXT45B6vLf9R3Pbf@hovoldconsulting.com/ [2] Link: https://lore.kernel.org/all/20260129143733.45618-1-tzungbi@kernel.org/ [3] Link: https://lore.kernel.org/all/aXobzoeooJqxMkEj@hovoldconsulting.com/ [4] Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://patch.msgid.link/20260204142849.22055-4-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-06	io_uring: allow registration of per-task restrictions	Jens Axboe	2	-0/+9
	Currently io_uring supports restricting operations on a per-ring basis. To use those, the ring must be setup in a disabled state by setting IORING_SETUP_R_DISABLED. Then restrictions can be set for the ring, and the ring can then be enabled. This commit adds support for IORING_REGISTER_RESTRICTIONS with ring_fd == -1, like the other "blind" register opcodes which work on the task rather than a specific ring. This allows registration of the same kind of restrictions as can been done on a specific ring, but with the task itself. Once done, any ring created will inherit these restrictions. If a restriction filter is registered with a task, then it's inherited on fork for its children. Children may only further restrict operations, not extend them. Inheriting restrictions include both the classic IORING_REGISTER_RESTRICTIONS based restrictions, as well as the BPF filters that have been registered with the task via IORING_REGISTER_BPF_FILTER. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-06	io_uring: add task fork hook	Jens Axboe	2	-1/+14
	Called when copy_process() is called to copy state to a new child. Right now this is just a stub, but will be used shortly to properly handle fork'ing of task based io_uring restrictions. Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>