aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-26Merge tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-nextPaolo Abeni5-51/+96
Steffen Klassert says: ==================== 1) Remove some unnecessary strscpy_pad() size arguments. From Thorsten Blum. 2) Correct use of xso.real_dev on bonding offloads. Patchset from Cosmin Ratiu. 3) Add hardware offload configuration to XFRM_MSG_MIGRATE. From Chiachang Wang. 4) Refactor migration setup during cloning. This was done after the clone was created. Now it is done in the cloning function itself. From Chiachang Wang. 5) Validate assignment of maximal possible SEQ number. Prevent from setting to the maximum sequrnce number as this would cause for traffic drop. From Leon Romanovsky. 6) Prevent configuration of interface index when offload is used. Hardware can't handle this case.i From Leon Romanovsky. 7) Always use kfree_sensitive() for SA secret zeroization. From Zilin Guan. ipsec-next-2025-05-23 * tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: use kfree_sensitive() for SA secret zeroization xfrm: prevent configuration of interface index when offload is used xfrm: validate assignment of maximal possible SEQ number xfrm: Refactor migration setup during the cloning process xfrm: Migrate offload configuration bonding: Fix multiple long standing offload races bonding: Mark active offloaded xfrm_states xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free} xfrm: Remove unneeded device check from validate_xmit_xfrm xfrm: Use xdo.dev instead of xdo.real_dev net/mlx5: Avoid using xso.real_dev unnecessarily xfrm: Remove unnecessary strscpy_pad() size arguments ==================== Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26net: mctp: use nlmsg_payload() for netlink message data extractionJeremy Kerr2-3/+6
Jakub suggests: > I have a different request :) Matt, once this ends up in net-next > (end of this week) could you refactor it to use nlmsg_payload() ? > It doesn't exist in net but this is exactly why it was added. This refactors the additions to both mctp_dump_addrinfo(), and mctp_rtm_getneigh() - two cases where we're calling nlh_data() on an an incoming netlink message, without a prior nlmsg_parse(). For the neigh.c case, we cannot hit the failure where the nlh does not contain a full ndmsg at present, as the core handler (net/core/neighbour.c, neigh_get()) has already validated the size through neigh_valid_req_get(), and would have failed the get operation before the MCTP hander is called. However, relying on that is a bit fragile, so apply the nlmsg_payload refector here too. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20250521-mctp-nlmsg-payload-v2-1-e85df160c405@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds1-6/+6
Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one*() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rater a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefile to use the lookup_one family of functions and to explictily pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup function to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions
2025-05-26net: neigh: use kfree_skb_reason() in neigh_resolve_output() and neigh_connected_output()Qiu Yutan1-2/+2
Replace kfree_skb() used in neigh_resolve_output() and neigh_connected_output() with kfree_skb_reason(). Following new skb drop reason is added: /* failed to fill the device hard header */ SKB_DROP_REASON_NEIGH_HH_FILLFAIL Signed-off-by: Qiu Yutan <qiu.yutan@zte.com.cn> Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Xu Xin <xu.xin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-26net: devmem: support single IOV with sendmsgStanislav Fomichev1-1/+2
sendmsg() with a single iov becomes ITER_UBUF, sendmsg() with multiple iovs becomes ITER_IOVEC. iter_iov_len does not return correct value for UBUF, so teach to treat UBUF differently. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Mina Almasry <almasrymina@google.com> Fixes: bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Acked-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23netfilter: nf_tables: Add notifications for hook changesPhil Sutter3-0/+62
Notify user space if netdev hooks are updated due to netdev add/remove events. Send minimal notification messages by introducing NFT_MSG_NEWDEV/DELDEV message types describing a single device only. Upon NETDEV_CHANGENAME, the callback has no information about the interface's old name. To provide a clear message to user space, include the hook's stored interface name in the notification. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Support wildcard netdev hook specsPhil Sutter2-16/+15
User space may pass non-nul-terminated NFTA_DEVICE_NAME attribute values to indicate a suffix wildcard. Expect for multiple devices to match the given prefix in nft_netdev_hook_alloc() and populate 'ops_list' with them all. When checking for duplicate hooks, compare the shortest prefix so a device may never match more than a single hook spec. Finally respect the stored prefix length when hooking into new devices from event handlers. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()Phil Sutter1-9/+7
No point in having err_hook_alloc, just call return directly. Also rename err_hook_dev - it's not about the hook's device but freeing the hook itself. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Handle NETDEV_CHANGENAME eventsPhil Sutter2-18/+48
For the sake of simplicity, treat them like consecutive NETDEV_REGISTER and NETDEV_UNREGISTER events. If the new name matches a hook spec and registration fails, escalate the error and keep things as they are. To avoid unregistering the newly registered hook again during the following fake NETDEV_UNREGISTER event, leave hooks alone if their interface spec matches the new name. Note how this patch also skips for NETDEV_REGISTER if the device is already registered. This is not yet possible as the new name would have to match the old one. This will change with wildcard interface specs, though. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Wrap netdev notifiersPhil Sutter2-26/+46
Handling NETDEV_CHANGENAME events has to traverse all chains/flowtables twice, prepare for this. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Respect NETDEV_REGISTER eventsPhil Sutter2-9/+60
Hook into new devices if their name matches the hook spec. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Prepare for handling NETDEV_REGISTER eventsPhil Sutter2-14/+24
Put NETDEV_UNREGISTER handling code into a switch, no functional change intended as the function is only called for that event yet. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Have a list of nf_hook_ops in nft_hookPhil Sutter3-62/+133
Supporting a 1:n relationship between nft_hook and nf_hook_ops is convenient since a chain's or flowtable's nft_hooks may remain in place despite matching interfaces disappearing. This stabilizes ruleset dumps in that regard and opens the possibility to claim newly added interfaces which match the spec. Also it prepares for wildcard interface specs since these will potentially match multiple interfaces. All spots dealing with hook registration are updated to handle a list of multiple nf_hook_ops, but nft_netdev_hook_alloc() only adds a single item for now to retain the old behaviour. The only expected functional change here is how vanishing interfaces are handled: Instead of dropping the respective nft_hook, only the matching nf_hook_ops are dropped. To safely remove individual ops from the list in netdev handlers, an rcu_head is added to struct nf_hook_ops so kfree_rcu() may be used. There is at least nft_flowtable_find_dev() which may be iterating through the list at the same time. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()Phil Sutter1-11/+9
The function accesses only the hook's ops field, pass it directly. This prepares for nft_hooks holding a list of nf_hook_ops in future. While at it, make use of the function in __nft_unregister_flowtable_net_hooks() as well. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce nft_register_flowtable_ops()Phil Sutter1-11/+21
Facilitate binding and registering of a flowtable hook via a single function call. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()Phil Sutter4-5/+26
Also a pretty dull wrapper around the hook->ops.dev comparison for now. Will search the embedded nf_hook_ops list in future. The ugly cast to eliminate the const qualifier will vanish then, too. Since this future list will be RCU-protected, also introduce an _rcu() variant here. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce functions freeing nft_hook objectsPhil Sutter1-14/+24
Pointless wrappers around kfree() for now, prep work for an embedded list of nf_hook_ops. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: add packets conntrack state to debug trace infoFlorian Westphal1-1/+53
Add the minimal relevant info needed for userspace ("nftables monitor trace") to provide the conntrack view of the packet: - state (new, related, established) - direction (original, reply) - status (e.g., if connection is subject to dnat) - id (allows to query ctnetlink for remaining conntrack state info) Example: trace id a62 inet filter PRE_RAW packet: iif "enp0s3" ether [..] [..] trace id a62 inet filter PRE_MANGLE conntrack: ct direction original ct state new ct id 32 trace id a62 inet filter PRE_MANGLE packet: [..] [..] trace id a62 inet filter IN conntrack: ct direction original ct state new ct status dnat-done ct id 32 [..] In this case one can see that while NAT is active, the new connection isn't subject to a translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: conntrack: make nf_conntrack_id callable without a module dependencyFlorian Westphal1-0/+6
While nf_conntrack_id() doesn't need any functionaliy from conntrack, it does reside in nf_conntrack_core.c -- callers add a module dependency on conntrack. Followup patch will need to compute the conntrack id from nf_tables_trace.c to include it in nf_trace messages emitted to userspace via netlink. I don't want to introduce a module dependency between nf_tables and conntrack for this. Since trace is slowpath, the added indirection is ok. One alternative is to move nf_conntrack_id to the netfilter/core.c, but I don't see a compelling reason so far. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmitSebastian Andrzej Siewior1-4/+18
nf_dup_skb_recursion is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move nf_dup_skb_recursion to struct netdev_xmit, provide wrappers. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctxSebastian Andrzej Siewior1-3/+15
nft_pcpu_tun_ctx is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a nft_inner_tun_ctx member (original nft_pcpu_tun_ctx) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_dup{4, 6}: Move duplication check to task_structSebastian Andrzej Siewior5-11/+8
nf_skb_duplicated is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Due to the recursion involved, the simplest change is to make it a per-task variable. Move the per-CPU variable nf_skb_duplicated to task_struct and name it in_nf_duplicate. Add it to the existing bitfield so it doesn't use additional memory. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nft_tunnel: fix geneve_opt dumpFernando Fernandez Mancera1-4/+4
When dumping a nft_tunnel with more than one geneve_opt configured the netlink attribute hierarchy should be as follow: NFTA_TUNNEL_KEY_OPTS | |--NFTA_TUNNEL_KEY_OPTS_GENEVE | | | |--NFTA_TUNNEL_KEY_GENEVE_CLASS | |--NFTA_TUNNEL_KEY_GENEVE_TYPE | |--NFTA_TUNNEL_KEY_GENEVE_DATA | |--NFTA_TUNNEL_KEY_OPTS_GENEVE | | | |--NFTA_TUNNEL_KEY_GENEVE_CLASS | |--NFTA_TUNNEL_KEY_GENEVE_TYPE | |--NFTA_TUNNEL_KEY_GENEVE_DATA | |--NFTA_TUNNEL_KEY_OPTS_GENEVE ... Otherwise, userspace tools won't be able to fetch the geneve options configured correctly. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: nft_fib: consistent l3mdev handlingFlorian Westphal2-5/+10
fib has two modes: 1. Obtain output device according to source or destination address 2. Obtain the type of the address, e.g. local, unicast, multicast. 'fib daddr type' should return 'local' if the address is configured in this netns or unicast otherwise. 'fib daddr . iif type' should return 'local' if the address is configured on the input interface or unicast otherwise, i.e. more restrictive. However, if the interface is part of a VRF, then 'fib daddr type' returns unicast even if the address is configured on the incoming interface. This is broken for both ipv4 and ipv6. In the ipv4 case, inet_dev_addr_type must only be used if the 'iif' or 'oif' (strict mode) was requested. Else inet_addr_type_dev_table() needs to be used and the correct dev argument must be passed as well so the correct fib (vrf) table is used. In the ipv6 case, the bug is similar, without strict mode, dev is NULL so .flowi6_l3mdev will be set to 0. Add a new 'nft_fib_l3mdev_master_ifindex_rcu()' helper and use that to init the .l3mdev structure member. For ipv6, use it from nft_fib6_flowi_init() which gets called from both the 'type' and the 'route' mode eval functions. This provides consistent behaviour for all modes for both ipv4 and ipv6: If strict matching is requested, the input respectively output device of the netfilter hooks is used. Otherwise, use skb->dev to obtain the l3mdev ifindex. Without this, most type checks in updated nft_fib.sh selftest fail: FAIL: did not find veth0 . 10.9.9.1 . local in fibtype4 FAIL: did not find veth0 . dead:1::1 . local in fibtype6 FAIL: did not find veth0 . dead:9::1 . local in fibtype6 FAIL: did not find tvrf . 10.0.1.1 . local in fibtype4 FAIL: did not find tvrf . 10.9.9.1 . local in fibtype4 FAIL: did not find tvrf . dead:1::1 . local in fibtype6 FAIL: did not find tvrf . dead:9::1 . local in fibtype6 FAIL: fib expression address types match (iif in vrf) (fib errounously returns 'unicast' for all of them, even though all of these addresses are local to the vrf). Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23af_unix: Introduce SO_PASSRIGHTS.Kuniyuki Iwashima2-2/+34
As long as recvmsg() or recvmmsg() is used with cmsg, it is not possible to avoid receiving file descriptors via SCM_RIGHTS. This behaviour has occasionally been flagged as problematic, as it can be (ab)used to trigger DoS during close(), for example, by passing a FUSE-controlled fd or a hung NFS fd. For instance, as noted on the uAPI Group page [0], an untrusted peer could send a file descriptor pointing to a hung NFS mount and then close it. Once the receiver calls recvmsg() with msg_control, the descriptor is automatically installed, and then the responsibility for the final close() now falls on the receiver, which may result in blocking the process for a long time. Regarding this, systemd calls cmsg_close_all() [1] after each recvmsg() to close() unwanted file descriptors sent via SCM_RIGHTS. However, this cannot work around the issue at all, because the final fput() may still occur on the receiver's side once sendmsg() with SCM_RIGHTS succeeds. Also, even filtering by LSM at recvmsg() does not work for the same reason. Thus, we need a better way to refuse SCM_RIGHTS at sendmsg(). Let's introduce SO_PASSRIGHTS to disable SCM_RIGHTS. Note that this option is enabled by default for backward compatibility. Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0] Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23af_unix: Inherit sk_flags at connect().Kuniyuki Iwashima1-6/+6
For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options are inherited from the parent listen()ing socket. Currently, this inheritance happens at accept(), because these attributes were stored in sk->sk_socket->flags and the struct socket is not allocated until accept(). This leads to unintentional behaviour. When a peer sends data to an embryo socket in the accept() queue, unix_maybe_add_creds() embeds credentials into the skb, even if neither the peer nor the listener has enabled these options. If the option is enabled, the embryo socket receives the ancillary data after accept(). If not, the data is silently discarded. This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but would not for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor was sent, it'd be game over. To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo sockets. Commit aed6ecef55d7 ("af_unix: Save listener for embryo socket.") made it possible to access the parent's flags in sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but this introduces an unnecessary condition that is irrelevant for most sockets, accept()ed sockets and clients. Therefore, we moved SOCK_PASSXXX into struct sock. Let’s inherit sk->sk_scm_recv_flags at connect() to avoid receiving SCM_RIGHTS on embryo sockets created from a parent with SO_PASSRIGHTS=0. Note that the parent socket is locked in connect() so we don't need READ_ONCE() for sk_scm_recv_flags. Now, we can remove !other->sk_socket check in unix_maybe_add_creds() to avoid slow SOCK_PASS{CRED,PIDFD} handling for embryo sockets created from a parent with SO_PASS{CRED,PIDFD}=0. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.Kuniyuki Iwashima3-52/+39
As explained in the next patch, SO_PASSRIGHTS would have a problem if we assigned a corresponding bit to socket->flags, so it must be managed in struct sock. Mixing socket->flags and sk->sk_flags for similar options will look confusing, and sk->sk_flags does not have enough space on 32bit system. Also, as mentioned in commit 16e572626961 ("af_unix: dont send SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling is known to be slow, and managing the flags in struct socket cannot avoid that for embryo sockets. Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock. While at it, other SOCK_XXX flags in net.h are grouped as enum. Note that assign_bit() was atomic, so the writer side is moved down after lock_sock() in setsockopt(), but the bit is only read once in sendmsg() and recvmsg(), so lock_sock() is not needed there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.Kuniyuki Iwashima1-0/+18
SCM_CREDENTIALS and SCM_SECURITY can be recv()ed by calling scm_recv() or scm_recv_unix(), and SCM_PIDFD is only used by scm_recv_unix(). scm_recv() is called from AF_NETLINK and AF_BLUETOOTH. scm_recv_unix() is literally called from AF_UNIX. Let's restrict SO_PASSCRED and SO_PASSSEC to such sockets and SO_PASSPIDFD to AF_UNIX only. Later, SOCK_PASS{CRED,PIDFD,SEC} will be moved to struct sock and united with another field. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23tcp: Restrict SO_TXREHASH to TCP socket.Kuniyuki Iwashima1-0/+5
sk->sk_txrehash is only used for TCP. Let's restrict SO_TXREHASH to TCP to reflect this. Later, we will make sk_txrehash a part of the union for other protocol families. Note that we need to modify BPF selftest not to get/set SO_TEREHASH for non-TCP sockets. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23scm: Move scm_recv() from scm.h to scm.c.Kuniyuki Iwashima1-0/+123
scm_recv() has been placed in scm.h since the pre-git era for no particular reason (I think), which makes the file really fragile. For example, when you move SOCK_PASSCRED from include/linux/net.h to enum sock_flags in include/net/sock.h, you will see weird build failure due to terrible dependency. To avoid the build failure in the future, let's move scm_recv(_unix())? and its callees to scm.c. Note that only scm_recv() needs to be exported for Bluetooth. scm_send() should be moved to scm.c too, but I'll revisit later. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23af_unix: Don't pass struct socket to maybe_add_creds().Kuniyuki Iwashima1-11/+12
We will move SOCK_PASS{CRED,PIDFD,SEC} from struct socket.flags to struct sock for better handling with SOCK_PASSRIGHTS. Then, we don't need to access struct socket in maybe_add_creds(). Let's pass struct sock to maybe_add_creds() and its caller queue_oob(). While at it, we append the unix_ prefix and fix double spaces around the pid assignment. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-23af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.Kuniyuki Iwashima1-22/+15
Currently, the same checks for SOCK_PASSCRED and SOCK_PASSPIDFD are scattered across many places. Let's centralise the bit tests to make the following changes cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-22bpf, sockmap: Avoid using sk_socket after free when sendingJiayuan Chen1-0/+8
The sk->sk_socket is not locked or referenced in backlog thread, and during the call to skb_send_sock(), there is a race condition with the release of sk_socket. All types of sockets(tcp/udp/unix/vsock) will be affected. Race conditions: ''' CPU0 CPU1 backlog::skb_send_sock sendmsg_unlocked sock_sendmsg sock_sendmsg_nosec close(fd): ... ops->release() -> sock_map_close() sk_socket->ops = NULL free(socket) sock->ops->sendmsg ^ panic here ''' The ref of psock become 0 after sock_map_close() executed. ''' void sock_map_close() { ... if (likely(psock)) { ... // !! here we remove psock and the ref of psock become 0 sock_map_remove_links(sk, psock) psock = sk_psock_get(sk); if (unlikely(!psock)) goto no_psock; <=== Control jumps here via goto ... cancel_delayed_work_sync(&psock->work); <=== not executed sk_psock_put(sk, psock); ... } ''' Based on the fact that we already wait for the workqueue to finish in sock_map_close() if psock is held, we simply increase the psock reference count to avoid race conditions. With this patch, if the backlog thread is running, sock_map_close() will wait for the backlog thread to complete and cancel all pending work. If no backlog running, any pending work that hasn't started by then will fail when invoked by sk_psock_get(), as the psock reference count have been zeroed, and sk_psock_drop() will cancel all jobs via cancel_delayed_work_sync(). In summary, we require synchronization to coordinate the backlog thread and close() thread. The panic I catched: ''' Workqueue: events sk_psock_backlog RIP: 0010:sock_sendmsg+0x21d/0x440 RAX: 0000000000000000 RBX: ffffc9000521fad8 RCX: 0000000000000001 ... Call Trace: <TASK> ? die_addr+0x40/0xa0 ? exc_general_protection+0x14c/0x230 ? asm_exc_general_protection+0x26/0x30 ? sock_sendmsg+0x21d/0x440 ? sock_sendmsg+0x3e0/0x440 ? __pfx_sock_sendmsg+0x10/0x10 __skb_send_sock+0x543/0xb70 sk_psock_backlog+0x247/0xb80 ... ''' Fixes: 4b4647add7d3 ("sock_map: avoid race between sock_map_close and sk_psock_put") Reported-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20250516141713.291150-1-jiayuan.chen@linux.dev
2025-05-22Merge tag 'wireless-next-2025-05-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-nextJakub Kicinski5-11/+32
Johannes Berg says: ==================== Lots of new things, notably: * ath12k: monitor mode for WCN7850, better 6 GHz regulatory * brcmfmac: SAE for some Cypress devices * iwlwifi: rework device configuration * mac80211: scan improvements with MLO * mt76: EHT improvements, new device IDs * rtw88: throughput improvements * rtw89: MLO, STA/P2P concurrency improvements, SAR * tag 'wireless-next-2025-05-22' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (389 commits) wifi: mt76: mt7925: add rfkill_poll for hardware rfkill wifi: mt76: support power delta calculation for 5 TX paths wifi: mt76: fix available_antennas setting wifi: mt76: mt7996: fix RX buffer size of MCU event wifi: mt76: mt7996: change max beacon size wifi: mt76: mt7996: fix invalid NSS setting when TX path differs from NSS wifi: mt76: mt7996: drop fragments with multicast or broadcast RA wifi: mt76: mt7996: set EHT max ampdu length capability wifi: mt76: mt7996: fix beamformee SS field wifi: mt76: remove capability of partial bandwidth UL MU-MIMO wifi: mt76: mt7925: add test mode support wifi: mt76: mt7925: extend MCU support for testmode wifi: mt76: mt7925: ensure all MCU commands wait for response wifi: mt76: mt7925: refine the sniffer commnad wifi: mt76: mt7925: prevent multiple scan commands wifi: mt76: mt7915: Fix null-ptr-deref in mt7915_mmio_wed_init() wifi: mt76: mt7996: Fix null-ptr-deref in mt7996_mmio_wed_init() wifi: mt76: mt7925: add RNR scan support for 6GHz wifi: mt76: add mt76_connac_mcu_build_rnr_scan_param routine wifi: mt76: scan: Fix 'mlink' dereferenced before IS_ERR_OR_NULL check ... ==================== Link: https://patch.msgid.link/20250522165501.189958-50-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22Merge tag 'for-net-next-2025-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-nextJakub Kicinski11-66/+403
Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: core: - Add support for SIOCETHTOOL ETHTOOL_GET_TS_INFO - Separate CIS_LINK and BIS_LINK link types - Introduce HCI Driver protocol drivers: - btintel_pcie: Do not generate coredump for diagnostic events - btusb: Add HCI Drv commands for configuring altsetting - btusb: Add RTL8851BE device 0x0bda:0xb850 - btusb: Add new VID/PID 13d3/3584 for MT7922 - btusb: Add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: Implement host-wakeup feature * tag 'for-net-next-2025-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (23 commits) Bluetooth: btintel: Check dsbr size from EFI variable Bluetooth: MGMT: iterate over mesh commands in mgmt_mesh_foreach() Bluetooth: btusb: Add new VID/PID 13d3/3584 for MT7922 Bluetooth: btusb: use skb_pull to avoid unsafe access in QCA dump handling Bluetooth: L2CAP: Fix not checking l2cap_chan security level Bluetooth: separate CIS_LINK and BIS_LINK link types Bluetooth: btusb: Add new VID/PID 13d3/3630 for MT7925 Bluetooth: add support for SIOCETHTOOL ETHTOOL_GET_TS_INFO Bluetooth: btintel_pcie: Dump debug registers on error Bluetooth: ISO: Fix getpeername not returning sockaddr_iso_bc fields Bluetooth: ISO: Fix not using SID from adv report Revert "Bluetooth: btusb: add sysfs attribute to control USB alt setting" Revert "Bluetooth: btusb: Configure altsetting for HCI_USER_CHANNEL" Bluetooth: btusb: Add HCI Drv commands for configuring altsetting Bluetooth: Introduce HCI Driver protocol Bluetooth: btnxpuart: Implement host-wakeup feature dt-bindings: net: bluetooth: nxp: Add support for host-wakeup Bluetooth: btusb: Add RTL8851BE device 0x0bda:0xb850 Bluetooth: hci_uart: Remove unnecessary NULL check before release_firmware() Bluetooth: btmtksdio: Fix wakeup source leaks on device unbind ... ==================== Link: https://patch.msgid.link/20250522171048.3307873-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22Bluetooth: MGMT: iterate over mesh commands in mgmt_mesh_foreach()Dmitry Antipov1-1/+1
In 'mgmt_mesh_foreach()', iterate over mesh commands rather than generic mgmt ones. Compile tested only. Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-05-22Bluetooth: L2CAP: Fix not checking l2cap_chan security levelLuiz Augusto von Dentz1-7/+8
l2cap_check_enc_key_size shall check the security level of the l2cap_chan rather than the hci_conn since for incoming connection request that may be different as hci_conn may already been encrypted using a different security level. Fixes: 522e9ed157e3 ("Bluetooth: l2cap: Check encryption key size on incoming connection") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski19-185/+139
Cross-merge networking fixes after downstream PR (net-6.15-rc8). Conflicts: 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") 4bcc063939a5 ("ice, irdma: fix an off by one in error handling code") c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au No extra adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-22netfilter: nf_tables: nft_fib_ipv6: fix VRF ipv4/ipv6 result discrepancyFlorian Westphal1-4/+9
With a VRF, ipv4 and ipv6 FIB expression behave differently. fib daddr . iif oif Will return the input interface name for ipv4, but the real device for ipv6. Example: If VRF device name is tvrf and real (incoming) device is veth0. First round is ok, both ipv4 and ipv6 will yield 'veth0'. But in the second round (incoming device will be set to "tvrf"), ipv4 will yield "tvrf" whereas ipv6 returns "veth0" for the second round too. This makes ipv6 behave like ipv4. A followup patch will add a test case for this, without this change it will fail with: get element inet t fibif6iif { tvrf . dead:1::99 . tvrf } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FAIL: did not find tvrf . dead:1::99 . tvrf in fibif6iif Alternatively we could either not do anything at all or change ipv4 to also return the lower/real device, however, nft (userspace) doc says "iif: if fib lookup provides a route then check its output interface is identical to the packets input interface." which is what the nft fib ipv4 behaviour is. Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22netfilter: xtables: support arpt_mark and ipv6 optstrip for iptables-nft only buildsFlorian Westphal2-3/+3
Its now possible to build a kernel that has no support for the classic xtables get/setsockopt interfaces and builtin tables. In this case, we have CONFIG_IP6_NF_MANGLE=n and CONFIG_IP_NF_ARPTABLES=n. For optstript, the ipv6 code is so small that we can enable it if netfilter ipv6 support exists. For mark, check if either classic arptables or NFT_ARP_COMPAT is set. Fixes: a9525c7f6219 ("netfilter: xtables: allow xtables-nft only builds") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22net: Add support for providing the PTP hardware source in tsinfoKory Maincent2-5/+47
Multi-PTP source support within a network topology has been merged, but the hardware timestamp source is not yet exposed to users. Currently, users only see the PTP index, which does not indicate whether the timestamp comes from a PHY or a MAC. Add support for reporting the hwtstamp source using a hwtstamp-source field, alongside hwtstamp-phyindex, to describe the origin of the hardware timestamp. Remove HWTSTAMP_SOURCE_UNSPEC enum value as it is not used at all. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250519-feature_ptp_source-v4-1-5d10e19a0265@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22Merge tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecPaolo Abeni8-114/+44
Steffen Klassert says: ==================== pull request (net): ipsec 2025-05-21 1) Fix some missing kfree_skb in the error paths of espintcp. From Sabrina Dubroca. 2) Fix a reference leak in espintcp. From Sabrina Dubroca. 3) Fix UDP GRO handling for ESPINUDP. From Tobias Brunner. 4) Fix ipcomp truesize computation on the receive path. From Sabrina Dubroca. 5) Sanitize marks before policy/state insertation. From Paul Chaignon. * tag 'ipsec-2025-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Sanitize marks before insert xfrm: ipcomp: fix truesize computation on receive xfrm: Fix UDP GRO handling for some corner cases espintcp: remove encap socket caching to avoid reference leak espintcp: fix skb leaks ==================== Link: https://patch.msgid.link/20250521054348.4057269-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_doneWang Liang1-0/+5
Syzbot reported a slab-use-after-free with the following call trace: ================================================================== BUG: KASAN: slab-use-after-free in tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840 Read of size 8 at addr ffff88807a733000 by task kworker/1:0/25 Call Trace: kasan_report+0xd9/0x110 mm/kasan/report.c:601 tipc_aead_encrypt_done+0x4bd/0x510 net/tipc/crypto.c:840 crypto_request_complete include/crypto/algapi.h:266 aead_request_complete include/crypto/internal/aead.h:85 cryptd_aead_crypt+0x3b8/0x750 crypto/cryptd.c:772 crypto_request_complete include/crypto/algapi.h:266 cryptd_queue_worker+0x131/0x200 crypto/cryptd.c:181 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 Allocated by task 8355: kzalloc_noprof include/linux/slab.h:778 tipc_crypto_start+0xcc/0x9e0 net/tipc/crypto.c:1466 tipc_init_net+0x2dd/0x430 net/tipc/core.c:72 ops_init+0xb9/0x650 net/core/net_namespace.c:139 setup_net+0x435/0xb40 net/core/net_namespace.c:343 copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508 create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228 ksys_unshare+0x419/0x970 kernel/fork.c:3323 __do_sys_unshare kernel/fork.c:3394 Freed by task 63: kfree+0x12a/0x3b0 mm/slub.c:4557 tipc_crypto_stop+0x23c/0x500 net/tipc/crypto.c:1539 tipc_exit_net+0x8c/0x110 net/tipc/core.c:119 ops_exit_list+0xb0/0x180 net/core/net_namespace.c:173 cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 After freed the tipc_crypto tx by delete namespace, tipc_aead_encrypt_done may still visit it in cryptd_queue_worker workqueue. I reproduce this issue by: ip netns add ns1 ip link add veth1 type veth peer name veth2 ip link set veth1 netns ns1 ip netns exec ns1 tipc bearer enable media eth dev veth1 ip netns exec ns1 tipc node set key this_is_a_master_key master ip netns exec ns1 tipc bearer disable media eth dev veth1 ip netns del ns1 The key of reproduction is that, simd_aead_encrypt is interrupted, leading to crypto_simd_usable() return false. Thus, the cryptd_queue_worker is triggered, and the tipc_crypto tx will be visited. tipc_disc_timeout tipc_bearer_xmit_skb tipc_crypto_xmit tipc_aead_encrypt crypto_aead_encrypt // encrypt() simd_aead_encrypt // crypto_simd_usable() is false child = &ctx->cryptd_tfm->base; simd_aead_encrypt crypto_aead_encrypt // encrypt() cryptd_aead_encrypt_enqueue cryptd_aead_enqueue cryptd_enqueue_request // trigger cryptd_queue_worker queue_work_on(smp_processor_id(), cryptd_wq, &cpu_queue->work) Fix this by holding net reference count before encrypt. Reported-by: syzbot+55c12726619ff85ce1f6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=55c12726619ff85ce1f6 Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Signed-off-by: Wang Liang <wangliang74@huawei.com> Link: https://patch.msgid.link/20250520101404.1341730-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-22sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()Cong Wang1-3/+3
When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the child qdisc's peek() operation before incrementing sch->q.qlen and sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may trigger an immediate dequeue and potential packet drop. In such cases, qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog have not yet been updated, leading to inconsistent queue accounting. This can leave an empty HFSC class in the active list, causing further consequences like use-after-free. This patch fixes the bug by moving the increment of sch->q.qlen and sch->qstats.backlog before the call to the child qdisc's peek() operation. This ensures that queue length and backlog are always accurate when packet drops or dequeues are triggered during the peek. Fixes: 12d0ad3be9c3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline") Reported-by: Mingi Cho <mincho@theori.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-21net: add debug checks in ____napi_schedule() and napi_poll()Eric Dumazet1-2/+8
While tracking an IDPF bug, I found that idpf_vport_splitq_napi_poll() was not following NAPI rules. It can indeed return @budget after napi_complete() has been called. Add two debug conditions in networking core to hopefully catch this kind of bugs sooner. IDPF bug will be fixed in a separate patch. [ 72.441242] repoll requested for device eth1 idpf_vport_splitq_napi_poll [idpf] but napi is not scheduled. [ 72.446291] list_del corruption. next->prev should be ff31783d93b14040, but was ff31783d93b10080. (next=ff31783d93b10080) [ 72.446659] kernel BUG at lib/list_debug.c:67! [ 72.446816] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI [ 72.447031] CPU: 156 UID: 0 PID: 16258 Comm: ip Tainted: G W 6.15.0-dbg-DEV #1944 NONE [ 72.447340] Tainted: [W]=WARN [ 72.447702] RIP: 0010:__list_del_entry_valid_or_report (lib/list_debug.c:65) [ 72.450630] Call Trace: [ 72.450720] <IRQ> [ 72.450797] net_rx_action (include/linux/list.h:215 include/linux/list.h:287 net/core/dev.c:7385 net/core/dev.c:7516) [ 72.450928] ? lock_release (kernel/locking/lockdep.c:?) [ 72.451059] ? clockevents_program_event (kernel/time/clockevents.c:?) [ 72.451222] handle_softirqs (kernel/softirq.c:579) [ 72.451356] ? do_softirq (kernel/softirq.c:480) [ 72.451480] ? idpf_vc_xn_exec (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:462) idpf [ 72.451635] do_softirq (kernel/softirq.c:480) [ 72.451750] </IRQ> [ 72.451828] <TASK> [ 72.451905] __local_bh_enable_ip (kernel/softirq.c:?) [ 72.452051] idpf_vc_xn_exec (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:462) idpf [ 72.452210] idpf_send_delete_queues_msg (drivers/net/ethernet/intel/idpf/idpf_virtchnl.c:2083) idpf [ 72.452390] idpf_vport_stop (drivers/net/ethernet/intel/idpf/idpf_lib.c:837 drivers/net/ethernet/intel/idpf/idpf_lib.c:868) idpf [ 72.452541] ? idpf_vport_stop (include/linux/bottom_half.h:? include/linux/netdevice.h:4762 drivers/net/ethernet/intel/idpf/idpf_lib.c:855) idpf [ 72.452695] idpf_initiate_soft_reset (drivers/net/ethernet/intel/idpf/idpf_lib.c:?) idpf [ 72.452867] idpf_change_mtu (drivers/net/ethernet/intel/idpf/idpf_lib.c:2189) idpf [ 72.453015] netif_set_mtu_ext (net/core/dev.c:9437) [ 72.453157] ? packet_notifier (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/packet/af_packet.c:4240) [ 72.453292] netif_set_mtu (net/core/dev.c:9515) [ 72.453416] dev_set_mtu (net/core/dev_api.c:?) [ 72.453534] bond_change_mtu (drivers/net/bonding/bond_main.c:4833) [ 72.453666] netif_set_mtu_ext (net/core/dev.c:9437) [ 72.453803] do_setlink (net/core/rtnetlink.c:3116) [ 72.453925] ? rtnl_newlink (net/core/rtnetlink.c:3901) [ 72.454055] ? rtnl_newlink (net/core/rtnetlink.c:3901) [ 72.454185] ? rtnl_newlink (net/core/rtnetlink.c:3901) [ 72.454314] ? trace_contention_end (include/trace/events/lock.h:122) [ 72.454467] ? __mutex_lock (arch/x86/include/asm/preempt.h:85 kernel/locking/mutex.c:611 kernel/locking/mutex.c:746) [ 72.454597] ? cap_capable (include/trace/events/capability.h:26) [ 72.454721] ? security_capable (security/security.c:?) [ 72.454857] rtnl_newlink (net/core/rtnetlink.c:?) [ 72.454982] ? lock_is_held_type (kernel/locking/lockdep.c:5599 kernel/locking/lockdep.c:5938) [ 72.455121] ? __lock_acquire (kernel/locking/lockdep.c:?) [ 72.455256] ? __change_page_attr_set_clr (arch/x86/mm/pat/set_memory.c:685) [ 72.455438] ? __lock_acquire (kernel/locking/lockdep.c:?) [ 72.455582] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885) [ 72.455721] ? lock_acquire (kernel/locking/lockdep.c:5866) [ 72.455848] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885) [ 72.455987] ? lock_release (kernel/locking/lockdep.c:?) [ 72.456117] ? rcu_read_unlock (include/linux/rcupdate.h:341 include/linux/rcupdate.h:871) [ 72.456249] ? __pfx_rtnl_newlink (net/core/rtnetlink.c:3956) [ 72.456388] rtnetlink_rcv_msg (net/core/rtnetlink.c:6955) [ 72.456526] ? rtnetlink_rcv_msg (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 net/core/rtnetlink.c:6885) [ 72.456671] ? lock_acquire (kernel/locking/lockdep.c:5866) [ 72.456802] ? net_generic (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 include/net/netns/generic.h:45) [ 72.456929] ? __pfx_rtnetlink_rcv_msg (net/core/rtnetlink.c:6858) [ 72.457082] netlink_rcv_skb (net/netlink/af_netlink.c:2534) [ 72.457212] netlink_unicast (net/netlink/af_netlink.c:1313) [ 72.457344] netlink_sendmsg (net/netlink/af_netlink.c:1883) [ 72.457476] __sock_sendmsg (net/socket.c:712) [ 72.457602] ____sys_sendmsg (net/socket.c:?) [ 72.457735] ? _copy_from_user (arch/x86/include/asm/uaccess_64.h:126 arch/x86/include/asm/uaccess_64.h:134 arch/x86/include/asm/uaccess_64.h:141 include/linux/uaccess.h:178 lib/usercopy.c:18) [ 72.457875] ___sys_sendmsg (net/socket.c:2620) [ 72.458042] ? __call_rcu_common (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 arch/x86/include/asm/irqflags.h:159 kernel/rcu/tree.c:3107) [ 72.458185] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457) [ 72.458324] ? lock_acquire (kernel/locking/lockdep.c:5866) [ 72.458451] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457) [ 72.458588] ? lock_release (kernel/locking/lockdep.c:?) [ 72.458718] ? mntput_no_expire (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 fs/namespace.c:1457) [ 72.458856] __x64_sys_sendmsg (net/socket.c:2652) [ 72.458997] ? do_syscall_64 (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/entry-common.h:198 arch/x86/entry/syscall_64.c:90) [ 72.459136] do_syscall_64 (arch/x86/entry/syscall_64.c:?) [ 72.459259] ? exc_page_fault (arch/x86/mm/fault.c:1542) [ 72.459387] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 72.459555] RIP: 0033:0x7fd15f17cbd0 Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250520121908.1805732-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: remove skb_copy_and_hash_datagram_iter()Eric Biggers1-37/+0
Now that skb_copy_and_hash_datagram_iter() is no longer used, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-11-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: add skb_copy_and_crc32c_datagram_iter()Eric Biggers1-0/+33
Since skb_copy_and_hash_datagram_iter() is used only with CRC32C, the crypto_ahash abstraction provides no value. Add skb_copy_and_crc32c_datagram_iter() which just calls crc32c() directly. This is faster and simpler. It also doesn't have the weird dependency issue where skb_copy_and_hash_datagram_iter() depends on CONFIG_CRYPTO_HASH=y without that being expressed explicitly in the kconfig (presumably because it was too heavyweight for NET to select). The new function is conditional on the hidden boolean symbol NET_CRC32C, which selects CRC32. So it gets compiled only when something that actually needs CRC32C packet checksums is enabled, it has no implicit dependency, and it doesn't depend on the heavyweight crypto layer. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-9-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: fold __skb_checksum() into skb_checksum()Eric Biggers1-52/+7
Now that the only remaining caller of __skb_checksum() is skb_checksum(), fold __skb_checksum() into skb_checksum(). This makes struct skb_checksum_ops unnecessary, so remove that too and simply do the "regular" net checksum. It also makes the wrapper functions csum_partial_ext() and csum_block_add_ext() unnecessary, so remove those too and just use the underlying functions. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-7-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21sctp: use skb_crc32c() instead of __skb_checksum()Eric Biggers6-7/+5
Make sctp_compute_cksum() just use the new function skb_crc32c(), instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C. This is faster and simpler. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21net: use skb_crc32c() in skb_crc32c_csum_help()Eric Biggers1-5/+3
Instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C, just call the new function skb_crc32c(). This is faster and simpler. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-4-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>