aboutsummaryrefslogtreecommitdiffstatshomepage
AgeCommit message (Collapse)AuthorFilesLines
2025-04-24ipv6: Get rid of RTNL for SIOCDELRT and RTM_DELROUTE.Kuniyuki Iwashima1-20/+28
Basically, removing an IPv6 route does not require RTNL because the IPv6 routing tables are protected by per table lock. inet6_rtm_delroute() calls nexthop_find_by_id() to check if the nexthop specified by RTA_NH_ID exists. nexthop uses rbtree and the top-down walk can be safely performed under RCU. ip6_route_del() already relies on RCU and the table lock, but we need to extend the RCU critical section a bit more to cover __ip6_del_rt(). For example, nexthop_for_each_fib6_nh() and inet6_rt_notify() needs RCU. Let's call nexthop_find_by_id() and __ip6_del_rt() under RCU and get rid of RTNL from inet6_rtm_delroute() and SIOCDELRT. Even if the nexthop is removed after rcu_read_unlock() in inet6_rtm_delroute(), __remove_nexthop_fib() cleans up the routes tied to the nexthop, and ip6_route_del() returns -ESRCH. So the request was at least valid as of nexthop_find_by_id(), and it's just a matter of timing. Note that we need to pass false to lwtunnel_valid_encap_type_attr(). The following patches also use the newroute bool. Note also that fib6_get_table() does not require RCU because once allocated fib6_table is not freed until netns dismantle. I will post a follow-up series to convert such callers to RCU-lockless version. [0] Link: https://lore.kernel.org/netdev/20250417174557.65721-1-kuniyu@amazon.com/ #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-3-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-24ipv6: Validate RTA_GATEWAY of RTA_MULTIPATH in rtm_to_fib6_config().Kuniyuki Iwashima1-39/+43
We will perform RTM_NEWROUTE and RTM_DELROUTE under RCU, and then we want to perform some validation out of the RCU scope. When creating / removing an IPv6 route with RTA_MULTIPATH, inet6_rtm_newroute() / inet6_rtm_delroute() validates RTA_GATEWAY in each multipath entry. Let's do that in rtm_to_fib6_config(). Note that now RTM_DELROUTE returns an error for RTA_MULTIPATH with 0 entries, which was accepted but should result in -EINVAL as RTM_NEWROUTE. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418000443.43734-2-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-23Merge branch 'net-mlx5-hws-improve-ip-version-handling'Jakub Kicinski4-22/+216
Mark Bloch says: ==================== net/mlx5: HWS, Improve IP version handling This small series hardens our checks against a single matcher containing rules that match on IPv4 and IPv6. This scenario is not supported by hardware steering and the implementation now signals this instead of failing silently. Patches: * Patch 1 forbids a single definer to match on mixed IP versions for source and destination address. * Patch 2 reproduces a couple of firmware checks: it forbids creating a definer that matches on IP address without matching on IP version, and also disallows matching on IPv6 addresses and the IPv4 IHL fields in the same definer. * Patch 3 forbids mixing rules that match on IPv4 and IPv6 addresses in the same matcher. The underlying definer mechanism does not support that. ==================== Link: https://patch.msgid.link/20250422092540.182091-1-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net/mlx5: HWS, Disallow matcher IP version mixingVlad Dogaru3-0/+160
Signal clearly to the user, via an error, that mixing IPv4 and IPv6 rules in the same matcher is not supported. Previously such cases silently failed by adding a rule that did not work correctly. Rules can specify an IP version by one of two fields: IP version or ethertype. At matcher creation, store whether the template matches on any of these two fields. If yes, inspect each rule for its corresponding match value and store the IP version inside the matcher to guard against inconsistencies with subsequent rules. Furthermore, also check rules for internal consistency, i.e. verify that the ethertype and IP version match values do not contradict each other. The logic applies to inner and outer headers independently, to account for tunneling. Rules that do not match on IP addresses are not affected. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250422092540.182091-4-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net/mlx5: HWS, Harden IP version definer checksVlad Dogaru1-2/+42
Replicate some sanity checks that firmware does, since hardware steering does not go through firmware. When creating a definer, disallow matching on IP addresses without also matching on IP version. The latter can be satisfied by matching either on the version field in the IP header, or on the ethertype field. Also refuse to match IPv4 IHL alongside IPv6. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250422092540.182091-3-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net/mlx5: HWS, Fix IP version decisionVlad Dogaru1-22/+16
Unify the check for IP version when creating a definer. A given matcher is deemed to match on IPv6 if any of the higher order (>31) bits of source or destination address mask are set. A single packet cannot mix IP versions between source and destination addresses, so it makes no sense that they would be decided on independently. Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://patch.msgid.link/20250422092540.182091-2-mbloch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23octeontx2-pf: AF_XDP: code clean upHariprasad Kelam3-14/+35
The current API, otx2_xdp_sq_append_pkt, verifies the number of available descriptors before sending packets to the hardware. However, for AF_XDP, this check is unnecessary because the batch value is already determined based on the free descriptors. This patch introduces a new API, "otx2_xsk_sq_append_pkt" to address this. Remove the logic for releasing the TX buffers, as it is implicitly handled by xsk_tx_peek_release_desc_batch Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20250420032350.4047706-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski17-221/+876
Tony Nguyen says: ==================== igc: Add support for Frame Preemption Faizal Rahim says: Introduce support for the FPE feature in the IGC driver. The patches aligns with the upstream FPE API: https://patchwork.kernel.org/project/netdevbpf/cover/20230220122343.1156614-1-vladimir.oltean@nxp.com/ https://patchwork.kernel.org/project/netdevbpf/cover/20230119122705.73054-1-vladimir.oltean@nxp.com/ It builds upon earlier work: https://patchwork.kernel.org/project/netdevbpf/cover/20220520011538.1098888-1-vinicius.gomes@intel.com/ The patch series adds the following functionalities to the IGC driver: a) Configure FPE using `ethtool --set-mm`. b) Display FPE settings via `ethtool --show-mm`. c) View FPE statistics using `ethtool --include-statistics --show-mm'. e) Block setting preemptible tc in taprio since it is not supported yet. Existing code already blocks it in mqprio. Tested: Enabled CONFIG_PROVE_LOCKING, CONFIG_DEBUG_ATOMIC_SLEEP, CONFIG_DMA_API_DEBUG, and CONFIG_KASAN 1) selftests 2) netdev down/up cycles 3) suspend/resume cycles 4) fpe verification No bugs or unusual dmesg logs were observed. Ran 1), 2) and 3) with and without the patch series, compared dmesg and selftest logs - no differences found. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: add support to get frame preemption statistics via ethtool igc: add support to get MAC Merge data via ethtool igc: block setting preemptible traffic class in taprio igc: add support to set tx-min-frag-size igc: add support for frame preemption verification igc: set the RX packet buffer size for TSN mode igc: use FIELD_PREP and GENMASK for existing RX packet buffer size igc: optimize TX packet buffer utilization for TSN mode igc: use FIELD_PREP and GENMASK for existing TX packet buffer size igc: rename I225_RXPBSIZE_DEFAULT and I225_TXPBSIZE_DEFAULT igc: rename xdp_get_tx_ring() for non-xdp usage net: ethtool: mm: reset verification status when link is down net: ethtool: mm: extract stmmac verification logic into common library net: stmmac: move frag_size handling out of spin_lock ==================== Link: https://patch.msgid.link/20250418163822.3519810-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23Merge branch 'enable-multiple-irq-lines-support-in-airoha_eth-driver'Jakub Kicinski3-94/+283
Lorenzo Bianconi says: ==================== Enable multiple IRQ lines support in airoha_eth driver EN7581 ethernet SoC supports 4 programmable IRQ lines each one composed by 4 IRQ configuration registers to map Tx/Rx queues. Enable multiple IRQ lines support. ==================== Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-0-1ab0083ca3c1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net: airoha: Enable multiple IRQ lines support in airoha_eth driver.Lorenzo Bianconi3-59/+206
EN7581 ethernet SoC supports 4 programmable IRQ lines for Tx and Rx interrupts. Enable multiple IRQ lines support. Map Rx/Tx queues to the available IRQ lines using the default scheme used in the vendor SDK: - IRQ0: rx queues [0-4],[7-9],15 - IRQ1: rx queues [21-30] - IRQ2: rx queues 5 - IRQ3: rx queues 6 Tx queues interrupts are managed by IRQ0. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-2-1ab0083ca3c1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net: airoha: Introduce airoha_irq_bank structLorenzo Bianconi3-44/+86
EN7581 ethernet SoC supports 4 programmable IRQ lines each one composed by 4 IRQ configuration registers. Add airoha_irq_bank struct as a container for independent IRQ lines info (e.g. IRQ number, enabled source interrupts, ecc). This is a preliminary patch to support multiple IRQ lines in airoha_eth driver. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250418-airoha-eth-multi-irq-v1-1-1ab0083ca3c1@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23Merge branch 'r8169-merge-chip-versions'Jakub Kicinski3-30/+12
Heiner Kallweit says: ==================== r8169: merge chip versions After 2b065c098c37 ("r8169: refactor chip version detection") we can merge handling of few chip versions. ==================== Link: https://patch.msgid.link/5e1e14ea-d60f-4608-88eb-3104b6bbace8@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23r8169: merge chip versions 52 and 53 (RTL8117)Heiner Kallweit3-12/+7
Handling of both chip versions is the same, only difference is the firmware. So we can merge handling of both chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/ae866b71-c904-434e-befb-848c831e33ff@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23r8169: merge chip versions 64 and 65 (RTL8125D)Heiner Kallweit3-5/+1
Handling of both chip versions is the same, only difference is the firmware. So we can merge handling of both chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0baad123-c679-4154-923f-fdc12783e900@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23r8169: merge chip versions 70 and 71 (RTL8126A)Heiner Kallweit3-13/+4
Handling of both chip versions is the same, only difference is the firmware. So we can merge handling of both chip versions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/97d7ae79-d021-4b6b-b424-89e5e305b029@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net: phy: remove function stubsHeiner Kallweit2-37/+1
All callers of these functions depend on PHYLIB or select it directly or indirectly by selecting PHYLINK. Stubs make sense for optional functionality, but that's not the case here. MDIO_XGENE usually is selected by NET_XGENE which also selects PHYLIB. Add a dependency to PHYLIB nevertheless, in order not to break randconfig builds. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/f7a69a1f-60e9-4ac0-8b7c-481e0cc850e7@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net: Fix wild-memory-access in __register_pernet_operations() when CONFIG_NET_NS=n.Kuniyuki Iwashima1-6/+6
kernel test robot reported the splat below. [0] Before commit fed176bf3143 ("net: Add ops_undo_single for module load/unload."), if CONFIG_NET_NS=n, ops was linked to pernet_list only when init_net had not been initialised, and ops was unlinked from pernet_list only under the same condition. Let's say an ops is loaded before the init_net setup but unloaded after that. Then, the ops remains in pernet_list, which seems odd. The cited commit added ops_undo_single(), which calls list_add() for ops to link it to a temporary list, so a minor change was added to __register_pernet_operations() and __unregister_pernet_operations() under CONFIG_NET_NS=n to avoid the pernet_list corruption. However, the corruption must have been left as is. When CONFIG_NET_NS=n, pernet_list was used to keep ops registered before the init_net setup, and after that, pernet_list was not used at all. This was because some ops annotated with __net_initdata are cleared out of memory at some point during boot. Then, such ops is initialised by POISON_FREE_INITMEM (0xcc), resulting in that ops->list.{next,prev} suddenly switches from a valid pointer to a weird value, 0xcccccccccccccccc. To avoid such wild memory access, let's allow the pernet_list corruption for CONFIG_NET_NS=n. [0]: Oops: general protection fault, probably for non-canonical address 0xf999959999999999: 0000 [#1] SMP KASAN NOPTI KASAN: maybe wild-memory-access in range [0xccccccccccccccc8-0xcccccccccccccccf] CPU: 2 UID: 0 PID: 346 Comm: modprobe Not tainted 6.15.0-rc1-00294-ga4cba7e98e35 #85 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__list_add_valid_or_report (lib/list_debug.c:32) Code: 48 c1 ea 03 80 3c 02 00 0f 85 5a 01 00 00 49 39 74 24 08 0f 85 83 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 1f 01 00 00 4c 39 26 0f 85 ab 00 00 00 4c 39 ee RSP: 0018:ff11000135b87830 EFLAGS: 00010a07 RAX: dffffc0000000000 RBX: ffffffffc02223c0 RCX: ffffffff8406fcc2 RDX: 1999999999999999 RSI: cccccccccccccccc RDI: ffffffffc02223c0 RBP: ffffffff86064e40 R08: 0000000000000001 R09: fffffbfff0a9f5b5 R10: ffffffff854fadaf R11: 676552203a54454e R12: ffffffff86064e40 R13: ffffffffc02223c0 R14: ffffffff86064e48 R15: 0000000000000021 FS: 00007f6fb0d9e1c0(0000) GS:ff11000858ea0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6fb0eda580 CR3: 0000000122fec005 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> register_pernet_operations (./include/linux/list.h:150 (discriminator 5) ./include/linux/list.h:183 (discriminator 5) net/core/net_namespace.c:1315 (discriminator 5) net/core/net_namespace.c:1359 (discriminator 5)) register_pernet_subsys (net/core/net_namespace.c:1401) inet6_init (net/ipv6/af_inet6.c:535) ipv6 do_one_initcall (init/main.c:1257) do_init_module (kernel/module/main.c:2942) load_module (kernel/module/main.c:3409) init_module_from_file (kernel/module/main.c:3599) idempotent_init_module (kernel/module/main.c:3611) __x64_sys_finit_module (./include/linux/file.h:62 ./include/linux/file.h:83 kernel/module/main.c:3634 kernel/module/main.c:3621 kernel/module/main.c:3621) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f6fb0df7e5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007fffdc6a8968 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000055b535721b70 RCX: 00007f6fb0df7e5d RDX: 0000000000000000 RSI: 000055b51e44aa2a RDI: 0000000000000004 RBP: 0000000000040000 R08: 0000000000000000 R09: 000055b535721b30 R10: 0000000000000004 R11: 0000000000000246 R12: 000055b51e44aa2a R13: 000055b535721bf0 R14: 000055b5357220b0 R15: 0000000000000000 </TASK> Modules linked in: ipv6(+) crc_ccitt Fixes: fed176bf3143 ("net: Add ops_undo_single for module load/unload.") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202504181511.1c3f23e4-lkp@intel.com Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418215025.87871-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23Merge branch 'netlink-specs-rtnetlink-adjust-specs-for-c-codegen'Jakub Kicinski7-23/+69
Jakub Kicinski says: ==================== netlink: specs: rtnetlink: adjust specs for C codegen The first patch brings a schema extension allowing specifying "header" (as in .h file) properties in attribute sets. This is used for rare cases where we carry attributes from another family in a nest - we need to include the extra headers. If we were to generate kernel code we'd also need to skip it in the uAPI output. The remaining 11 patches are pretty boring schema adjustments. ==================== Link: https://patch.msgid.link/20250418021706.1967583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-rule: add C naming infoJakub Kicinski1-0/+4
Add properties needed for C codegen to match names with uAPI headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-13-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rtnetlink: correct notify propertiesJakub Kicinski2-3/+3
The notify property should point at the object the notifications carry, usually the get object, not the cmd which triggers the notification: notify: description: Name of the command sharing the reply type with this notification. Not treating this as a fix, I think that only C codegen cares. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-12-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-neigh: make sure getneigh is consistentJakub Kicinski1-0/+1
The consistency check complains replies to do and dump don't match because dump has no value. It doesn't have to by the schema... but fixing this in code gen would be more code than adjusting the spec. This is rare. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-neigh: add C naming infoJakub Kicinski1-0/+9
Add properties needed for C codegen to match names with uAPI headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: add notification for newlinkJakub Kicinski1-0/+6
Add a notification entry for netlink so that we can test ntf handling in classic netlink and C. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: make bond's ipv6 address attribute fixed sizeJakub Kicinski1-0/+2
ns-ip6-target is an indexed-array. Codegen for variable size binary array would be a bit tedious, tell C that we know the size of these attributes, since they are IPv6 addrs. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: adjust AF_ nest for C codegenJakub Kicinski1-3/+5
The AF nest is indexed by AF ID, so it's a bit strange, but with minor adjustments C codegen deals with it just fine. Entirely unclear why the names have been in quotes here. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: add C naming infoJakub Kicinski1-1/+29
Add properties needed for C codegen to match names with uAPI headers. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: remove duplicated group in attr listJakub Kicinski1-1/+0
group is listed twice for newlink. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: remove if-netnsid from attr listJakub Kicinski1-1/+0
if-netnsid an alias to target-netnsid: IFLA_TARGET_NETNSID = IFLA_IF_NETNSID, /* new alias */ We don't have a definition for this attr. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: rt-link: remove the fixed members from attrsJakub Kicinski1-13/+0
The purpose of the attribute list is to list the attributes which will be included in a given message to shrink the objects for families with huge attr spaces. Fixed headers are always present in their entirety (between netlink header and the attrs) so there's no point in listing their members. Current C codegen doesn't expect them and tries to look them up in the attribute space. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23netlink: specs: allow header properties for attribute setsJakub Kicinski4-1/+10
rt-link has a number of disjoint headers, plus it uses attributes of other families (e.g. DPLL). Allow declaring a attribute set as "foreign" by specifying which header its definition is coming from. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250418021706.1967583-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23net: stmmac: dwc-qos: calibrate tegra with mdio bus idleRussell King (Oracle)1-2/+10
Thierry states that there are prerequists for Tegra's calibration that should be met before starting calibration - both the RGMII and MDIO interfaces should be idle. This commit adds the necessary MII bus locking to ensure that the MDIO interface is idle during calibration. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/E1u7EYR-001ZAS-Cr@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23bnxt_en: hide CONFIG_DETECT_HUNG_TASK specific codeArnd Bergmann1-0/+2
The CONFIG_DEFAULT_HUNG_TASK_TIMEOUT setting is only available when the hung task detection is enabled, otherwise the code now produces a build failure: drivers/net/ethernet/broadcom/bnxt/bnxt.c:10188:21: error: use of undeclared identifier 'CONFIG_DEFAULT_HUNG_TASK_TIMEOUT' 10188 | max_tmo_secs > CONFIG_DEFAULT_HUNG_TASK_TIMEOUT) { Enclose this warning logic in an #ifdef to ensure this builds. Fixes: 0fcad44a86bd ("bnxt_en: Change FW message timeout warning") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250423162827.2189658-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-23Merge branch 'bridge-mc-per-vlan-qquery'David S. Miller6-19/+261
Yong Wang says: ==================== bridge: multicast: per vlan query improvement when port or vlan state changes The current implementation of br_multicast_enable_port() only operates on port's multicast context, which doesn't take into account in case of vlan snooping, one downside is the port's igmp query timer will NOT resume when port state gets changed from BR_STATE_BLOCKING to BR_STATE_FORWARDING etc. Such code flow will briefly look like: 1.vlan snooping --> br_multicast_port_query_expired with per vlan port_mcast_ctx --> port in BR_STATE_BLOCKING state --> then one-shot timer discontinued The port state could be changed by STP daemon or kernel STP, taking mstpd as example: 2.mstpd --> netlink_sendmsg --> br_setlink --> br_set_port_state with non blocking states, i.e. BR_STATE_LEARNING or BR_STATE_FORWARDING --> br_port_state_selection --> br_multicast_enable_port --> enable multicast with port's multicast_ctx Here for per vlan snooping, the vlan context of the port should be used instead of port's multicast_ctx. The first patch corrects such behavior. Similarly, vlan state change also impacts multicast behavior, the 2nd patch adds function to update the corresponding multicast context when vlan state changes. The 3rd patch adds the selftests to confirm that IGMP/MLD query does happen when the STP state becomes forwarding. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-23selftests: net/bridge : add tests for per vlan snooping with stp state changesYong Wang3-8/+154
Change ALL_TESTS definition to "test-per-line". Add the test case of per vlan snooping with port stp state change to forwarding and also vlan equivalent case in both bridge_igmp.sh and bridge_mld.sh. Signed-off-by: Yong Wang <yongwang@nvidia.com> Reviewed-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-23net: bridge: mcast: update multicast contex when vlan state is changedYong Wang3-3/+38
When the vlan STP state is changed, which could be manipulated by "bridge vlan" commands, similar to port STP state, this also impacts multicast behaviors such as igmp query. In the scenario of per-VLAN snooping, there's a need to update the corresponding multicast context to re-arm the port query timer when vlan state becomes "forwarding" etc. Update br_vlan_set_state() function to enable vlan multicast context in such scenario. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <yongwang@nvidia.com> Reviewed-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-23net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functionsYong Wang1-8/+69
When a bridge port STP state is changed from BLOCKING/DISABLED to FORWARDING, the port's igmp query timer will NOT re-arm itself if the bridge has been configured as per-VLAN multicast snooping. Solve this by choosing the correct multicast context(s) to enable/disable port multicast based on whether per-VLAN multicast snooping is enabled or not, i.e. using per-{port, VLAN} context in case of per-VLAN multicast snooping by re-implementing br_multicast_enable_port() and br_multicast_disable_port() functions. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <yongwang@nvidia.com> Reviewed-by: Andy Roulin <aroulin@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-22xdp: create locked/unlocked instances of xdp redirect target settersJoshua Washington3-6/+26
Commit 03df156dd3a6 ("xdp: double protect netdev->xdp_flags with netdev->lock") introduces the netdev lock to xdp_set_features_flag(). The change includes a _locked version of the method, as it is possible for a driver to have already acquired the netdev lock before calling this helper. However, the same applies to xdp_features_(set|clear)_redirect_flags(), which ends up calling the unlocked version of xdp_set_features_flags() leading to deadlocks in GVE, which grabs the netdev lock as part of its suspend, reset, and shutdown processes: [ 833.265543] WARNING: possible recursive locking detected [ 833.270949] 6.15.0-rc1 #6 Tainted: G E [ 833.276271] -------------------------------------------- [ 833.281681] systemd-shutdow/1 is trying to acquire lock: [ 833.287090] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: xdp_set_features_flag+0x29/0x90 [ 833.295470] [ 833.295470] but task is already holding lock: [ 833.301400] ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve] [ 833.309508] [ 833.309508] other info that might help us debug this: [ 833.316130] Possible unsafe locking scenario: [ 833.316130] [ 833.322142] CPU0 [ 833.324681] ---- [ 833.327220] lock(&dev->lock); [ 833.330455] lock(&dev->lock); [ 833.333689] [ 833.333689] *** DEADLOCK *** [ 833.333689] [ 833.339701] May be due to missing lock nesting notation [ 833.339701] [ 833.346582] 5 locks held by systemd-shutdow/1: [ 833.351205] #0: ffffffffa9c89130 (system_transition_mutex){+.+.}-{4:4}, at: __se_sys_reboot+0xe6/0x210 [ 833.360695] #1: ffff93b399e5c1b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xb4/0x1f0 [ 833.369144] #2: ffff949d19a471b8 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xc2/0x1f0 [ 833.377603] #3: ffffffffa9eca050 (rtnl_mutex){+.+.}-{4:4}, at: gve_shutdown+0x33/0x90 [gve] [ 833.386138] #4: ffff949d2b148c68 (&dev->lock){+.+.}-{4:4}, at: gve_shutdown+0x44/0x90 [gve] Introduce xdp_features_(set|clear)_redirect_target_locked() versions which assume that the netdev lock has already been acquired before setting the XDP feature flag and update GVE to use the locked version. Fixes: 03df156dd3a6 ("xdp: double protect netdev->xdp_flags with netdev->lock") Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250422011643.3509287-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch 'implement-udp-tunnel-port-for-txgbe'Jakub Kicinski5-0/+71
Jiawen Wu says: ==================== Implement udp tunnel port for txgbe v3: https://lore.kernel.org/20250417080328.426554-1-jiawenwu@trustnetic.com v2: https://lore.kernel.org/20250414091022.383328-1-jiawenwu@trustnetic.com v1: https://lore.kernel.org/20250410074456.321847-1-jiawenwu@trustnetic.com ==================== Link: https://patch.msgid.link/20250421022956.508018-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: wangxun: restrict feature flags for tunnel packetsJiawen Wu4-0/+32
Implement ndo_features_check to restrict Tx checksum offload flags, since there are some inner layer length and protocols unsupported. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20250421022956.508018-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: txgbe: Support to set UDP tunnel portJiawen Wu2-0/+39
Tunnel types VXLAN/VXLAN_GPE/GENEVE are supported for txgbe devices. The hardware supports to set only one port for each tunnel type. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/20250421022956.508018-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch 'net-followup-series-for-exit_rtnl'Jakub Kicinski3-37/+25
Kuniyuki Iwashima says: ==================== net: Followup series for ->exit_rtnl(). Patch 1 drops the hold_rtnl arg in ops_undo_list() as suggested by Jakub. Patch 2 & 3 apply ->exit_rtnl() to pfcp and ppp. v1: https://lore.kernel.org/20250415022258.11491-1-kuniyu@amazon.com ==================== Link: https://patch.msgid.link/20250418003259.48017-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22ppp: Split ppp_exit_net() to ->exit_rtnl().Kuniyuki Iwashima1-15/+10
ppp_exit_net() unregisters devices related to the netns under RTNL and destroys lists and IDR. Let's use ->exit_rtnl() for the device unregistration part to save RTNL dances for each netns. Note that we delegate the for_each_netdev_safe() part to default_device_exit_batch() and replace unregister_netdevice_queue() with ppp_nl_dellink() to align with bond, geneve, gtp, and pfcp. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418003259.48017-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22pfcp: Convert pfcp_net_exit() to ->exit_rtnl().Kuniyuki Iwashima1-16/+7
pfcp_net_exit() holds RTNL and cleans up all devices in the netns and other devices tied to sockets in the netns. We can use ->exit_rtnl() to save RTNL dance for all dying netns. Note that we delegate the for_each_netdev() part to default_device_exit_batch() to avoid a list corruption splat like the one reported in commit 4ccacf86491d ("gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418003259.48017-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: Drop hold_rtnl arg from ops_undo_list().Kuniyuki Iwashima1-6/+8
ops_undo_list() first iterates over ops_list for ->pre_exit(). Let's check if any of the ops has ->exit_rtnl() there and drop the hold_rtnl argument. Note that nexthop uses ->exit_rtnl() and is built-in, so hold_rtnl is always true for setup_net() and cleanup_net() for now. Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/netdev/20250414170148.21f3523c@kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250418003259.48017-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: stmmac: visconti: convert to set_clk_tx_rate() methodRussell King (Oracle)1-19/+6
Convert visconti to use the set_clk_tx_rate() method. By doing so, the GMAC control register will already have been updated (unlike with the fix_mac_speed() method) so this code can be removed while porting to the set_clk_tx_rate() method. There is also no need for the spinlock, and has never been - neither fix_mac_speed() nor set_clk_tx_rate() can be called by more than one thread at a time, so the lock does nothing useful. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1u5SiQ-001I0B-OQ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: dsa: rzn1_a5psw: Make the read-only array offsets static constColin Ian King1-2/+3
Don't populate the read-only array offsets on the stack at run time, instead make it static const. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://patch.msgid.link/20250417161353.490219-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22Merge branch 'add-gbeth-glue-layer-driver-for-renesas-rz-v2h-p-soc'Jakub Kicinski6-9/+383
Lad Prabhakar says: ==================== Add GBETH glue layer driver for Renesas RZ/V2H(P) SoC This patch series adds support for the GBETH (Gigabit Ethernet) glue layer driver for the Renesas RZ/V2H(P) SoC. The GBETH IP is integrated with the Synopsys DesignWare MAC (version 5.20). The changes include updating the device tree bindings, documenting the GBETH bindings, and adding the DWMAC glue layer for the Renesas GBETH. v1: https://lore.kernel.org/20250302181808.728734-1-prabhakar.mahadev-lad.rj@bp.renesas.com ==================== Link: https://patch.msgid.link/20250417084015.74154-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22MAINTAINERS: Add entry for Renesas RZ/V2H(P) DWMAC GBETH glue layer driverLad Prabhakar1-0/+8
Add a new MAINTAINERS entry for the Renesas RZ/V2H(P) DWMAC GBETH glue layer driver. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250417084015.74154-5-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22net: stmmac: Add DWMAC glue layer for Renesas GBETHLad Prabhakar3-0/+158
Add the DWMAC glue layer for the GBETH IP found in the Renesas RZ/V2H(P) SoC. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://patch.msgid.link/20250417084015.74154-4-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-22dt-bindings: net: Document support for Renesas RZ/V2H(P) GBETHLad Prabhakar2-0/+202
GBETH IP on the Renesas RZ/V2H(P) SoC is integrated with Synopsys DesignWare MAC (version 5.20). Document the device tree bindings for the GBETH glue layer. Generic compatible string 'renesas,rzv2h-gbeth' is added since this module is identical on both the RZ/V2H(P) and RZ/G3E SoCs. The Rx/Tx clocks supplied for GBETH on the RZ/V2H(P) SoC is depicted below: Rx / Tx -------+------------- on / off ------- | | Rx-180 / Tx-180 +---- not ---- on / off ------- Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20250417084015.74154-3-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>