aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-23netfilter: nf_tables: Support wildcard netdev hook specsPhil Sutter2-16/+15
User space may pass non-nul-terminated NFTA_DEVICE_NAME attribute values to indicate a suffix wildcard. Expect for multiple devices to match the given prefix in nft_netdev_hook_alloc() and populate 'ops_list' with them all. When checking for duplicate hooks, compare the shortest prefix so a device may never match more than a single hook spec. Finally respect the stored prefix length when hooking into new devices from event handlers. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc()Phil Sutter1-9/+7
No point in having err_hook_alloc, just call return directly. Also rename err_hook_dev - it's not about the hook's device but freeing the hook itself. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Handle NETDEV_CHANGENAME eventsPhil Sutter2-18/+48
For the sake of simplicity, treat them like consecutive NETDEV_REGISTER and NETDEV_UNREGISTER events. If the new name matches a hook spec and registration fails, escalate the error and keep things as they are. To avoid unregistering the newly registered hook again during the following fake NETDEV_UNREGISTER event, leave hooks alone if their interface spec matches the new name. Note how this patch also skips for NETDEV_REGISTER if the device is already registered. This is not yet possible as the new name would have to match the old one. This will change with wildcard interface specs, though. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Wrap netdev notifiersPhil Sutter2-26/+46
Handling NETDEV_CHANGENAME events has to traverse all chains/flowtables twice, prepare for this. No functional change intended. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Respect NETDEV_REGISTER eventsPhil Sutter2-9/+60
Hook into new devices if their name matches the hook spec. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Prepare for handling NETDEV_REGISTER eventsPhil Sutter2-14/+24
Put NETDEV_UNREGISTER handling code into a switch, no functional change intended as the function is only called for that event yet. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Have a list of nf_hook_ops in nft_hookPhil Sutter5-63/+137
Supporting a 1:n relationship between nft_hook and nf_hook_ops is convenient since a chain's or flowtable's nft_hooks may remain in place despite matching interfaces disappearing. This stabilizes ruleset dumps in that regard and opens the possibility to claim newly added interfaces which match the spec. Also it prepares for wildcard interface specs since these will potentially match multiple interfaces. All spots dealing with hook registration are updated to handle a list of multiple nf_hook_ops, but nft_netdev_hook_alloc() only adds a single item for now to retain the old behaviour. The only expected functional change here is how vanishing interfaces are handled: Instead of dropping the respective nft_hook, only the matching nf_hook_ops are dropped. To safely remove individual ops from the list in netdev handlers, an rcu_head is added to struct nf_hook_ops so kfree_rcu() may be used. There is at least nft_flowtable_find_dev() which may be iterating through the list at the same time. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook()Phil Sutter1-11/+9
The function accesses only the hook's ops field, pass it directly. This prepares for nft_hooks holding a list of nf_hook_ops in future. While at it, make use of the function in __nft_unregister_flowtable_net_hooks() as well. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce nft_register_flowtable_ops()Phil Sutter1-11/+21
Facilitate binding and registering of a flowtable hook via a single function call. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}()Phil Sutter5-5/+31
Also a pretty dull wrapper around the hook->ops.dev comparison for now. Will search the embedded nf_hook_ops list in future. The ugly cast to eliminate the const qualifier will vanish then, too. Since this future list will be RCU-protected, also introduce an _rcu() variant here. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: Introduce functions freeing nft_hook objectsPhil Sutter1-14/+24
Pointless wrappers around kfree() for now, prep work for an embedded list of nf_hook_ops. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: add packets conntrack state to debug trace infoFlorian Westphal2-1/+61
Add the minimal relevant info needed for userspace ("nftables monitor trace") to provide the conntrack view of the packet: - state (new, related, established) - direction (original, reply) - status (e.g., if connection is subject to dnat) - id (allows to query ctnetlink for remaining conntrack state info) Example: trace id a62 inet filter PRE_RAW packet: iif "enp0s3" ether [..] [..] trace id a62 inet filter PRE_MANGLE conntrack: ct direction original ct state new ct id 32 trace id a62 inet filter PRE_MANGLE packet: [..] [..] trace id a62 inet filter IN conntrack: ct direction original ct state new ct status dnat-done ct id 32 [..] In this case one can see that while NAT is active, the new connection isn't subject to a translation. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: conntrack: make nf_conntrack_id callable without a module dependencyFlorian Westphal2-0/+7
While nf_conntrack_id() doesn't need any functionaliy from conntrack, it does reside in nf_conntrack_core.c -- callers add a module dependency on conntrack. Followup patch will need to compute the conntrack id from nf_tables_trace.c to include it in nf_trace messages emitted to userspace via netlink. I don't want to introduce a module dependency between nf_tables and conntrack for this. Since trace is slowpath, the added indirection is ok. One alternative is to move nf_conntrack_id to the netfilter/core.c, but I don't see a compelling reason so far. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmitSebastian Andrzej Siewior2-4/+21
nf_dup_skb_recursion is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Move nf_dup_skb_recursion to struct netdev_xmit, provide wrappers. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctxSebastian Andrzej Siewior1-3/+15
nft_pcpu_tun_ctx is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a nft_inner_tun_ctx member (original nft_pcpu_tun_ctx) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_dup{4, 6}: Move duplication check to task_structSebastian Andrzej Siewior7-22/+9
nf_skb_duplicated is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Due to the recursion involved, the simplest change is to make it a per-task variable. Move the per-CPU variable nf_skb_duplicated to task_struct and name it in_nf_duplicate. Add it to the existing bitfield so it doesn't use additional memory. Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Valentin Schneider <vschneid@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nft_tunnel: fix geneve_opt dumpFernando Fernandez Mancera1-4/+4
When dumping a nft_tunnel with more than one geneve_opt configured the netlink attribute hierarchy should be as follow: NFTA_TUNNEL_KEY_OPTS | |--NFTA_TUNNEL_KEY_OPTS_GENEVE | | | |--NFTA_TUNNEL_KEY_GENEVE_CLASS | |--NFTA_TUNNEL_KEY_GENEVE_TYPE | |--NFTA_TUNNEL_KEY_GENEVE_DATA | |--NFTA_TUNNEL_KEY_OPTS_GENEVE | | | |--NFTA_TUNNEL_KEY_GENEVE_CLASS | |--NFTA_TUNNEL_KEY_GENEVE_TYPE | |--NFTA_TUNNEL_KEY_GENEVE_DATA | |--NFTA_TUNNEL_KEY_OPTS_GENEVE ... Otherwise, userspace tools won't be able to fetch the geneve options configured correctly. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFsFlorian Westphal1-27/+365
Replace the existing VRF test with a more comprehensive one. It tests following combinations: - fib type (returns address type, e.g. unicast) - fib oif (route output interface index - both with and without 'iif' keyword (changes result, e.g. 'fib daddr type local' will be true when the destination address is configured on the local machine, but 'fib daddr . iif type local' will only be true when the destination address is configured on the incoming interface. Add all types of addresses to test with for both ipv4 and ipv6: - local address on the incoming interface - local address on another interface - local address on another interface thats part of a vrf - address on another host The ruleset stores obtained results from 'fib' in nftables sets and then queries the sets to check that it has the expected results. Perform one pass while packets are coming in on interface NOT part of a VRF and then again when it was added and make sure fib returns the expected routes and address types for the various addresses in the setup. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-23netfilter: nf_tables: nft_fib: consistent l3mdev handlingFlorian Westphal3-5/+19
fib has two modes: 1. Obtain output device according to source or destination address 2. Obtain the type of the address, e.g. local, unicast, multicast. 'fib daddr type' should return 'local' if the address is configured in this netns or unicast otherwise. 'fib daddr . iif type' should return 'local' if the address is configured on the input interface or unicast otherwise, i.e. more restrictive. However, if the interface is part of a VRF, then 'fib daddr type' returns unicast even if the address is configured on the incoming interface. This is broken for both ipv4 and ipv6. In the ipv4 case, inet_dev_addr_type must only be used if the 'iif' or 'oif' (strict mode) was requested. Else inet_addr_type_dev_table() needs to be used and the correct dev argument must be passed as well so the correct fib (vrf) table is used. In the ipv6 case, the bug is similar, without strict mode, dev is NULL so .flowi6_l3mdev will be set to 0. Add a new 'nft_fib_l3mdev_master_ifindex_rcu()' helper and use that to init the .l3mdev structure member. For ipv6, use it from nft_fib6_flowi_init() which gets called from both the 'type' and the 'route' mode eval functions. This provides consistent behaviour for all modes for both ipv4 and ipv6: If strict matching is requested, the input respectively output device of the netfilter hooks is used. Otherwise, use skb->dev to obtain the l3mdev ifindex. Without this, most type checks in updated nft_fib.sh selftest fail: FAIL: did not find veth0 . 10.9.9.1 . local in fibtype4 FAIL: did not find veth0 . dead:1::1 . local in fibtype6 FAIL: did not find veth0 . dead:9::1 . local in fibtype6 FAIL: did not find tvrf . 10.0.1.1 . local in fibtype4 FAIL: did not find tvrf . 10.9.9.1 . local in fibtype4 FAIL: did not find tvrf . dead:1::1 . local in fibtype6 FAIL: did not find tvrf . dead:9::1 . local in fibtype6 FAIL: fib expression address types match (iif in vrf) (fib errounously returns 'unicast' for all of them, even though all of these addresses are local to the vrf). Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22netfilter: nf_tables: nft_fib_ipv6: fix VRF ipv4/ipv6 result discrepancyFlorian Westphal1-4/+9
With a VRF, ipv4 and ipv6 FIB expression behave differently. fib daddr . iif oif Will return the input interface name for ipv4, but the real device for ipv6. Example: If VRF device name is tvrf and real (incoming) device is veth0. First round is ok, both ipv4 and ipv6 will yield 'veth0'. But in the second round (incoming device will be set to "tvrf"), ipv4 will yield "tvrf" whereas ipv6 returns "veth0" for the second round too. This makes ipv6 behave like ipv4. A followup patch will add a test case for this, without this change it will fail with: get element inet t fibif6iif { tvrf . dead:1::99 . tvrf } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FAIL: did not find tvrf . dead:1::99 . tvrf in fibif6iif Alternatively we could either not do anything at all or change ipv4 to also return the lower/real device, however, nft (userspace) doc says "iif: if fib lookup provides a route then check its output interface is identical to the packets input interface." which is what the nft fib ipv4 behaviour is. Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22selftests: netfilter: move fib vrf test to nft_fib.shFlorian Westphal2-34/+90
It was located in conntrack_vrf.sh because that already had the VRF bits. Lets not add to this and move it to nft_fib.sh where this belongs. No functional changes for the subtest intended. The subtest is limited, it only covered 'fib oif' (route output interface query) when the incoming interface is part of a VRF. Next we can extend it to cover 'fib type' for VRFs and also check fib results when there is an unrelated VRF in same netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22selftests: netfilter: nft_fib.sh: add 'type' mode testsFlorian Westphal1-10/+174
fib can either lookup the interface id/name of the output interface that would be used for the given address, or it can check for the type of the address according to the fib, e.g. local, unicast, multicast and so on. This can be used to e.g. make a locally configured address only reachable through its interface. Example: given eth0:10.1.1.1 and eth1:10.1.2.1 then 'fib daddr type' for 10.1.1.1 arriving on eth1 will be 'local', but 'fib daddr . iif type' is expected to return 'unicast', whereas 'fib daddr' and 'fib daddr . iif' are expected to indicate 'local' if such a packet arrives on eth0. So far nft_fib.sh only covered oif/oifname, not type. Repeat tests both with default and a policy (ip rule) based setup. Also try to run all remaining tests even if a subtest has failed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22netfilter: xtables: support arpt_mark and ipv6 optstrip for iptables-nft only buildsFlorian Westphal2-3/+3
Its now possible to build a kernel that has no support for the classic xtables get/setsockopt interfaces and builtin tables. In this case, we have CONFIG_IP6_NF_MANGLE=n and CONFIG_IP_NF_ARPTABLES=n. For optstript, the ipv6 code is so small that we can enable it if netfilter ipv6 support exists. For mark, check if either classic arptables or NFT_ARP_COMPAT is set. Fixes: a9525c7f6219 ("netfilter: xtables: allow xtables-nft only builds") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-22selftests: netfilter: nft_concat_range.sh: add coverage for 4bit group representationFlorian Westphal1-4/+161
Pipapo supports a more compact '4 bit group' format that is chosen when the memory needed for the default exceeds a threshold (2mb). Add coverage for those code paths, the existing tests use small sets that are handled by the default representation. This comes with a test script run-time increase, but I think its ok: normal: 2m35s -> 3m9s debug: 3m24s -> 5m29s (with KSFT_MACHINE_SLOW=yes). Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-05-19queue_api: reduce risk of name collision over txqGur Stavi1-9/+9
Rename local variable in macros from txq to _txq. When macro parameter get_desc is expended it is likely to have a txq token that refers to a different txq variable at the caller's site. Signed-off-by: Gur Stavi <gur.stavi@huawei.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/95b60d218f004308486d92ed17c8cc6f28bac09d.1747559621.git.gur.stavi@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19selftests: drv-net: Fix "envirnoments" to "environments"Sumanth Gavini1-1/+1
Fix misspelling reported by codespell Signed-off-by: Sumanth Gavini <sumanth.gavini@yahoo.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250516225156.1122058-1-sumanth.gavini@yahoo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19net: netlink: reduce extack cookie sizeJohannes Berg1-1/+2
Seems like the extack cookie hasn't found any users outside of wireless, which always uses nl_set_extack_cookie_u64(). Thus, allocating 20 bytes for it is pointless, reduce that to 8 bytes, and add a BUILD_BUG_ON() to ensure it's enough (obviously it is, for a u64, but in case it changes again.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250516115927.38209-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16vsock/test: check also expected errno on sigpipe testStefano Garzarella1-2/+10
In the sigpipe test, we expect send() to fail, but we do not check if send() fails with the errno we expect (EPIPE). Add this check and repeat the send() in case of EINTR as we do in other tests. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250514141927.159456-4-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16vsock/test: retry send() to avoid occasional failure in sigpipe testStefano Garzarella1-8/+30
When the other peer calls shutdown(SHUT_RD), there is a chance that the send() call could occur before the message carrying the close information arrives over the transport. In such cases, the send() might still succeed. To avoid this race, let's retry the send() call a few times, ensuring the test is more reliable. Sleep a little before trying again to avoid flooding the other peer and filling its receive buffer, causing false-negative. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250514141927.159456-3-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16vsock/test: add timeout_usleep() to allow sleeping in timeout sectionsStefano Garzarella2-0/+19
The timeout API uses signals, so we have documented not to use sleep(), but we can use nanosleep(2) since POSIX.1 explicitly specifies that it does not interact with signals. Let's provide timeout_usleep() for that. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250514141927.159456-2-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl: add a sample for rt-linkJakub Kicinski2-0/+185
Add a fairly complete example of rt-link usage. If run without any arguments it simply lists the interfaces and some of their attrs. If run with an arg it tries to create and delete a netkit device. 1 # ./tools/net/ynl/samples/rt-link 1 2 Trying to create a Netkit interface 3 Testing error message for policy being bad: 4 Kernel error: 'Provided default xmit policy not supported' (bad attribute: .linkinfo.data(netkit).policy) 5 1: lo: mtu 65536 6 2: wlp0s1: mtu 1500 7 3: enp0s13: mtu 1500 8 4: dummy0: mtu 1500 kind dummy altname one two 9 5: nk0: mtu 1500 kind netkit primary 0 policy forward 10 6: nk1: mtu 1500 kind netkit primary 1 policy blackhole 11 Trying to delete a Netkit interface (ifindex 6) Sample creates the device first, it sets an invalid value for a netkit attribute to trigger reverse parsing. Line 4 shows the error with the attribute path correctly generated by YNL. Then sample fixes the bad attribute and re-issues the request, with NLM_F_ECHO set. This flag causes the notification to be looped back to the initiating socket (our socket). Sample parses this notification to save the ifindex of the created netkit. Sample then proceeds to list the devices. Line 8 above shows a dummy device with two alt names. Lines 9 and 10 show the netkit devices the sample itself created. The "primary" and "policy" attrs are from inside the netkit submsg. The string values are auto-generated for the enums by YNL. To clean up sample deletes the interface it created (line 11). Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl: enable codegen for all rt- familiesJakub Kicinski2-4/+7
Switch from including Classic netlink families one by one to excluding. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl: submsg: reverse parse / error reportingJakub Kicinski3-11/+107
Reverse parsing lets YNL convert bad and missing attr pointers from extack into a string like "missing attribute nest1.nest2.attr_name". It's a feature that's unique to YNL C AFAIU (even the Python YNL can't do nested reverse parsing). Add support for reverse-parsing of sub-messages. To simplify the logic and the code annotate the type policies with extra metadata. Mark the selectors and the messages with the information we need. We assume that key / selector always precedes the sub-message while parsing (and also if there are multiple sub-messages like in rt-link they are interleaved selector 1 ... submsg 1 ... selector 2 .. submsg 2, not selector 1 ... selector 2 ... submsg 1 ... submsg 2). The rt-link sample in a subsequent changes shows reverse parsing of sub-messages in action. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: submsg: support parsing and rendering sub-messagesJakub Kicinski4-4/+89
Adjust parsing and rendering appropriately to make sub-messages work. Rendering is pretty trivial, as the submsg -> netlink conversion looks like rendering a nest in which only one attr was set. Only trick is that we use the enum value of the sub-message rather than the nest as the type, and effectively skip one layer of nesting. A real double nested struct would look like this: [SELECTOR] [SUBMSG] [NEST] [MSG1-ATTR] A submsg "is" the nest so by skipping I mean: [SELECTOR] [SUBMSG] [MSG1-ATTR] There is no extra validation in YNL if caller has set the selector matching the submsg type (e.g. link type = "macvlan" but the nest attrs are set to carry "veth"). Let the kernel handle that. Parsing side is a little more specialized as we need to render and insert a new kind of function which switches between what to parse based on the selector. But code isn't too complicated. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: submsg: render the structsJakub Kicinski1-3/+43
The easiest (or perhaps only sane) way to support submessages in C is to treat them as if they were nests. Build fake attributes to that effect in the codegen. Render the submsg as a big nest of all possible values. With this in place the main missing part is to hook in the switch which selects how to parse based on the key. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: submsg: plumb thru an empty typeJakub Kicinski2-2/+23
Hook in handling of sub-messages, for now treat them as ignored attrs. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: prepare for submsg structsJakub Kicinski1-23/+39
Prepare for constructing Struct() instances which represent sub-messages rather than nested attributes. Restructure the code / indentation to more easily insert a case where nested reference comes from annotation other than the 'nested-attributes' property. Make sure we don't construct the Struct() object from scratch in multiple places as the constructor will soon have more arguments. This should cause no functional change. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: factor out the annotation of pure nested structJakub Kicinski1-17/+22
We're about to add some code here for sub-messages. Factor out the nest-related logic to make the code readable. No functional change. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16netlink: specs: rt-link: add C naming info for ovpnJakub Kicinski1-0/+4
C naming info for OVPN which was added since I adjusted the existing attrs. Also add missing reference to a header needed for a bridge struct. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: microchip: document where the LAN88xx PHYs are usedOleksij Rempel1-0/+2
The driver uses the name LAN88xx for PHYs with phy_id = 0x0007c132. But with this placeholder name no documentation can be found on the net. Document the fact that these PHYs are build into the LAN7800 and LAN7850 USB/Ethernet controllers. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250515082051.2644450-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: fixed_phy: remove fixed_phy_register_with_gpiodHeiner Kallweit2-39/+7
Since its introduction 6 yrs ago this functions has never had a user. So remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/ccbeef28-65ae-4e28-b1db-816c44338dee@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: rfs: add sock_rps_delete_flow() helperEric Dumazet4-3/+31
RFS can exhibit lower performance for workloads using short-lived flows and a small set of 4-tuple. This is often the case for load-testers, using a pair of hosts, if the server has a single listener port. Typical use case : Server : tcp_crr -T128 -F1000 -6 -U -l30 -R 14250 Client : tcp_crr -T128 -F1000 -6 -U -l30 -c -H server | grep local_throughput This is because RFS global hash table contains stale information, when the same RSS key is recycled for another socket and another cpu. Make sure to undo the changes and go back to initial state when a flow is disconnected. Performance of the above test is increased by 22 %, going from 372604 transactions per second to 457773. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Octavian Purdila <tavip@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250515100354.3339920-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16r8169: add support for RTL8127AChunHao Lin3-3/+193
This adds support for 10Gbs chip RTL8127A. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250515095303.3138-1-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: dlink: add synchronization for stats updateMoon Yeounsu2-1/+15
This patch synchronizes code that accesses from both user-space and IRQ contexts. The `get_stats()` function can be called from both context. `dev->stats.tx_errors` and `dev->stats.collisions` are also updated in the `tx_errors()` function. Therefore, these fields must also be protected by synchronized. There is no code that accessses `dev->stats.tx_errors` between the previous and updated lines, so the updating point can be moved. Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com> Link: https://patch.msgid.link/20250515075333.48290-1-yyyynoom@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overheadCarolina Jubran3-43/+51
CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by zero-initializing all stack variables on function entry. The mlx5 XDP RX path previously allocated a struct mlx5e_xdp_buff on the stack per received CQE, resulting in measurable performance degradation under this config. This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct, avoiding per-CQE stack allocations and repeated zeroing. With this change, XDP_DROP and XDP_TX performance matches that of kernels built without CONFIG_INIT_STACK_ALL_ZERO. Performance was measured on a ConnectX-6Dx using a single RX channel (1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from net-next-6.15. Stack zeroing disabled: - XDP_DROP: * baseline: 31.47 Mpps * baseline + per-RQ allocation: 32.31 Mpps (+2.68%) - XDP_TX: * baseline: 12.41 Mpps * baseline + per-RQ allocation: 12.95 Mpps (+4.30%) Stack zeroing enabled: - XDP_DROP: * baseline: 24.32 Mpps * baseline + per-RQ allocation: 32.27 Mpps (+32.7%) - XDP_TX: * baseline: 11.80 Mpps * baseline + per-RQ allocation: 12.24 Mpps (+3.72%) Reported-by: Sebastiano Miano <mianosebastiano@gmail.com> Reported-by: Samuel Dobron <sdobron@redhat.com> Link: https://lore.kernel.org/all/CAMENy5pb8ea+piKLg5q5yRTMZacQqYWAoVLE1FE9WhQPq92E0g@mail.gmail.com/ Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/1747253032-663457-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: mediatek: do not require syscon compatible for pio propertyFrank Wunderlich1-1/+9
Current implementation requires syscon compatible for pio property which is used for driving the switch leds on mt7988. Replace syscon_regmap_lookup_by_phandle with of_parse_phandle and device_node_to_regmap to get the regmap already assigned by pinctrl driver. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Link: https://patch.msgid.link/20250510174933.154589-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16idpf: add support for Rx timestampingMilena Olech6-4/+150
Add Rx timestamp function when the Rx timestamp value is read directly from the Rx descriptor. In order to extend the Rx timestamp value to 64 bit in hot path, the PHC time is cached in the receive groups. Add supported Rx timestamp modes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: YiFei Zhu <zhuyifei@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp flowsMilena Olech10-14/+805
Add functions to request Tx timestamp for the PTP packets, read the Tx timestamp when the completion tag for that packet is being received, extend the Tx timestamp value and set the supported timestamping modes. Tx timestamp is requested for the PTP packets by setting a TSYN bit and index value in the Tx context descriptor. The driver assumption is that the Tx timestamp value is ready to be read when the completion tag is received. Then the driver schedules delayed work and the Tx timestamp value read is requested through virtchnl message. At the end, the Tx timestamp value is extended to 64-bit and provided back to the skb. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp capabilities negotiationMilena Olech6-4/+308
Tx timestamp capabilities are negotiated for the uplink Vport. Driver receives information about the number of available Tx timestamp latches, the size of Tx timestamp value and the set of indexes used for Tx timestamping. Add function to get the Tx timestamp capabilities and parse the uplink vport flag. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add PTP clock configurationMilena Olech4-3/+386
PTP clock configuration operations - set time, adjust time and adjust frequency are required to control the clock and maintain synchronization process. Extend get PTP capabilities function to request for the clock adjustments and add functions to enable these actions using dedicated virtchnl messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>