aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-19selftests: drv-net: Fix "envirnoments" to "environments"Sumanth Gavini1-1/+1
Fix misspelling reported by codespell Signed-off-by: Sumanth Gavini <sumanth.gavini@yahoo.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250516225156.1122058-1-sumanth.gavini@yahoo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19net: netlink: reduce extack cookie sizeJohannes Berg1-1/+2
Seems like the extack cookie hasn't found any users outside of wireless, which always uses nl_set_extack_cookie_u64(). Thus, allocating 20 bytes for it is pointless, reduce that to 8 bytes, and add a BUILD_BUG_ON() to ensure it's enough (obviously it is, for a u64, but in case it changes again.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250516115927.38209-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16vsock/test: check also expected errno on sigpipe testStefano Garzarella1-2/+10
In the sigpipe test, we expect send() to fail, but we do not check if send() fails with the errno we expect (EPIPE). Add this check and repeat the send() in case of EINTR as we do in other tests. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250514141927.159456-4-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16vsock/test: retry send() to avoid occasional failure in sigpipe testStefano Garzarella1-8/+30
When the other peer calls shutdown(SHUT_RD), there is a chance that the send() call could occur before the message carrying the close information arrives over the transport. In such cases, the send() might still succeed. To avoid this race, let's retry the send() call a few times, ensuring the test is more reliable. Sleep a little before trying again to avoid flooding the other peer and filling its receive buffer, causing false-negative. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250514141927.159456-3-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16vsock/test: add timeout_usleep() to allow sleeping in timeout sectionsStefano Garzarella2-0/+19
The timeout API uses signals, so we have documented not to use sleep(), but we can use nanosleep(2) since POSIX.1 explicitly specifies that it does not interact with signals. Let's provide timeout_usleep() for that. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250514141927.159456-2-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl: add a sample for rt-linkJakub Kicinski2-0/+185
Add a fairly complete example of rt-link usage. If run without any arguments it simply lists the interfaces and some of their attrs. If run with an arg it tries to create and delete a netkit device. 1 # ./tools/net/ynl/samples/rt-link 1 2 Trying to create a Netkit interface 3 Testing error message for policy being bad: 4 Kernel error: 'Provided default xmit policy not supported' (bad attribute: .linkinfo.data(netkit).policy) 5 1: lo: mtu 65536 6 2: wlp0s1: mtu 1500 7 3: enp0s13: mtu 1500 8 4: dummy0: mtu 1500 kind dummy altname one two 9 5: nk0: mtu 1500 kind netkit primary 0 policy forward 10 6: nk1: mtu 1500 kind netkit primary 1 policy blackhole 11 Trying to delete a Netkit interface (ifindex 6) Sample creates the device first, it sets an invalid value for a netkit attribute to trigger reverse parsing. Line 4 shows the error with the attribute path correctly generated by YNL. Then sample fixes the bad attribute and re-issues the request, with NLM_F_ECHO set. This flag causes the notification to be looped back to the initiating socket (our socket). Sample parses this notification to save the ifindex of the created netkit. Sample then proceeds to list the devices. Line 8 above shows a dummy device with two alt names. Lines 9 and 10 show the netkit devices the sample itself created. The "primary" and "policy" attrs are from inside the netkit submsg. The string values are auto-generated for the enums by YNL. To clean up sample deletes the interface it created (line 11). Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl: enable codegen for all rt- familiesJakub Kicinski2-4/+7
Switch from including Classic netlink families one by one to excluding. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl: submsg: reverse parse / error reportingJakub Kicinski3-11/+107
Reverse parsing lets YNL convert bad and missing attr pointers from extack into a string like "missing attribute nest1.nest2.attr_name". It's a feature that's unique to YNL C AFAIU (even the Python YNL can't do nested reverse parsing). Add support for reverse-parsing of sub-messages. To simplify the logic and the code annotate the type policies with extra metadata. Mark the selectors and the messages with the information we need. We assume that key / selector always precedes the sub-message while parsing (and also if there are multiple sub-messages like in rt-link they are interleaved selector 1 ... submsg 1 ... selector 2 .. submsg 2, not selector 1 ... selector 2 ... submsg 1 ... submsg 2). The rt-link sample in a subsequent changes shows reverse parsing of sub-messages in action. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: submsg: support parsing and rendering sub-messagesJakub Kicinski4-4/+89
Adjust parsing and rendering appropriately to make sub-messages work. Rendering is pretty trivial, as the submsg -> netlink conversion looks like rendering a nest in which only one attr was set. Only trick is that we use the enum value of the sub-message rather than the nest as the type, and effectively skip one layer of nesting. A real double nested struct would look like this: [SELECTOR] [SUBMSG] [NEST] [MSG1-ATTR] A submsg "is" the nest so by skipping I mean: [SELECTOR] [SUBMSG] [MSG1-ATTR] There is no extra validation in YNL if caller has set the selector matching the submsg type (e.g. link type = "macvlan" but the nest attrs are set to carry "veth"). Let the kernel handle that. Parsing side is a little more specialized as we need to render and insert a new kind of function which switches between what to parse based on the selector. But code isn't too complicated. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: submsg: render the structsJakub Kicinski1-3/+43
The easiest (or perhaps only sane) way to support submessages in C is to treat them as if they were nests. Build fake attributes to that effect in the codegen. Render the submsg as a big nest of all possible values. With this in place the main missing part is to hook in the switch which selects how to parse based on the key. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: submsg: plumb thru an empty typeJakub Kicinski2-2/+23
Hook in handling of sub-messages, for now treat them as ignored attrs. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: prepare for submsg structsJakub Kicinski1-23/+39
Prepare for constructing Struct() instances which represent sub-messages rather than nested attributes. Restructure the code / indentation to more easily insert a case where nested reference comes from annotation other than the 'nested-attributes' property. Make sure we don't construct the Struct() object from scratch in multiple places as the constructor will soon have more arguments. This should cause no functional change. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16tools: ynl-gen: factor out the annotation of pure nested structJakub Kicinski1-17/+22
We're about to add some code here for sub-messages. Factor out the nest-related logic to make the code readable. No functional change. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16netlink: specs: rt-link: add C naming info for ovpnJakub Kicinski1-0/+4
C naming info for OVPN which was added since I adjusted the existing attrs. Also add missing reference to a header needed for a bridge struct. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250515231650.1325372-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: microchip: document where the LAN88xx PHYs are usedOleksij Rempel1-0/+2
The driver uses the name LAN88xx for PHYs with phy_id = 0x0007c132. But with this placeholder name no documentation can be found on the net. Document the fact that these PHYs are build into the LAN7800 and LAN7850 USB/Ethernet controllers. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250515082051.2644450-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: fixed_phy: remove fixed_phy_register_with_gpiodHeiner Kallweit2-39/+7
Since its introduction 6 yrs ago this functions has never had a user. So remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/ccbeef28-65ae-4e28-b1db-816c44338dee@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: rfs: add sock_rps_delete_flow() helperEric Dumazet4-3/+31
RFS can exhibit lower performance for workloads using short-lived flows and a small set of 4-tuple. This is often the case for load-testers, using a pair of hosts, if the server has a single listener port. Typical use case : Server : tcp_crr -T128 -F1000 -6 -U -l30 -R 14250 Client : tcp_crr -T128 -F1000 -6 -U -l30 -c -H server | grep local_throughput This is because RFS global hash table contains stale information, when the same RSS key is recycled for another socket and another cpu. Make sure to undo the changes and go back to initial state when a flow is disconnected. Performance of the above test is increased by 22 %, going from 372604 transactions per second to 457773. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Octavian Purdila <tavip@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250515100354.3339920-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16r8169: add support for RTL8127AChunHao Lin3-3/+193
This adds support for 10Gbs chip RTL8127A. Signed-off-by: ChunHao Lin <hau@realtek.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20250515095303.3138-1-hau@realtek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: dlink: add synchronization for stats updateMoon Yeounsu2-1/+15
This patch synchronizes code that accesses from both user-space and IRQ contexts. The `get_stats()` function can be called from both context. `dev->stats.tx_errors` and `dev->stats.collisions` are also updated in the `tx_errors()` function. Therefore, these fields must also be protected by synchronized. There is no code that accessses `dev->stats.tx_errors` between the previous and updated lines, so the updating point can be moved. Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com> Link: https://patch.msgid.link/20250515075333.48290-1-yyyynoom@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overheadCarolina Jubran3-43/+51
CONFIG_INIT_STACK_ALL_ZERO introduces a performance cost by zero-initializing all stack variables on function entry. The mlx5 XDP RX path previously allocated a struct mlx5e_xdp_buff on the stack per received CQE, resulting in measurable performance degradation under this config. This patch reuses a mlx5e_xdp_buff stored in the mlx5e_rq struct, avoiding per-CQE stack allocations and repeated zeroing. With this change, XDP_DROP and XDP_TX performance matches that of kernels built without CONFIG_INIT_STACK_ALL_ZERO. Performance was measured on a ConnectX-6Dx using a single RX channel (1 CPU at 100% usage) at ~50 Mpps. The baseline results were taken from net-next-6.15. Stack zeroing disabled: - XDP_DROP: * baseline: 31.47 Mpps * baseline + per-RQ allocation: 32.31 Mpps (+2.68%) - XDP_TX: * baseline: 12.41 Mpps * baseline + per-RQ allocation: 12.95 Mpps (+4.30%) Stack zeroing enabled: - XDP_DROP: * baseline: 24.32 Mpps * baseline + per-RQ allocation: 32.27 Mpps (+32.7%) - XDP_TX: * baseline: 11.80 Mpps * baseline + per-RQ allocation: 12.24 Mpps (+3.72%) Reported-by: Sebastiano Miano <mianosebastiano@gmail.com> Reported-by: Samuel Dobron <sdobron@redhat.com> Link: https://lore.kernel.org/all/CAMENy5pb8ea+piKLg5q5yRTMZacQqYWAoVLE1FE9WhQPq92E0g@mail.gmail.com/ Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/1747253032-663457-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: mediatek: do not require syscon compatible for pio propertyFrank Wunderlich1-1/+9
Current implementation requires syscon compatible for pio property which is used for driving the switch leds on mt7988. Replace syscon_regmap_lookup_by_phandle with of_parse_phandle and device_node_to_regmap to get the regmap already assigned by pinctrl driver. Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Link: https://patch.msgid.link/20250510174933.154589-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16idpf: add support for Rx timestampingMilena Olech6-4/+150
Add Rx timestamp function when the Rx timestamp value is read directly from the Rx descriptor. In order to extend the Rx timestamp value to 64 bit in hot path, the PHC time is cached in the receive groups. Add supported Rx timestamp modes. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: YiFei Zhu <zhuyifei@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp flowsMilena Olech10-14/+805
Add functions to request Tx timestamp for the PTP packets, read the Tx timestamp when the completion tag for that packet is being received, extend the Tx timestamp value and set the supported timestamping modes. Tx timestamp is requested for the PTP packets by setting a TSYN bit and index value in the Tx context descriptor. The driver assumption is that the Tx timestamp value is ready to be read when the completion tag is received. Then the driver schedules delayed work and the Tx timestamp value read is requested through virtchnl message. At the end, the Tx timestamp value is extended to 64-bit and provided back to the skb. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Josh Hay <joshua.a.hay@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add Tx timestamp capabilities negotiationMilena Olech6-4/+308
Tx timestamp capabilities are negotiated for the uplink Vport. Driver receives information about the number of available Tx timestamp latches, the size of Tx timestamp value and the set of indexes used for Tx timestamping. Add function to get the Tx timestamp capabilities and parse the uplink vport flag. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Co-developed-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add PTP clock configurationMilena Olech4-3/+386
PTP clock configuration operations - set time, adjust time and adjust frequency are required to control the clock and maintain synchronization process. Extend get PTP capabilities function to request for the clock adjustments and add functions to enable these actions using dedicated virtchnl messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add mailbox access to read PTP clock timeMilena Olech5-0/+164
When the access to read PTP clock is specified as mailbox, the driver needs to send virtchnl message to perform PTP actions. Message is sent using idpf_mbq_opc_send_msg_to_peer_drv mailbox opcode, with the parameters received during PTP capabilities negotiation. Add functions to recognize PTP messages, move them to dedicated secondary mailbox, read the PTP clock time and cross timestamp using mailbox messages. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: negotiate PTP capabilities and get PTP clockMilena Olech7-0/+354
PTP capabilities are negotiated using virtchnl command. Add get capabilities function, direct access to read the PTP clock. Set initial PTP capabilities exposed to the stack. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Willem de Bruijn <willemb@google.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: move virtchnl structures to the header fileMilena Olech2-84/+86
Move virtchnl structures to the header file to expose them for the PTP virtchnl file. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16virtchnl: add PTP virtchnl definitionsMilena Olech1-0/+302
PTP capabilities are negotiated using virtchnl commands. There are two available modes of the PTP support: direct and mailbox. When the direct access to PTP resources is negotiated, virtchnl messages returns a set of registers that allow read/write directly. When the mailbox access to PTP resources is negotiated, virtchnl messages are used to access PTP clock and to read the timestamp values. Virtchnl API covers both modes and exposes a set of PTP capabilities. Using virtchnl API, the driver recognizes also HW abilities - maximum adjustment of the clock and the basic increment value. Additionally, API allows to configure the secondary mailbox, dedicated exclusively for PTP purposes. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: add initial PTP supportMilena Olech7-1/+140
PTP feature is supported if the VIRTCHNL2_CAP_PTP is negotiated during the capabilities recognition. Initial PTP support includes PTP initialization and registration of the clock. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Mina Almasry <almasrymina@google.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-16idpf: change the method for mailbox workqueue allocationMilena Olech1-3/+2
Since workqueues are created per CPU, the works scheduled to this workqueues are run on the CPU they were assigned. It may result in overloaded CPU that is not able to handle virtchnl messages in relatively short time. Allocating workqueue with WQ_UNBOUND and WQ_HIGHPRI flags allows scheduler to queue virtchl messages on less loaded CPUs, what eliminates delays. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Milena Olech <milena.olech@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-05-15net: stmmac: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()Vladimir Oltean2-44/+42
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the stmmac driver to the new API, so that timestamping configuration can be removed from the ndo_eth_ioctl() path completely. The existing timestamping calls are guarded by netif_running(). For stmmac_hwtstamp_get() that is probably unnecessary, since no hardware access is performed. But for stmmac_hwtstamp_set() I've preserved it, since at least some IPs probably need pm_runtime_resume_and_get() to access registers, which is otherwise called by __stmmac_open(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250514143249.1808377-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net: lan743x: implement ndo_hwtstamp_get()Vladimir Oltean4-1/+23
Permit programs such as "hwtstamp_ctl -i eth0" to retrieve the current timestamping configuration of the NIC, rather than returning "Device driver does not have support for non-destructive SIOCGHWTSTAMP." The driver configures all channels with the same timestamping settings. On TX, retrieve the settings of the first channel, those should be representative for the entire NIC. On RX, save the filter settings in a new adapter field. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250514151931.1988047-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net: lan743x: convert to ndo_hwtstamp_set()Vladimir Oltean3-27/+12
New timestamping API was introduced in commit 66f7223039c0 ("net: add NDOs for configuring hardware timestamping") from kernel v6.6. It is time to convert the lan743x driver to the new API, so that timestamping configuration can be removed from the ndo_eth_ioctl() path completely. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20250514151931.1988047-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tools: ynl-gen: array-nest: support arrays of nestsJakub Kicinski1-0/+3
TC needs arrays of nests, but just a put for now. Fairly straightforward addition. Link: https://patch.msgid.link/20250513222011.844106-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net: sched: uapi: add more sanely named duplicate definesJakub Kicinski2-0/+2
The TCA_FLOWER_KEY_CFM enum has a UNSPEC and MAX with _OPT in the name, but the real attributes don't. Add a MAX that more reasonably matches the attrs. The PAD in TCA_TAPRIO is the only attr which doesn't have _ATTR in it, perhaps signifying that it's not a real attr? If so interesting idea in abstract but it makes codegen painful. Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250513221752.843102-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: increase tcp_rmem[2] to 32 MBEric Dumazet2-2/+2
Last change to tcp_rmem[2] happened in 2012, in commit b49960a05e32 ("tcp: change tcp_adv_win_scale and tcp_rmem[2]") TCP performance on WAN is mostly limited by tcp_rmem[2] for receivers. After this series improvements, it is time to increase the default. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: always use tcp_limit_output_bytes limitationEric Dumazet1-3/+2
This partially reverts commit c73e5807e4f6 ("tcp: tsq: no longer use limit_output_bytes for paced flows") Overriding the tcp_limit_output_bytes sysctl value for FQ enabled flows has the following problem: It allows TCP to queue around 2 ms worth of data per flow, defeating tcp_rcv_rtt_update() accuracy on the receiver, forcing it to increase sk->sk_rcvbuf even if the real RTT is around 100 us. After this change, we keep enough packets in flight to fill the pipe, and let receive queues small enough to get good cache behavior (cpu caches and/or NIC driver page pools). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-11-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: increase tcp_limit_output_bytes default value to 4MBEric Dumazet2-3/+3
Last change happened in 2018 with commit c73e5807e4f6 ("tcp: tsq: no longer use limit_output_bytes for paced flows") Modern NIC speeds got a 4x increase since then. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-10-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: skip big rtt sample if receive queue is not emptyEric Dumazet1-0/+3
tcp_rcv_rtt_update() role is to keep an estimation of RTT (tp->rcv_rtt_est.rtt_us) for receivers. If an application is too slow to drain the TCP receive queue, it is better to leave the RTT estimation small, so that tcp_rcv_space_adjust() does not inflate tp->rcvq_space.space and sk->sk_rcvbuf. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: always seek for minimal rtt in tcp_rcv_rtt_update()Eric Dumazet1-14/+8
tcp_rcv_rtt_update() goal is to maintain an estimation of the RTT in tp->rcv_rtt_est.rtt_us, used by tcp_rcv_space_adjust() When TCP TS are enabled, tcp_rcv_rtt_update() is using EWMA to smooth the samples. Change this to immediately latch the incoming value if it is lower than tp->rcv_rtt_est.rtt_us, so that tcp_rcv_space_adjust() does not overshoot tp->rcvq_space.space and sk->sk_rcvbuf. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: fix initial tp->rcvq_space.space value for passive TS enabled flowsEric Dumazet1-3/+3
tcp_rcv_state_process() must tweak tp->advmss for TS enabled flows before the call to tcp_init_transfer() / tcp_init_buffer_space(). Otherwise tp->rcvq_space.space is off by 120 bytes (TCP_INIT_CWND * TCPOLEN_TSTAMP_ALIGNED). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Link: https://patch.msgid.link/20250513193919.1089692-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: remove zero TCP TS samples for autotuningEric Dumazet1-5/+5
For TCP flows using ms RFC 7323 timestamp granularity tcp_rcv_rtt_update() can be fed with 1 ms samples, breaking TCP autotuning for data center flows with sub ms RTT. Instead, rely on the window based samples, fed by tcp_rcv_rtt_measure() tcp_rcvbuf_grow() for a 10 second TCP_STREAM sesssion now looks saner. We can see rcvbuf is kept at a reasonable value. 222.234976: tcp:tcp_rcvbuf_grow: time=348 rtt_us=330 copied=110592 inq=0 space=40960 ooo=0 scaling_ratio=230 rcvbuf=131072 ... 222.235276: tcp:tcp_rcvbuf_grow: time=300 rtt_us=288 copied=126976 inq=0 space=110592 ooo=0 scaling_ratio=230 rcvbuf=246187 ... 222.235569: tcp:tcp_rcvbuf_grow: time=294 rtt_us=288 copied=184320 inq=0 space=126976 ooo=0 scaling_ratio=230 rcvbuf=282659 ... 222.235833: tcp:tcp_rcvbuf_grow: time=264 rtt_us=244 copied=373760 inq=0 space=184320 ooo=0 scaling_ratio=230 rcvbuf=410312 ... 222.236142: tcp:tcp_rcvbuf_grow: time=308 rtt_us=219 copied=424960 inq=20480 space=373760 ooo=0 scaling_ratio=230 rcvbuf=832022 ... 222.236378: tcp:tcp_rcvbuf_grow: time=236 rtt_us=219 copied=692224 inq=49152 space=404480 ooo=0 scaling_ratio=230 rcvbuf=900407 ... 222.236602: tcp:tcp_rcvbuf_grow: time=225 rtt_us=219 copied=730112 inq=49152 space=643072 ooo=0 scaling_ratio=230 rcvbuf=1431534 ... 222.237050: tcp:tcp_rcvbuf_grow: time=229 rtt_us=219 copied=1160192 inq=49152 space=680960 ooo=0 scaling_ratio=230 rcvbuf=1515876 ... 222.237618: tcp:tcp_rcvbuf_grow: time=305 rtt_us=218 copied=2228224 inq=49152 space=1111040 ooo=0 scaling_ratio=230 rcvbuf=2473271 ... 222.238591: tcp:tcp_rcvbuf_grow: time=224 rtt_us=218 copied=3063808 inq=360448 space=2179072 ooo=0 scaling_ratio=230 rcvbuf=4850803 ... 222.240647: tcp:tcp_rcvbuf_grow: time=260 rtt_us=218 copied=2752512 inq=0 space=2703360 ooo=0 scaling_ratio=230 rcvbuf=6017914 ... 222.243535: tcp:tcp_rcvbuf_grow: time=224 rtt_us=218 copied=2834432 inq=49152 space=2752512 ooo=0 scaling_ratio=230 rcvbuf=6127331 ... 222.245108: tcp:tcp_rcvbuf_grow: time=240 rtt_us=218 copied=2883584 inq=49152 space=2785280 ooo=0 scaling_ratio=230 rcvbuf=6200275 ... 222.245333: tcp:tcp_rcvbuf_grow: time=224 rtt_us=218 copied=2859008 inq=0 space=2834432 ooo=0 scaling_ratio=230 rcvbuf=6309692 ... 222.301021: tcp:tcp_rcvbuf_grow: time=222 rtt_us=218 copied=2883584 inq=0 space=2859008 ooo=0 scaling_ratio=230 rcvbuf=6364400 ... 222.989242: tcp:tcp_rcvbuf_grow: time=225 rtt_us=218 copied=2899968 inq=0 space=2883584 ooo=0 scaling_ratio=230 rcvbuf=6419108 ... 224.139553: tcp:tcp_rcvbuf_grow: time=224 rtt_us=218 copied=3014656 inq=65536 space=2899968 ooo=0 scaling_ratio=230 rcvbuf=6455580 ... 224.584608: tcp:tcp_rcvbuf_grow: time=232 rtt_us=218 copied=3014656 inq=49152 space=2949120 ooo=0 scaling_ratio=230 rcvbuf=6564997 ... 230.145560: tcp:tcp_rcvbuf_grow: time=223 rtt_us=218 copied=2981888 inq=0 space=2965504 ooo=0 scaling_ratio=230 rcvbuf=6601469 ... Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Link: https://patch.msgid.link/20250513193919.1089692-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: add receive queue awareness in tcp_rcv_space_adjust()Eric Dumazet2-3/+5
If the application can not drain fast enough a TCP socket queue, tcp_rcv_space_adjust() can overestimate tp->rcvq_space.space. Then sk->sk_rcvbuf can grow and hit tcp_rmem[2] for no good reason. Fix this by taking into acount the number of available bytes. Keeping sk->sk_rcvbuf at the right size allows better cache efficiency. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Link: https://patch.msgid.link/20250513193919.1089692-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: adjust rcvbuf in presence of reordersEric Dumazet1-0/+4
This patch takes care of the needed provisioning when incoming packets are stored in the out of order queue. This part was not implemented in the correct way, we need to decouple it from tcp_rcv_space_adjust() logic. Without it, stalls in the pipe could happen. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: fix sk_rcvbuf overshootEric Dumazet1-34/+25
Current autosizing in tcp_rcv_space_adjust() is too aggressive. Instead of betting on possible losses and over estimate BDP, it is better to only account for slow start. The following patch is then adding a more precise tuning in the events of packet losses. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250513193919.1089692-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tcp: add tcp_rcvbuf_grow() tracepointEric Dumazet2-0/+75
Provide a new tracepoint to better understand tcp_rcv_space_adjust() (currently broken) behavior. Call it only when tcp_rcv_space_adjust() has a chance to make a change. I chose to leave trace_tcp_rcv_space_adjust() as is, because commit 6163849d289b ("net: introduce a new tracepoint for tcp_rcv_space_adjust") intent was to get it called after each data delivery to user space. Tested: Pair of hosts in the same rack. Ideally, sk->sk_rcvbuf should be kept small. echo "4096 131072 33554432" >/proc/sys/net/ipv4/tcp_rmem ./netserver perf record -C10 -e tcp:tcp_rcvbuf_grow sleep 30 <launch from client : netperf -H server -T,10> Trace for a TS enabled TCP flow (with standard ms granularity) perf script // We can see that sk_rcvbuf is growing very fast to tcp_mem[2] 260.500397: tcp:tcp_rcvbuf_grow: time=291 rtt_us=274 copied=110592 inq=0 space=41080 ooo=0 scaling_ratio=230 rcvbuf=131072 ... 260.501333: tcp:tcp_rcvbuf_grow: time=555 rtt_us=364 copied=333824 inq=0 space=110592 ooo=0 scaling_ratio=230 rcvbuf=1399144 ... 260.501664: tcp:tcp_rcvbuf_grow: time=331 rtt_us=330 copied=798720 inq=0 space=333824 ooo=0 scaling_ratio=230 rcvbuf=4110551 ... 260.502003: tcp:tcp_rcvbuf_grow: time=340 rtt_us=330 copied=1040384 inq=49152 space=798720 ooo=0 scaling_ratio=230 rcvbuf=7006410 ... 260.502483: tcp:tcp_rcvbuf_grow: time=479 rtt_us=330 copied=2658304 inq=49152 space=1040384 ooo=0 scaling_ratio=230 rcvbuf=7006410 ... 260.502899: tcp:tcp_rcvbuf_grow: time=416 rtt_us=413 copied=4026368 inq=147456 space=2658304 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.504233: tcp:tcp_rcvbuf_grow: time=493 rtt_us=487 copied=4800512 inq=196608 space=4026368 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.504792: tcp:tcp_rcvbuf_grow: time=559 rtt_us=551 copied=5672960 inq=49152 space=4800512 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.506614: tcp:tcp_rcvbuf_grow: time=610 rtt_us=607 copied=6688768 inq=180224 space=5672960 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.507280: tcp:tcp_rcvbuf_grow: time=666 rtt_us=656 copied=6868992 inq=49152 space=6688768 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.507979: tcp:tcp_rcvbuf_grow: time=699 rtt_us=699 copied=7000064 inq=0 space=6868992 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.508681: tcp:tcp_rcvbuf_grow: time=703 rtt_us=699 copied=7208960 inq=0 space=7000064 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.509426: tcp:tcp_rcvbuf_grow: time=744 rtt_us=737 copied=7569408 inq=0 space=7208960 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.510213: tcp:tcp_rcvbuf_grow: time=787 rtt_us=770 copied=7880704 inq=49152 space=7569408 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.511013: tcp:tcp_rcvbuf_grow: time=801 rtt_us=798 copied=8339456 inq=0 space=7880704 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.511860: tcp:tcp_rcvbuf_grow: time=847 rtt_us=824 copied=8601600 inq=49152 space=8339456 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.512710: tcp:tcp_rcvbuf_grow: time=850 rtt_us=846 copied=8814592 inq=65536 space=8601600 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.514428: tcp:tcp_rcvbuf_grow: time=871 rtt_us=865 copied=8855552 inq=49152 space=8814592 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.515333: tcp:tcp_rcvbuf_grow: time=905 rtt_us=882 copied=9228288 inq=49152 space=8855552 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.516237: tcp:tcp_rcvbuf_grow: time=905 rtt_us=896 copied=9371648 inq=49152 space=9228288 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.517149: tcp:tcp_rcvbuf_grow: time=911 rtt_us=909 copied=9543680 inq=49152 space=9371648 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.518070: tcp:tcp_rcvbuf_grow: time=921 rtt_us=921 copied=9793536 inq=0 space=9543680 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.520895: tcp:tcp_rcvbuf_grow: time=948 rtt_us=947 copied=10203136 inq=114688 space=9793536 ooo=0 scaling_ratio=230 rcvbuf=24622616 ... 260.521853: tcp:tcp_rcvbuf_grow: time=959 rtt_us=954 copied=10293248 inq=57344 space=10203136 ooo=0 scaling_ratio=230 rcvbuf=24691992 ... 260.522818: tcp:tcp_rcvbuf_grow: time=964 rtt_us=959 copied=10330112 inq=0 space=10293248 ooo=0 scaling_ratio=230 rcvbuf=24691992 ... 260.524760: tcp:tcp_rcvbuf_grow: time=979 rtt_us=969 copied=10633216 inq=49152 space=10330112 ooo=0 scaling_ratio=230 rcvbuf=24691992 ... 260.526709: tcp:tcp_rcvbuf_grow: time=975 rtt_us=973 copied=12013568 inq=163840 space=10633216 ooo=0 scaling_ratio=230 rcvbuf=25136755 ... 260.527694: tcp:tcp_rcvbuf_grow: time=985 rtt_us=976 copied=12025856 inq=32768 space=12013568 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.530655: tcp:tcp_rcvbuf_grow: time=991 rtt_us=986 copied=12050432 inq=98304 space=12025856 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.533626: tcp:tcp_rcvbuf_grow: time=993 rtt_us=989 copied=12124160 inq=0 space=12050432 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.538606: tcp:tcp_rcvbuf_grow: time=1000 rtt_us=994 copied=12222464 inq=49152 space=12124160 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.545605: tcp:tcp_rcvbuf_grow: time=1005 rtt_us=998 copied=12263424 inq=81920 space=12222464 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.553626: tcp:tcp_rcvbuf_grow: time=1005 rtt_us=999 copied=12320768 inq=12288 space=12263424 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.589749: tcp:tcp_rcvbuf_grow: time=1001 rtt_us=1000 copied=12398592 inq=16384 space=12320768 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 260.806577: tcp:tcp_rcvbuf_grow: time=1010 rtt_us=1000 copied=12402688 inq=32768 space=12398592 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 261.002386: tcp:tcp_rcvbuf_grow: time=1002 rtt_us=1000 copied=12419072 inq=98304 space=12402688 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 261.803432: tcp:tcp_rcvbuf_grow: time=1013 rtt_us=1000 copied=12468224 inq=49152 space=12419072 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 261.829533: tcp:tcp_rcvbuf_grow: time=1004 rtt_us=1000 copied=12615680 inq=0 space=12468224 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... 265.505435: tcp:tcp_rcvbuf_grow: time=1007 rtt_us=1000 copied=12632064 inq=32768 space=12615680 ooo=0 scaling_ratio=230 rcvbuf=33554432 ... We also see rtt_us going gradually to 1000 usec, causing massive overshoot. Trace for a usec TS enabled TCP flow (us granularity) perf script // We can see that sk_rcvbuf is growing to a smaller value, thanks to tight rtt_us values. 1509.273955: tcp:tcp_rcvbuf_grow: time=396 rtt_us=377 copied=110592 inq=0 space=41080 ooo=0 scaling_ratio=230 rcvbuf=131072 ... 1509.274366: tcp:tcp_rcvbuf_grow: time=412 rtt_us=365 copied=129024 inq=0 space=110592 ooo=0 scaling_ratio=230 rcvbuf=1399144 ... 1509.274738: tcp:tcp_rcvbuf_grow: time=372 rtt_us=355 copied=194560 inq=0 space=129024 ooo=0 scaling_ratio=230 rcvbuf=1399144 ... 1509.275020: tcp:tcp_rcvbuf_grow: time=282 rtt_us=257 copied=401408 inq=0 space=194560 ooo=0 scaling_ratio=230 rcvbuf=1399144 ... 1509.275190: tcp:tcp_rcvbuf_grow: time=170 rtt_us=144 copied=741376 inq=229376 space=401408 ooo=0 scaling_ratio=230 rcvbuf=3021625 ... 1509.275300: tcp:tcp_rcvbuf_grow: time=110 rtt_us=110 copied=1146880 inq=65536 space=741376 ooo=0 scaling_ratio=230 rcvbuf=4642390 ... 1509.275449: tcp:tcp_rcvbuf_grow: time=149 rtt_us=106 copied=1310720 inq=737280 space=1146880 ooo=0 scaling_ratio=230 rcvbuf=5498637 ... 1509.275560: tcp:tcp_rcvbuf_grow: time=111 rtt_us=107 copied=1388544 inq=430080 space=1310720 ooo=0 scaling_ratio=230 rcvbuf=5498637 ... 1509.275674: tcp:tcp_rcvbuf_grow: time=114 rtt_us=113 copied=1495040 inq=421888 space=1388544 ooo=0 scaling_ratio=230 rcvbuf=5498637 ... 1509.275800: tcp:tcp_rcvbuf_grow: time=126 rtt_us=126 copied=1572864 inq=77824 space=1495040 ooo=0 scaling_ratio=230 rcvbuf=5498637 ... 1509.275968: tcp:tcp_rcvbuf_grow: time=168 rtt_us=161 copied=1863680 inq=172032 space=1572864 ooo=0 scaling_ratio=230 rcvbuf=5498637 ... 1509.276129: tcp:tcp_rcvbuf_grow: time=161 rtt_us=161 copied=1941504 inq=204800 space=1863680 ooo=0 scaling_ratio=230 rcvbuf=5782790 ... 1509.276288: tcp:tcp_rcvbuf_grow: time=159 rtt_us=158 copied=1990656 inq=131072 space=1941504 ooo=0 scaling_ratio=230 rcvbuf=5782790 ... 1509.276900: tcp:tcp_rcvbuf_grow: time=228 rtt_us=226 copied=2883584 inq=266240 space=1990656 ooo=0 scaling_ratio=230 rcvbuf=5782790 ... 1509.277819: tcp:tcp_rcvbuf_grow: time=242 rtt_us=236 copied=3022848 inq=0 space=2883584 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.278072: tcp:tcp_rcvbuf_grow: time=253 rtt_us=247 copied=3055616 inq=49152 space=3022848 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.279560: tcp:tcp_rcvbuf_grow: time=268 rtt_us=264 copied=3133440 inq=180224 space=3055616 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.279833: tcp:tcp_rcvbuf_grow: time=274 rtt_us=270 copied=3424256 inq=0 space=3133440 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.282187: tcp:tcp_rcvbuf_grow: time=277 rtt_us=273 copied=3465216 inq=180224 space=3424256 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.284685: tcp:tcp_rcvbuf_grow: time=292 rtt_us=292 copied=3481600 inq=147456 space=3465216 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.284983: tcp:tcp_rcvbuf_grow: time=297 rtt_us=295 copied=3702784 inq=45056 space=3481600 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.285596: tcp:tcp_rcvbuf_grow: time=311 rtt_us=310 copied=3723264 inq=40960 space=3702784 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.285909: tcp:tcp_rcvbuf_grow: time=313 rtt_us=304 copied=3846144 inq=196608 space=3723264 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.291654: tcp:tcp_rcvbuf_grow: time=322 rtt_us=311 copied=3960832 inq=49152 space=3846144 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.291986: tcp:tcp_rcvbuf_grow: time=333 rtt_us=330 copied=4075520 inq=360448 space=3960832 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.292319: tcp:tcp_rcvbuf_grow: time=332 rtt_us=332 copied=4079616 inq=65536 space=4075520 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.292666: tcp:tcp_rcvbuf_grow: time=348 rtt_us=347 copied=4177920 inq=212992 space=4079616 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.293015: tcp:tcp_rcvbuf_grow: time=349 rtt_us=345 copied=4276224 inq=262144 space=4177920 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.293371: tcp:tcp_rcvbuf_grow: time=356 rtt_us=346 copied=4415488 inq=49152 space=4276224 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... 1509.515798: tcp:tcp_rcvbuf_grow: time=424 rtt_us=411 copied=4833280 inq=81920 space=4415488 ooo=0 scaling_ratio=230 rcvbuf=12316197 ... Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250513193919.1089692-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net: devmem: fix kernel panic when netlink socket close after module unloadTaehee Yoo3-0/+20
Kernel panic occurs when a devmem TCP socket is closed after NIC module is unloaded. This is Devmem TCP unregistration scenarios. number is an order. (a)netlink socket close (b)pp destroy (c)uninstall result 1 2 3 OK 1 3 2 (d)Impossible 2 1 3 OK 3 1 2 (e)Kernel panic 2 3 1 (d)Impossible 3 2 1 (d)Impossible (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is closed. (b) page_pool_destroy() is called when the interface is down. (c) mp_ops->uninstall() is called when an interface is unregistered. (d) There is no scenario in mp_ops->uninstall() is called before page_pool_destroy(). Because unregister_netdevice_many_notify() closes interfaces first and then calls mp_ops->uninstall(). (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire netdev_lock(). But if the interface module has already been removed, net_device pointer is invalid, so it causes kernel panic. In summary, there are only 3 possible scenarios. A. sk close -> pp destroy -> uninstall. B. pp destroy -> sk close -> uninstall. C. pp destroy -> uninstall -> sk close. Case C is a kernel panic scenario. In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set binding->dev to NULL. It indicates an bound net_device was unregistered. It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock() if binding->dev is NULL. A new binding->lock is added to protect a dev of a binding. So, lock ordering is like below. priv->lock netdev_lock(dev) binding->lock Tests: Scenario A: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 kill $pid ip link set $interface down modprobe -rv $module Scenario B: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 ip link set $interface down kill $pid modprobe -rv $module Scenario C: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 modprobe -rv $module sleep 5 kill $pid Splat looks like: Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf] CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ #2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f Tainted: [B]=BAD_PAGE, [W]=WARN RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746) Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811 RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4 R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88 FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ... netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3)) genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705) ... netlink_release (net/netlink/af_netlink.c:737) ... __sock_release (net/socket.c:647) sock_close (net/socket.c:1393) Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15tsnep: fix timestamping with a stacked DSA driverGerhard Engleder1-11/+19
This driver is susceptible to a form of the bug explained in commit c26a2c2ddc01 ("gianfar: Fix TX timestamping with a stacked DSA driver") and in Documentation/networking/timestamping.rst section "Other caveats for MAC drivers", specifically it timestamps any skb which has SKBTX_HW_TSTAMP, and does not consider if timestamping has been enabled in adapter->hwtstamp_config.tx_type. Evaluate the proper TX timestamping condition only once on the TX path (in tsnep_xmit_frame_ring()) and store the result in an additional TX entry flag. Evaluate the new TX entry flag in the TX confirmation path (in tsnep_tx_poll()). This way SKBTX_IN_PROGRESS is set by the driver as required, but never evaluated. SKBTX_IN_PROGRESS shall not be evaluated as it can be set by a stacked DSA driver and evaluating it would lead to unwanted timestamps. Fixes: 403f69bbdbad ("tsnep: Add TSN endpoint Ethernet MAC driver") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20250514195657.25874-1-gerhard@engleder-embedded.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15net: prestera: Use to_delayed_work()Chen Ni1-2/+1
Use to_delayed_work() instead of open-coding it. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Link: https://patch.msgid.link/20250514064053.2513921-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>