wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
4 days	tools: ynl: add sample for wireguard	Asbjørn Sloth Tønnesen	4	-0/+108
	Add a sample application for WireGuard, using the generated C library. The main benefit of this is to exercise the generated library, which might be useful for future selftests. In order to support usage with a pre-YNL wireguard.h in /usr/include, the former header guard is added to Makefile.deps as well. Example: $ make -C tools/net/ynl/lib $ make -C tools/net/ynl/generated $ make -C tools/net/ynl/samples wireguard $ ./tools/net/ynl/samples/wireguard usage: ./tools/net/ynl/samples/wireguard <ifindex\|ifname> $ sudo ./tools/net/ynl/samples/wireguard wg-test Interface 3: wg-test Peer 6adfb183a4a2c94a2f92dab5ade762a4788[...]: Data: rx: 42 / tx: 42 bytes Allowed IPs: 0.0.0.0/0 ::/0 Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: uapi: generate header with ynl-gen	Asbjørn Sloth Tønnesen	2	-22/+22
	Use ynl-gen to generate the UAPI header for WireGuard. The cosmetic changes in this patch confirms that the spec is aligned with the implementation. By using the generated version, it ensures that they stay in sync. Changes in the generated header: * Trivial header guard rename. * Trivial white space changes. * Trivial comment changes. * Precompute bitflags in ynl-gen (see [1]). * Drop __*_F_ALL constants (see [1]). [1] https://lore.kernel.org/r/20251014123201.6ecfd146@kernel.org/ No behavioural changes intended. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: uapi: move flag enums	Asbjørn Sloth Tønnesen	1	-11/+14
	Move the wg*_flag enums, so they are defined above the attribute set enums, where ynl-gen would place them. This is an incremental step towards adopting an UAPI header generated by ynl-gen. This is split out to keep the patches readable. This is a trivial patch with no behavioural changes intended. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: uapi: move enum wg_cmd	Asbjørn Sloth Tønnesen	1	-7/+8
	This patch moves enum wg_cmd to the end of the file, where ynl-gen would generate it. This is an incremental step towards adopting an UAPI header generated by ynl-gen. This is split out to keep the patches readable. This is a trivial patch with no behavioural changes intended. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	netlink: specs: add specification for wireguard	Asbjørn Sloth Tønnesen	3	-129/+299
	This patch adds a near[1] complete YNL specification for WireGuard, documenting the protocol in a machine-readable format, rather than comments in wireguard.h, and eases usage from C and non-C programming languages alike. The generated C library will be featured in a later patch, so in this patch I will use the in-kernel python client for examples. This makes the documentation in the UAPI header redundant, it is therefore removed. The in-line documentation in the spec is based on the existing comment in wireguard.h, and once released it will be available in the kernel documentation at: https://docs.kernel.org/netlink/specs/wireguard.html (until then run: make htmldocs) Generate wireguard.rst from this spec: $ make -C tools/net/ynl/generated/ wireguard.rst Query wireguard interface through pyynl: $ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \ --dump get-device \ --json '{"ifindex":3}' [{'fwmark': 0, 'ifindex': 3, 'ifname': 'wg-test', 'listen-port': 54318, 'peers': [{0: {'allowedips': [{0: {'cidr-mask': 0, 'family': 2, 'ipaddr': '0.0.0.0'}}, {0: {'cidr-mask': 0, 'family': 10, 'ipaddr': '::'}}], 'endpoint': b'[...]', 'last-handshake-time': {'nsec': 42, 'sec': 42}, 'persistent-keepalive-interval': 42, 'preshared-key': '[...]', 'protocol-version': 1, 'public-key': '[...]', 'rx-bytes': 42, 'tx-bytes': 42}}], 'private-key': '[...]', 'public-key': '[...]'}] Add another allowed IP prefix: $ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \ --do set-device --json '{"ifindex":3,"peers":[ {"public-key":"6a df b1 83 a4 ..","allowedips":[ {"cidr-mask":0,"family":10,"ipaddr":"::"}]}]}' [1] As can be seen above, the "endpoint" is only dumped as binary data, as it can't be fully described in YNL. It's either a struct sockaddr_in or struct sockaddr_in6 depending on the attribute length. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: netlink: lower .maxattr for WG_CMD_GET_DEVICE	Asbjørn Sloth Tønnesen	1	-1/+1
	Previously .maxattr was shared for both WG_CMD_GET_DEVICE and WG_CMD_SET_DEVICE. Now that it is split, then we can lower it for WG_CMD_GET_DEVICE to follow the documentation which defines .maxattr as WGDEVICE_A_IFNAME for WG_CMD_GET_DEVICE. $ grep -hC5 'one but not both of:' include/uapi/linux/wireguard.h * WG_CMD_GET_DEVICE * ----------------- * * May only be called via NLM_F_REQUEST \| NLM_F_DUMP. The command * should contain one but not both of: * * WGDEVICE_A_IFINDEX: NLA_U32 * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1 * * The kernel will then return several messages [...] While other attributes weren't rejected previously, the consensus is that nobody sends those attributes, so nothing should break. Link: https://lore.kernel.org/r/aRyLoy2iqbkUipZW@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: netlink: convert to split ops	Asbjørn Sloth Tønnesen	1	-7/+9
	This patch converts WireGuard from using the legacy struct genl_ops to struct genl_split_ops, by applying the same transformation as genl_cmd_full_to_split() would otherwise do at runtime. WGDEVICE_A_MAX is swapped for WGDEVICE_A_PEERS, while they are currently equivalent, then .maxattr should be the maximum attribute that a given command supports, and not change along with WGDEVICE_A_MAX. This is an incremental step towards adopting netlink policy code generated by ynl-gen, ensuring that the code and spec is aligned. This is a trivial patch with no behavioural changes intended. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: netlink: use WG_KEY_LEN in policies	Asbjørn Sloth Tønnesen	1	-4/+7
	When converting the netlink policies to YNL, the constants used in the policy have to be visible to userspace. As NOISE__KEY_LEN isn't visible to userspace, change the policy to use WG_KEY_LEN, as also documented in the UAPI header: $ grep WG_KEY_LEN include/uapi/linux/wireguard.h WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN * WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN * WGPEER_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN * WGPEER_A_PRESHARED_KEY: NLA_EXACT_LEN, len WG_KEY_LEN [...] Add a couple of BUILD_BUG_ON() to ensure that they stay in sync. No behavioural changes intended. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: netlink: validate nested arrays in policy	Asbjørn Sloth Tønnesen	1	-4/+6
	Use NLA_POLICY_NESTED_ARRAY() to perform nested array validation in the policy validation step. The nested policy was already enforced through nla_parse_nested(), however extack wasn't passed previously, so no fancy error messages. With the nested attributes being validated directly in the policy, the policy argument can be set to NULL in the calls to nla_parse_nested(). Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 days	wireguard: netlink: enable strict genetlink validation	Asbjørn Sloth Tønnesen	1	-1/+0
	WireGuard is a modern enough genetlink family, that it doesn't need resv_start_op. It already had policies in place when it was first merged, it has also never used the reserved field, or other things toggled by resv_start_op. wireguard-tools have always used zero initialized memory, and have never touched the reserved field, neither have any other clients I have checked. Closed-source clients are much more likely to use the embeddedable library from wireguard-tools, than a DIY implementation using uninitialized memory. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
5 days	drivers: net: fbnic: Return the true error in fbnic_alloc_napi_vectors.davem/net-next	Dimitri Daskalakis	1	-1/+1
	The error case in fbnic_alloc_napi_vectors defaulted to returning ENOMEM. This can mask the true error case, causing confusion. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20251124200518.1848029-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	selftest: af_unix: Extend recv() timeout in so_peek_off.c.	Kuniyuki Iwashima	1	-2/+2
	so_peek_off.c is reported to be flaky on NIPA: # # so_peek_off.c:149:two_chunks_overlap_blocking:Expected -1 (-1) != bytes (-1) # # two_chunks_overlap_blocking: Test terminated by assertion # # FAIL so_peek_off.stream.two_chunks_overlap_blocking The test fork()s a child process to send() data after 1ms to wake up the parent process being blocked (up to 3ms) on recv(). But, from the log, the parent woke up after 3ms timeout, so it could be too short when the host is overloaded. Let's extend it to 5s. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20251124070722.1e828c53@kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124212805.486235-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	selftest: af_unix: Create its own .gitignore.	Kuniyuki Iwashima	2	-8/+8
	Somehow AF_UNIX tests have reused ../.gitignore, but now NIPA warns about it. Let's create .gitignore under af_unix/. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124212805.486235-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	virtio_net: enhance wake/stop tx queue statistics accounting	Liming Wu	1	-18/+26
	This patch refines and strengthens the statistics collection of TX queue wake/stop events introduced by commit c39add9b2423 ("virtio_net: Add TX stopped and wake counters"). Previously, the driver only recorded partial wake/stop statistics for TX queues. Some wake events triggered by 'skb_xmit_done()' or resume operations were not counted, which made the per-queue metrics incomplete. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251120015320.1418-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	tcp: remove icsk->icsk_retransmit_timer	Eric Dumazet	8	-30/+25
	Now sk->sk_timer is no longer used by TCP keepalive, we can use its storage for TCP and MPTCP retransmit timers for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	tcp: introduce icsk->icsk_keepalive_timer	Eric Dumazet	11	-24/+35
	sk->sk_timer has been used for TCP keepalives. Keepalive timers are not in fast path, we want to use sk->sk_timer storage for retransmit timers, for better cache locality. Create icsk->icsk_keepalive_timer and change keepalive code to no longer use sk->sk_timer. Added space is reclaimed in the following patch. This includes changes to MPTCP, which was also using sk_timer. Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer for inet_sk_diag_fill() sake. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: move sk_dst_pending_confirm and sk_pacing_status to sock_read_tx group	Eric Dumazet	2	-4/+4
	These two fields are mostly read in TCP tx path, move them in an more appropriate group for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	tcp: rename icsk_timeout() to tcp_timeout_expires()	Eric Dumazet	6	-13/+12
	In preparation of sk->tcp_timeout_timer introduction, rename icsk_timeout() helper and change its argument to plain 'const struct sock *sk'. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	ice: fix broken Rx on VFs	Alexander Lobakin	1	-0/+3
	Since the tagged commit, ice stopped respecting Rx buffer length passed from VFs. At that point, the buffer length was hardcoded in ice, so VFs still worked up to some point (until, for example, a VF wanted an MTU larger than its PF). The next commit 93f53db9f9dc ("ice: switch to Page Pool"), broke Rx on VFs completely since ice started accounting per-queue buffer lengths again, but now VF queues always had their length zeroed, as ice was already ignoring what iavf was passing to it. Restore the line that initializes the buffer length on VF queues basing on the virtchnl messages. Fixes: 3a4f419f7509 ("ice: drop page splitting and recycling") Reported-by: Jakub Slepecki <jakub.slepecki@intel.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Jakub Slepecki <jakub.slepecki@intel.com> Link: https://patch.msgid.link/20251124170735.3077425-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	chtls: Avoid -Wflex-array-member-not-at-end warning	Gustavo A. R. Silva	1	-7/+1
	-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the `DEFINE_RAW_FLEX()` helper for on-stack definitions of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. So, with these changes, fix the following warning: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c:163:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/aSQocKoJGkN0wzEj@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	tools: ynl-gen: add regeneration comment	Asbjørn Sloth Tønnesen	41	-0/+41
	Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], as to not interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	tools: ynl-gen: add function prefix argument	Asbjørn Sloth Tønnesen	1	-9/+16
	This patch adds a new CLI argument for overriding the default function prefix, as used for naming the doit/dumpit functions in the generated kernel code. When not specified the default "$(FAMILY)-nl" is used. This can also be specified persistently in generated files: /* YNL-ARG --function-prefix wg */ In the above example it causes the following changes: wireguard_nl_get_device_dumpit() -> wg_get_device_dumpit() wireguard_nl_get_device_doit() -> wg_get_device_doit() The variable name fn_prefix, was chosen as it relates to op_prefix which is used to prefix the UAPI commands enum entries. Link: https://lore.kernel.org/r/aRvWzC8qz3iXDAb3@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://patch.msgid.link/20251120174429.390574-2-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	ptp: ocp: Reuse META's PCI vendor ID	Andy Shevchenko	1	-3/+2
	The META's PCI vendor ID is listed already in the pci_ids.h. Reuse it here. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-5-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	ptp: ocp: Apply standard pattern for cleaning up loop	Andy Shevchenko	1	-2/+1
	The while (i--) is a standard pattern for the cleaning up loops. Apply this pattern where it makes sense in the driver. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-4-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	ptp: ocp: Make ptp_ocp_unregister_ext() NULL-aware	Andy Shevchenko	1	-14/+10
	It's a common practice to make resource release functions be NULL-aware. Make ptp_ocp_unregister_ext() NULL-aware. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	ptp: ocp: Refactor signal_show() and fix %ptT misuse	Andy Shevchenko	1	-9/+5
	Refactor signal_show() to avoid sequential calls to sysfs_emit*() and use the same pattern to get the index of a signal as it's done in signal_store(). While at it, fix wrong use of %ptT against struct timespec64. It's kinda lucky that it worked just because the first member there 64-bit and it's of time64_t type. Now with %ptS it may be used correctly. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	vsock/test: Extend transport change null-ptr-deref test	Michal Luczaj	1	-1/+6
	syzkaller reported a lockdep lock order inversion warning[1] due to commit 687aa0c5581b ("vsock: Fix transport_* TOCTOU"). This was fixed in commit f7c877e75352 ("vsock: fix lock inversion in vsock_assign_transport()"). Redo syzkaller's repro by piggybacking on a somewhat related test implemented in commit 3a764d93385c ("vsock/test: Add test for null ptr deref when transport changes"). [1]: https://lore.kernel.org/netdev/68f6cdb0.a70a0220.205af.0039.GAE@google.com/ Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251123-vsock_test-linger-lockdep-warn-v1-1-4b1edf9d8cdc@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	r8169: improve MAC EEE handling	Heiner Kallweit	1	-37/+36
	Let phydev->enable_tx_lpi control whether MAC enables TX LPI, instead of enabling it unconditionally. This way TX LPI is disabled if e.g. link partner doesn't support EEE. This helps to avoid potential issues like link flaps. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/91bcb837-3fab-4b4e-b495-038df0932e44@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: phy: mxl-gpy: add support for MxL86252 and MxL86282	Daniel Golle	1	-2/+89
	Add PHY driver support for Maxlinear MxL86252 and MxL86282 switches. The PHYs built-into those switches are just like any other GPY 2.5G PHYs with the exception of the temperature sensor data being encoded in a different way. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/a6cd7fe461b011cec2b59dffaf34e9c8b0819059.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: phy: mxl-gpy: add support for MxL86211C	Chad Monroe	1	-0/+24
	MxL86211C is a smaller and more efficient version of the GPY211C. Add the PHY ID and phy_driver instance to the mxl-gpy driver. Signed-off-by: Chad Monroe <chad@monroe.io> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/cabf3559d6511bed6b8a925f540e3162efc20f6b.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: mdio: remove redundant fwnode cleanup	Buday Csaba	2	-7/+1
	Remove redundant fwnode cleanup in of_mdiobus_register_device() and xpcs_plat_init_dev(). mdio_device_free() eventually calls mdio_device_release(), which already performs fwnode_handle_put(), making the manual cleanup unnecessary. Combine fwnode_handle_get() with device_set_node() in of_mdiobus_register_device() for clarity. Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/00847693daa8f7c8ff5dfa19dd35fc712fa4e2b5.1763995734.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: mdio: eliminate kdoc warnings in mdio_device.c and mdio_bus.c	Buday Csaba	2	-7/+55
	Fix all warnings reported by scripts/kernel-doc in mdio_device.c and mdio_bus.c Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/7ef7b80669da2b899d38afdb6c45e122229c3d8c.1763968667.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: enetc: update the base address of port MDIO registers for ENETC v4	Wei Fang	2	-2/+18
	Each ENETC has a set of external MDIO registers to access its external PHY based on its port EMDIO bus, these registers are used for MDIO bus access, such as setting the PHY address, PHY register address and value, read or write operations, C22 or C45 format, etc. The base address of this set of registers has been modified in ENETC v4 and is different from that in ENETC v1. So the base address needs to be updated so that ENETC v4 can use port MDIO to manage its own external PHY. Additionally, if ENETC has the PCS layer, it also has a set of internal MDIO registers for managing its on-die PHY (PCS/Serdes). The base address of this set of registers is also different from that of ENETC v1, so the base address also needs to be updated so that ENETC v4 can support the management of on-die PHY through the internal MDIO bus. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: enetc: set external PHY address in IERB for i.MX94 ENETC	Wei Fang	1	-0/+57
	NETC IP has only one external master MDIO interface (eMDIO) for managing the external PHYs. ENETC can use the interfaces provided by the EMDIO function or its port MDIO to access and manage its external PHY. Both the EMDIO function and the port MDIO are all virtual ports of the eMDIO. The difference is that the EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, but port MDIO can only access its own PHY. To ensure that ENETC can only access its own PHY through port MDIO, LaBCR[MDIO_PHYAD_PRTAD] needs to be set, which represents the address of the external PHY connected to ENETC. If the accessed PHY address is not consistent with LaBCR[MDIO_PHYAD_PRTAD], then the MDIO access initiated by port MDIO will be invalid. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: enetc: set the external PHY address in IERB for port MDIO usage	Wei Fang	1	-1/+140
	The ENETC supports managing its own external PHY through its port MDIO functionality. To use this function, the PHY address needs be set in the corresponding LaBCR register in the Integrated Endpoint Register Block (IERB), which is used for pre-boot initialization of NETC PCIe functions. The port MDIO can only work properly when the PHY address accessed by the port MDIO matches the corresponding LaBCR[MDIO_PHYAD_PRTAD] value. Because the ENETC driver only registers the MDIO bus (port MDIO bus) when it detects an MDIO child node in its node, similarly, the netc-blk-ctrl driver only resolves the PHY address and sets it in the corresponding LaBCR when it detects an MDIO child node in the ENETC node. Co-developed-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: dsa: append ethtool counters of all hidden ports to conduit	Vladimir Oltean	1	-33/+93
	Currently there is no way to see packet counters on cascade ports, and no clarity on how the API for that would look like. Because it's something that is currently needed, just extend the hack where ethtool -S on the conduit interface dumps CPU port counters, and also use it to dump counters of cascade ports. Note that the "pXX_" naming convention changes to "sXX_pYY", to distinguish between ports having the same index but belonging to different switches. This has a slight chance of causing regressions to existing tooling: - grepping for "p04_counter_name" still works, but might return more than one string now - grepping for " p04_counter_name" no longer works Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: dsa: use kernel data types for ethtool ops on conduit	Vladimir Oltean	1	-6/+5
	Suppress some checkpatch 'CHECK' messages about u8 being preferable over uint8_t, etc. No functional change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: dsa: cpu_dp->orig_ethtool_ops might be NULL	Vladimir Oltean	1	-7/+7
	In theory this would have been seen by now, but it seems that all drivers used as DSA conduit interfaces thus far have had ethtool_ops set, and it's hard to even find modern Ethernet drivers (and not VF ones) which don't use ethtool. Here is the unfiltered list of drivers which register any sort of net_device but don't set its ethtool_ops pointer. I don't think any of them 'risks' being used as a DSA conduit, maybe except for moxart, rnpbge and icssm, I'm not sure. - drivers/net/can/dev/dev.c - drivers/net/wwan/qcom_bam_dmux.c - drivers/net/wwan/t7xx/t7xx_netdev.c - drivers/net/arcnet/arcnet.c - drivers/net/hamradio/ - drivers/net/slip/slip.c - drivers/net/ethernet/ezchip/nps_enet.c - drivers/net/ethernet/moxa/moxart_ether.c - drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c - drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c - drivers/net/ethernet/huawei/hinic3/hinic3_main.c - drivers/net/ethernet/i825xx/ - drivers/net/ethernet/ti/icssm/icssm_prueth.c - drivers/net/ethernet/seeq/ - drivers/net/ethernet/litex/litex_liteeth.c - drivers/net/ethernet/sunplus/spl2sw_driver.c - drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c - drivers/net/ipa/ - drivers/net/wireless/microchip/wilc1000/ - drivers/net/wireless/mediatek/mt76/dma.c - drivers/net/wireless/ath/ath12k/ - drivers/net/wireless/ath/ath11k/ - drivers/net/wireless/ath/ath6kl/ - drivers/net/wireless/ath/ath10k/ - drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c - drivers/net/wireless/virtual/mac80211_hwsim.c - drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c - drivers/net/wireless/realtek/rtw89/core.c - drivers/net/wireless/realtek/rtw88/pci.c - drivers/net/caif/ - drivers/net/plip/ - drivers/net/wan/ - drivers/net/mctp/ - drivers/net/ppp/ - drivers/net/thunderbolt/ Nonetheless, it's good for the framework not to make such assumptions, and not panic when coming across such kind of host device in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	cxgb4: Rename sched_class to avoid type clash	Alan Maguire	5	-32/+32
	drivers/net/ethernet/chelsio/cxgb4/sched.h declares a sched_class struct which has a type name clash with struct sched_class in kernel/sched/sched.h (a type used in a field in task_struct). When cxgb4 is a builtin we end up with both sched_class types, and as a result of this we wind up with DWARF (and derived from that BTF) with a duplicate incorrect task_struct representation. When cxgb4 is built-in this type clash can cause kernel builds to fail as resolve_btfids will fail when confused which task_struct to use. See [1] for more details. As such, renaming sched_class to ch_sched_class (in line with other structs like ch_sched_flowc) makes sense. [1] https://lore.kernel.org/bpf/2412725b-916c-47bd-91c3-c2d57e3e6c7b@acm.org/ Reported-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20251121181231.64337-1-alan.maguire@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	r8169: add support for RTL9151A	Javen Xu	1	-0/+3
	This adds support for chip RTL9151A. Its XID is 0x68b. It is bascially basd on the one with XID 0x688, but with different firmware file. Signed-off-by: Javen Xu <javen_xu@realsil.com.cn> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251121090104.3753-1-javen_xu@realsil.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codel	Eric Dumazet	3	-3/+10
	cake, codel and fq_codel can drop many packets from dequeue(). Use qdisc_dequeue_drop() so that the freeing can happen outside of the qdisc spinlock scope. Add TCQ_F_DEQUEUE_DROPS to sch->flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-15-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: add qdisc_dequeue_drop() helper	Eric Dumazet	3	-14/+43
	Some qdisc like cake, codel, fq_codel might drop packets in their dequeue() method. This is currently problematic because dequeue() runs with the qdisc spinlock held. Freeing skbs can be extremely expensive. Add qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS so that these qdiscs can opt-in to defer the skb frees after the socket spinlock is released. TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs with an extra cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: add tcf_kfree_skb_list() helper	Eric Dumazet	2	-10/+16
	Using kfree_skb_list_reason() to free list of skbs from qdisc operations seems wrong as each skb might have a different drop reason. Cleanup __dev_xmit_skb() to call tcf_kfree_skb_list() once in preparation of the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net: annotate a data-race in __dev_xmit_skb()	Eric Dumazet	1	-1/+1
	q->limit is read locklessly, add a READ_ONCE(). Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-12-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net: prefech skb->priority in __dev_xmit_skb()	Eric Dumazet	1	-0/+1
	Most qdiscs need to read skb->priority at enqueue time(). In commit 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") I added a prefetch(next), lets add another one for the second half of skb. Note that skb->priority and skb->hash share a common cache line, so this patch helps qdiscs needing both fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-11-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: sch_fq: prefetch one skb ahead in dequeue()	Eric Dumazet	1	-2/+5
	prefetch the skb that we are likely to dequeue at the next dequeue(). Also call fq_dequeue_skb() a bit sooner in fq_dequeue(). This reduces the window between read of q.qlen and changes of fields in the cache line that could be dirtied by another cpu trying to queue a packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-10-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: sch_fq: move qdisc_bstats_update() to fq_dequeue_skb()	Eric Dumazet	1	-1/+1
	Group together changes to qdisc fields to reduce chances of false sharing if another cpu attempts to acquire the qdisc spinlock. qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; qdisc_bstats_update(sch, skb); Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-9-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: add Qdisc_read_mostly and Qdisc_write groups	Eric Dumazet	1	-11/+18
	It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in fast path by reducing this to a single dirtied cache line. In current layout, we change only four/six fields in the first cache line: - q.spinlock - q.qlen - bstats.bytes - bstats.packets - some Qdisc also change q.next/q.prev In the second cache line we change in the fast path: - running - state - qstats.backlog /* --- cacheline 2 boundary (128 bytes) --- / struct sk_buff_head gso_skb __attribute__((__aligned__(64))); / 0x80 0x18 / struct qdisc_skb_head q; / 0x98 0x18 / struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); / 0xb0 0x10 / / --- cacheline 3 boundary (192 bytes) --- / struct gnet_stats_queue qstats; / 0xc0 0x14 / bool running; / 0xd4 0x1 / / XXX 3 bytes hole, try to pack / unsigned long state; / 0xd8 0x8 / struct Qdisc next_sched; /* 0xe0 0x8 / struct sk_buff_head skb_bad_txq; / 0xe8 0x18 / / --- cacheline 4 boundary (256 bytes) --- / Reorganize things to have a first cache line mostly read, then a mostly written one. This gives a ~3% increase of performance under tx stress. Note that there is an additional hole because @qstats now spans over a third cache line. / --- cacheline 2 boundary (128 bytes) --- / __u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); / 0x80 0 / struct sk_buff_head gso_skb; / 0x80 0x18 / struct Qdisc next_sched; /* 0x98 0x8 / struct sk_buff_head skb_bad_txq; / 0xa0 0x18 / __u8 __cacheline_group_end__Qdisc_read_mostly[0]; / 0xb8 0 / / XXX 8 bytes hole, try to pack / / --- cacheline 3 boundary (192 bytes) --- / __u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); / 0xc0 0 / struct qdisc_skb_head q; / 0xc0 0x18 / unsigned long state; / 0xd8 0x8 / struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); / 0xe0 0x10 / bool running; / 0xf0 0x1 / / XXX 3 bytes hole, try to pack / struct gnet_stats_queue qstats; / 0xf4 0x14 / / --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- / __u8 __cacheline_group_end__Qdisc_write[0]; / 0x108 0 / / XXX 56 bytes hole, try to pack */ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-8-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: cake: use qdisc_pkt_segs()	Eric Dumazet	1	-9/+3
	Use new qdisc_pkt_segs() to avoid a cache line miss in cake_enqueue() for non GSO packets. cake_overhead() does not have to recompute it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-7-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net_sched: use qdisc_skb_cb(skb)->pkt_segs in bstats_update()	Eric Dumazet	7	-4/+16
	Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. This gives a 5 % improvement in a TX intensive workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-6-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>