| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a sample application for WireGuard, using the generated C library.
The main benefit of this is to exercise the generated library,
which might be useful for future selftests.
In order to support usage with a pre-YNL wireguard.h in /usr/include,
the former header guard is added to Makefile.deps as well.
Example:
$ make -C tools/net/ynl/lib
$ make -C tools/net/ynl/generated
$ make -C tools/net/ynl/samples wireguard
$ ./tools/net/ynl/samples/wireguard
usage: ./tools/net/ynl/samples/wireguard <ifindex|ifname>
$ sudo ./tools/net/ynl/samples/wireguard wg-test
Interface 3: wg-test
Peer 6adfb183a4a2c94a2f92dab5ade762a4788[...]:
Data: rx: 42 / tx: 42 bytes
Allowed IPs:
0.0.0.0/0
::/0
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Use ynl-gen to generate the UAPI header for WireGuard.
The cosmetic changes in this patch confirms that the spec is aligned
with the implementation. By using the generated version, it ensures
that they stay in sync.
Changes in the generated header:
* Trivial header guard rename.
* Trivial white space changes.
* Trivial comment changes.
* Precompute bitflags in ynl-gen (see [1]).
* Drop __*_F_ALL constants (see [1]).
[1] https://lore.kernel.org/r/20251014123201.6ecfd146@kernel.org/
No behavioural changes intended.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Move the wg*_flag enums, so they are defined above the attribute set
enums, where ynl-gen would place them.
This is an incremental step towards adopting an UAPI header generated
by ynl-gen. This is split out to keep the patches readable.
This is a trivial patch with no behavioural changes intended.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This patch moves enum wg_cmd to the end of the file, where ynl-gen
would generate it.
This is an incremental step towards adopting an UAPI header generated
by ynl-gen. This is split out to keep the patches readable.
This is a trivial patch with no behavioural changes intended.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This patch adds a near[1] complete YNL specification for WireGuard,
documenting the protocol in a machine-readable format, rather than
comments in wireguard.h, and eases usage from C and non-C programming
languages alike.
The generated C library will be featured in a later patch, so in
this patch I will use the in-kernel python client for examples.
This makes the documentation in the UAPI header redundant, it is
therefore removed. The in-line documentation in the spec is based
on the existing comment in wireguard.h, and once released it will
be available in the kernel documentation at:
https://docs.kernel.org/netlink/specs/wireguard.html
(until then run: make htmldocs)
Generate wireguard.rst from this spec:
$ make -C tools/net/ynl/generated/ wireguard.rst
Query wireguard interface through pyynl:
$ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \
--dump get-device \
--json '{"ifindex":3}'
[{'fwmark': 0,
'ifindex': 3,
'ifname': 'wg-test',
'listen-port': 54318,
'peers': [{0: {'allowedips': [{0: {'cidr-mask': 0,
'family': 2,
'ipaddr': '0.0.0.0'}},
{0: {'cidr-mask': 0,
'family': 10,
'ipaddr': '::'}}],
'endpoint': b'[...]',
'last-handshake-time': {'nsec': 42, 'sec': 42},
'persistent-keepalive-interval': 42,
'preshared-key': '[...]',
'protocol-version': 1,
'public-key': '[...]',
'rx-bytes': 42,
'tx-bytes': 42}}],
'private-key': '[...]',
'public-key': '[...]'}]
Add another allowed IP prefix:
$ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \
--do set-device --json '{"ifindex":3,"peers":[
{"public-key":"6a df b1 83 a4 ..","allowedips":[
{"cidr-mask":0,"family":10,"ipaddr":"::"}]}]}'
[1] As can be seen above, the "endpoint" is only dumped as binary data,
as it can't be fully described in YNL. It's either a struct
sockaddr_in or struct sockaddr_in6 depending on the attribute length.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Previously .maxattr was shared for both WG_CMD_GET_DEVICE and
WG_CMD_SET_DEVICE. Now that it is split, then we can lower it
for WG_CMD_GET_DEVICE to follow the documentation which defines
.maxattr as WGDEVICE_A_IFNAME for WG_CMD_GET_DEVICE.
$ grep -hC5 'one but not both of:' include/uapi/linux/wireguard.h
* WG_CMD_GET_DEVICE
* -----------------
*
* May only be called via NLM_F_REQUEST | NLM_F_DUMP. The command
* should contain one but not both of:
*
* WGDEVICE_A_IFINDEX: NLA_U32
* WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1
*
* The kernel will then return several messages [...]
While other attributes weren't rejected previously, the consensus
is that nobody sends those attributes, so nothing should break.
Link: https://lore.kernel.org/r/aRyLoy2iqbkUipZW@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
This patch converts WireGuard from using the legacy struct genl_ops
to struct genl_split_ops, by applying the same transformation as
genl_cmd_full_to_split() would otherwise do at runtime.
WGDEVICE_A_MAX is swapped for WGDEVICE_A_PEERS, while they are
currently equivalent, then .maxattr should be the maximum attribute
that a given command supports, and not change along with WGDEVICE_A_MAX.
This is an incremental step towards adopting netlink policy code
generated by ynl-gen, ensuring that the code and spec is aligned.
This is a trivial patch with no behavioural changes intended.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
When converting the netlink policies to YNL, the constants used
in the policy have to be visible to userspace.
As NOISE_*_KEY_LEN isn't visible to userspace, change the policy
to use WG_KEY_LEN, as also documented in the UAPI header:
$ grep WG_KEY_LEN include/uapi/linux/wireguard.h
* WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
* WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
* WGPEER_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
* WGPEER_A_PRESHARED_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
[...]
Add a couple of BUILD_BUG_ON() to ensure that they stay in sync.
No behavioural changes intended.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Use NLA_POLICY_NESTED_ARRAY() to perform nested array validation
in the policy validation step.
The nested policy was already enforced through nla_parse_nested(),
however extack wasn't passed previously, so no fancy error messages.
With the nested attributes being validated directly in the policy, the
policy argument can be set to NULL in the calls to nla_parse_nested().
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
WireGuard is a modern enough genetlink family, that it doesn't need
resv_start_op. It already had policies in place when it was first
merged, it has also never used the reserved field, or other things
toggled by resv_start_op.
wireguard-tools have always used zero initialized memory, and have never
touched the reserved field, neither have any other clients I have
checked. Closed-source clients are much more likely to use the
embeddedable library from wireguard-tools, than a DIY implementation
using uninitialized memory.
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
The error case in fbnic_alloc_napi_vectors defaulted to returning
ENOMEM. This can mask the true error case, causing confusion.
Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com>
Link: https://patch.msgid.link/20251124200518.1848029-1-dimitri.daskalakis1@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
so_peek_off.c is reported to be flaky on NIPA:
# # so_peek_off.c:149:two_chunks_overlap_blocking:Expected -1 (-1) != bytes (-1)
# # two_chunks_overlap_blocking: Test terminated by assertion
# # FAIL so_peek_off.stream.two_chunks_overlap_blocking
The test fork()s a child process to send() data after 1ms to
wake up the parent process being blocked (up to 3ms) on recv().
But, from the log, the parent woke up after 3ms timeout, so it
could be too short when the host is overloaded.
Let's extend it to 5s.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251124070722.1e828c53@kernel.org/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124212805.486235-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Somehow AF_UNIX tests have reused ../.gitignore,
but now NIPA warns about it.
Let's create .gitignore under af_unix/.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124212805.486235-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch refines and strengthens the statistics collection of TX queue
wake/stop events introduced by commit c39add9b2423 ("virtio_net: Add TX
stopped and wake counters").
Previously, the driver only recorded partial wake/stop statistics
for TX queues. Some wake events triggered by 'skb_xmit_done()' or resume
operations were not counted, which made the per-queue metrics incomplete.
Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20251120015320.1418-1-liming.wu@jaguarmicro.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now sk->sk_timer is no longer used by TCP keepalive, we can use
its storage for TCP and MPTCP retransmit timers for better
cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sk->sk_timer has been used for TCP keepalives.
Keepalive timers are not in fast path, we want to use sk->sk_timer
storage for retransmit timers, for better cache locality.
Create icsk->icsk_keepalive_timer and change keepalive
code to no longer use sk->sk_timer.
Added space is reclaimed in the following patch.
This includes changes to MPTCP, which was also using sk_timer.
Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer
for inet_sk_diag_fill() sake.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These two fields are mostly read in TCP tx path, move them
in an more appropriate group for better cache locality.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation of sk->tcp_timeout_timer introduction,
rename icsk_timeout() helper and change its argument to plain
'const struct sock *sk'.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251124175013.1473655-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the tagged commit, ice stopped respecting Rx buffer length
passed from VFs.
At that point, the buffer length was hardcoded in ice, so VFs still
worked up to some point (until, for example, a VF wanted an MTU
larger than its PF).
The next commit 93f53db9f9dc ("ice: switch to Page Pool"), broke
Rx on VFs completely since ice started accounting per-queue buffer
lengths again, but now VF queues always had their length zeroed, as
ice was already ignoring what iavf was passing to it.
Restore the line that initializes the buffer length on VF queues
basing on the virtchnl messages.
Fixes: 3a4f419f7509 ("ice: drop page splitting and recycling")
Reported-by: Jakub Slepecki <jakub.slepecki@intel.com>
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Jakub Slepecki <jakub.slepecki@intel.com>
Link: https://patch.msgid.link/20251124170735.3077425-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.
Use the `DEFINE_RAW_FLEX()` helper for on-stack definitions of
a flexible structure where the size of the flexible-array member
is known at compile-time, and refactor the rest of the code,
accordingly.
So, with these changes, fix the following warning:
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c:163:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/aSQocKoJGkN0wzEj@kspp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a comment on regeneration to the generated files.
The comment is placed after the YNL-GEN line[1], as to not interfere
with ynl-regen.sh's detection logic.
[1] and after the optional YNL-ARG line.
Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new CLI argument for overriding the default
function prefix, as used for naming the doit/dumpit functions
in the generated kernel code.
When not specified the default "$(FAMILY)-nl" is used.
This can also be specified persistently in generated files:
/* YNL-ARG --function-prefix wg */
In the above example it causes the following changes:
wireguard_nl_get_device_dumpit() -> wg_get_device_dumpit()
wireguard_nl_get_device_doit() -> wg_get_device_doit()
The variable name fn_prefix, was chosen as it relates to op_prefix
which is used to prefix the UAPI commands enum entries.
Link: https://lore.kernel.org/r/aRvWzC8qz3iXDAb3@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Link: https://patch.msgid.link/20251120174429.390574-2-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The META's PCI vendor ID is listed already in the pci_ids.h.
Reuse it here.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124084816.205035-5-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The while (i--) is a standard pattern for the cleaning up loops.
Apply this pattern where it makes sense in the driver.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124084816.205035-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's a common practice to make resource release functions be NULL-aware.
Make ptp_ocp_unregister_ext() NULL-aware.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124084816.205035-3-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor signal_show() to avoid sequential calls to sysfs_emit*()
and use the same pattern to get the index of a signal as it's done
in signal_store().
While at it, fix wrong use of %ptT against struct timespec64.
It's kinda lucky that it worked just because the first member
there 64-bit and it's of time64_t type. Now with %ptS it may
be used correctly.
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://patch.msgid.link/20251124084816.205035-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a lockdep lock order inversion warning[1] due to
commit 687aa0c5581b ("vsock: Fix transport_* TOCTOU"). This was fixed in
commit f7c877e75352 ("vsock: fix lock inversion in
vsock_assign_transport()").
Redo syzkaller's repro by piggybacking on a somewhat related test
implemented in commit 3a764d93385c ("vsock/test: Add test for null ptr
deref when transport changes").
[1]: https://lore.kernel.org/netdev/68f6cdb0.a70a0220.205af.0039.GAE@google.com/
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20251123-vsock_test-linger-lockdep-warn-v1-1-4b1edf9d8cdc@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Let phydev->enable_tx_lpi control whether MAC enables TX LPI, instead of
enabling it unconditionally. This way TX LPI is disabled if e.g. link
partner doesn't support EEE. This helps to avoid potential issues like
link flaps.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/91bcb837-3fab-4b4e-b495-038df0932e44@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add PHY driver support for Maxlinear MxL86252 and MxL86282 switches.
The PHYs built-into those switches are just like any other GPY 2.5G PHYs
with the exception of the temperature sensor data being encoded in a
different way.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/a6cd7fe461b011cec2b59dffaf34e9c8b0819059.1763818120.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
MxL86211C is a smaller and more efficient version of the GPY211C.
Add the PHY ID and phy_driver instance to the mxl-gpy driver.
Signed-off-by: Chad Monroe <chad@monroe.io>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/cabf3559d6511bed6b8a925f540e3162efc20f6b.1763818120.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove redundant fwnode cleanup in of_mdiobus_register_device()
and xpcs_plat_init_dev().
mdio_device_free() eventually calls mdio_device_release(),
which already performs fwnode_handle_put(), making the manual
cleanup unnecessary.
Combine fwnode_handle_get() with device_set_node() in
of_mdiobus_register_device() for clarity.
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/00847693daa8f7c8ff5dfa19dd35fc712fa4e2b5.1763995734.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix all warnings reported by scripts/kernel-doc in
mdio_device.c and mdio_bus.c
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/7ef7b80669da2b899d38afdb6c45e122229c3d8c.1763968667.git.buday.csaba@prolan.hu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each ENETC has a set of external MDIO registers to access its external
PHY based on its port EMDIO bus, these registers are used for MDIO bus
access, such as setting the PHY address, PHY register address and value,
read or write operations, C22 or C45 format, etc. The base address of
this set of registers has been modified in ENETC v4 and is different
from that in ENETC v1. So the base address needs to be updated so that
ENETC v4 can use port MDIO to manage its own external PHY.
Additionally, if ENETC has the PCS layer, it also has a set of internal
MDIO registers for managing its on-die PHY (PCS/Serdes). The base address
of this set of registers is also different from that of ENETC v1, so the
base address also needs to be updated so that ENETC v4 can support the
management of on-die PHY through the internal MDIO bus.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20251119102557.1041881-4-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NETC IP has only one external master MDIO interface (eMDIO) for managing
the external PHYs. ENETC can use the interfaces provided by the EMDIO
function or its port MDIO to access and manage its external PHY. Both
the EMDIO function and the port MDIO are all virtual ports of the eMDIO.
The difference is that the EMDIO function is a 'global port', it can
access all the PHYs on the eMDIO, but port MDIO can only access its own
PHY. To ensure that ENETC can only access its own PHY through port MDIO,
LaBCR[MDIO_PHYAD_PRTAD] needs to be set, which represents the address of
the external PHY connected to ENETC. If the accessed PHY address is not
consistent with LaBCR[MDIO_PHYAD_PRTAD], then the MDIO access initiated
by port MDIO will be invalid.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20251119102557.1041881-3-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ENETC supports managing its own external PHY through its port MDIO
functionality. To use this function, the PHY address needs be set in the
corresponding LaBCR register in the Integrated Endpoint Register Block
(IERB), which is used for pre-boot initialization of NETC PCIe functions.
The port MDIO can only work properly when the PHY address accessed by the
port MDIO matches the corresponding LaBCR[MDIO_PHYAD_PRTAD] value.
Because the ENETC driver only registers the MDIO bus (port MDIO bus) when
it detects an MDIO child node in its node, similarly, the netc-blk-ctrl
driver only resolves the PHY address and sets it in the corresponding
LaBCR when it detects an MDIO child node in the ENETC node.
Co-developed-by: Aziz Sellami <aziz.sellami@nxp.com>
Signed-off-by: Aziz Sellami <aziz.sellami@nxp.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://patch.msgid.link/20251119102557.1041881-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently there is no way to see packet counters on cascade ports, and
no clarity on how the API for that would look like.
Because it's something that is currently needed, just extend the hack
where ethtool -S on the conduit interface dumps CPU port counters, and
also use it to dump counters of cascade ports.
Note that the "pXX_" naming convention changes to "sXX_pYY", to
distinguish between ports having the same index but belonging to
different switches. This has a slight chance of causing regressions to
existing tooling:
- grepping for "p04_counter_name" still works, but might return more
than one string now
- grepping for " p04_counter_name" no longer works
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Suppress some checkpatch 'CHECK' messages about u8 being preferable over
uint8_t, etc. No functional change.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In theory this would have been seen by now, but it seems that all
drivers used as DSA conduit interfaces thus far have had ethtool_ops
set, and it's hard to even find modern Ethernet drivers (and not VF
ones) which don't use ethtool.
Here is the unfiltered list of drivers which register any sort of
net_device but don't set its ethtool_ops pointer. I don't think any of
them 'risks' being used as a DSA conduit, maybe except for moxart,
rnpbge and icssm, I'm not sure.
- drivers/net/can/dev/dev.c
- drivers/net/wwan/qcom_bam_dmux.c
- drivers/net/wwan/t7xx/t7xx_netdev.c
- drivers/net/arcnet/arcnet.c
- drivers/net/hamradio/
- drivers/net/slip/slip.c
- drivers/net/ethernet/ezchip/nps_enet.c
- drivers/net/ethernet/moxa/moxart_ether.c
- drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c
- drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c
- drivers/net/ethernet/huawei/hinic3/hinic3_main.c
- drivers/net/ethernet/i825xx/
- drivers/net/ethernet/ti/icssm/icssm_prueth.c
- drivers/net/ethernet/seeq/
- drivers/net/ethernet/litex/litex_liteeth.c
- drivers/net/ethernet/sunplus/spl2sw_driver.c
- drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c
- drivers/net/ipa/
- drivers/net/wireless/microchip/wilc1000/
- drivers/net/wireless/mediatek/mt76/dma.c
- drivers/net/wireless/ath/ath12k/
- drivers/net/wireless/ath/ath11k/
- drivers/net/wireless/ath/ath6kl/
- drivers/net/wireless/ath/ath10k/
- drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c
- drivers/net/wireless/virtual/mac80211_hwsim.c
- drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c
- drivers/net/wireless/realtek/rtw89/core.c
- drivers/net/wireless/realtek/rtw88/pci.c
- drivers/net/caif/
- drivers/net/plip/
- drivers/net/wan/
- drivers/net/mctp/
- drivers/net/ppp/
- drivers/net/thunderbolt/
Nonetheless, it's good for the framework not to make such assumptions,
and not panic when coming across such kind of host device in the future.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
drivers/net/ethernet/chelsio/cxgb4/sched.h declares a sched_class
struct which has a type name clash with struct sched_class
in kernel/sched/sched.h (a type used in a field in task_struct).
When cxgb4 is a builtin we end up with both sched_class types,
and as a result of this we wind up with DWARF (and derived from
that BTF) with a duplicate incorrect task_struct representation.
When cxgb4 is built-in this type clash can cause kernel builds to
fail as resolve_btfids will fail when confused which task_struct
to use. See [1] for more details.
As such, renaming sched_class to ch_sched_class (in line with
other structs like ch_sched_flowc) makes sense.
[1] https://lore.kernel.org/bpf/2412725b-916c-47bd-91c3-c2d57e3e6c7b@acm.org/
Reported-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Potnuri Bharat Teja <bharat@chelsio.com>
Link: https://patch.msgid.link/20251121181231.64337-1-alan.maguire@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This adds support for chip RTL9151A. Its XID is 0x68b. It is bascially
basd on the one with XID 0x688, but with different firmware file.
Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20251121090104.3753-1-javen_xu@realsil.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
cake, codel and fq_codel can drop many packets from dequeue().
Use qdisc_dequeue_drop() so that the freeing can happen
outside of the qdisc spinlock scope.
Add TCQ_F_DEQUEUE_DROPS to sch->flags.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-15-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Some qdisc like cake, codel, fq_codel might drop packets
in their dequeue() method.
This is currently problematic because dequeue() runs with
the qdisc spinlock held. Freeing skbs can be extremely expensive.
Add qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS
so that these qdiscs can opt-in to defer the skb frees
after the socket spinlock is released.
TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs
with an extra cache line miss.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Using kfree_skb_list_reason() to free list of skbs from qdisc
operations seems wrong as each skb might have a different drop reason.
Cleanup __dev_xmit_skb() to call tcf_kfree_skb_list() once
in preparation of the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
q->limit is read locklessly, add a READ_ONCE().
Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-12-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Most qdiscs need to read skb->priority at enqueue time().
In commit 100dfa74cad9 ("net: dev_queue_xmit() llist adoption")
I added a prefetch(next), lets add another one for the second
half of skb.
Note that skb->priority and skb->hash share a common cache line,
so this patch helps qdiscs needing both fields.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-11-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
prefetch the skb that we are likely to dequeue at the next dequeue().
Also call fq_dequeue_skb() a bit sooner in fq_dequeue().
This reduces the window between read of q.qlen and
changes of fields in the cache line that could be dirtied
by another cpu trying to queue a packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-10-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Group together changes to qdisc fields to reduce chances of false sharing
if another cpu attempts to acquire the qdisc spinlock.
qdisc_qstats_backlog_dec(sch, skb);
sch->q.qlen--;
qdisc_bstats_update(sch, skb);
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-9-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in
fast path by reducing this to a single dirtied cache line.
In current layout, we change only four/six fields in the first cache line:
- q.spinlock
- q.qlen
- bstats.bytes
- bstats.packets
- some Qdisc also change q.next/q.prev
In the second cache line we change in the fast path:
- running
- state
- qstats.backlog
/* --- cacheline 2 boundary (128 bytes) --- */
struct sk_buff_head gso_skb __attribute__((__aligned__(64))); /* 0x80 0x18 */
struct qdisc_skb_head q; /* 0x98 0x18 */
struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xb0 0x10 */
/* --- cacheline 3 boundary (192 bytes) --- */
struct gnet_stats_queue qstats; /* 0xc0 0x14 */
bool running; /* 0xd4 0x1 */
/* XXX 3 bytes hole, try to pack */
unsigned long state; /* 0xd8 0x8 */
struct Qdisc * next_sched; /* 0xe0 0x8 */
struct sk_buff_head skb_bad_txq; /* 0xe8 0x18 */
/* --- cacheline 4 boundary (256 bytes) --- */
Reorganize things to have a first cache line mostly read,
then a mostly written one.
This gives a ~3% increase of performance under tx stress.
Note that there is an additional hole because @qstats now spans over a third cache line.
/* --- cacheline 2 boundary (128 bytes) --- */
__u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); /* 0x80 0 */
struct sk_buff_head gso_skb; /* 0x80 0x18 */
struct Qdisc * next_sched; /* 0x98 0x8 */
struct sk_buff_head skb_bad_txq; /* 0xa0 0x18 */
__u8 __cacheline_group_end__Qdisc_read_mostly[0]; /* 0xb8 0 */
/* XXX 8 bytes hole, try to pack */
/* --- cacheline 3 boundary (192 bytes) --- */
__u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); /* 0xc0 0 */
struct qdisc_skb_head q; /* 0xc0 0x18 */
unsigned long state; /* 0xd8 0x8 */
struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xe0 0x10 */
bool running; /* 0xf0 0x1 */
/* XXX 3 bytes hole, try to pack */
struct gnet_stats_queue qstats; /* 0xf4 0x14 */
/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
__u8 __cacheline_group_end__Qdisc_write[0]; /* 0x108 0 */
/* XXX 56 bytes hole, try to pack */
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-8-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use new qdisc_pkt_segs() to avoid a cache line miss in cake_enqueue()
for non GSO packets.
cake_overhead() does not have to recompute it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-7-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Avoid up to two cache line misses in qdisc dequeue() to fetch
skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held.
This gives a 5 % improvement in a TX intensive workload.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20251121083256.674562-6-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|