aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/6lowpan.rst (renamed from Documentation/networking/6lowpan.txt)29
-rw-r--r--Documentation/networking/bareudp.rst52
-rw-r--r--Documentation/networking/device_drivers/mellanox/mlx5.rst2
-rw-r--r--Documentation/networking/device_drivers/stmicro/stmmac.rst7
-rw-r--r--Documentation/networking/devlink/bnxt.rst14
-rw-r--r--Documentation/networking/devlink/devlink-flash.rst93
-rw-r--r--Documentation/networking/devlink/devlink-info.rst144
-rw-r--r--Documentation/networking/devlink/devlink-params.rst2
-rw-r--r--Documentation/networking/devlink/devlink-region.rst14
-rw-r--r--Documentation/networking/devlink/devlink-trap.rst35
-rw-r--r--Documentation/networking/devlink/ice.rst96
-rw-r--r--Documentation/networking/devlink/index.rst2
-rw-r--r--Documentation/networking/devlink/mlx5.rst6
-rw-r--r--Documentation/networking/ethtool-netlink.rst497
-rw-r--r--Documentation/networking/filter.txt2
-rw-r--r--Documentation/networking/index.rst2
-rw-r--r--Documentation/networking/ip-sysctl.txt9
-rw-r--r--Documentation/networking/page_pool.rst159
-rw-r--r--Documentation/networking/sfp-phylink.rst49
-rw-r--r--Documentation/networking/snmp_counter.rst4
20 files changed, 1125 insertions, 93 deletions
diff --git a/Documentation/networking/6lowpan.txt b/Documentation/networking/6lowpan.rst
index 2e5a939d7e6f..e70a6520cc33 100644
--- a/Documentation/networking/6lowpan.txt
+++ b/Documentation/networking/6lowpan.rst
@@ -1,37 +1,40 @@
+.. SPDX-License-Identifier: GPL-2.0
-Netdev private dataroom for 6lowpan interfaces:
+==============================================
+Netdev private dataroom for 6lowpan interfaces
+==============================================
All 6lowpan able net devices, means all interfaces with ARPHRD_6LOWPAN,
must have "struct lowpan_priv" placed at beginning of netdev_priv.
-The priv_size of each interface should be calculate by:
+The priv_size of each interface should be calculate by::
dev->priv_size = LOWPAN_PRIV_SIZE(LL_6LOWPAN_PRIV_DATA);
Where LL_PRIV_6LOWPAN_DATA is sizeof linklayer 6lowpan private data struct.
-To access the LL_PRIV_6LOWPAN_DATA structure you can cast:
+To access the LL_PRIV_6LOWPAN_DATA structure you can cast::
lowpan_priv(dev)-priv;
to your LL_6LOWPAN_PRIV_DATA structure.
-Before registering the lowpan netdev interface you must run:
+Before registering the lowpan netdev interface you must run::
lowpan_netdev_setup(dev, LOWPAN_LLTYPE_FOOBAR);
wheres LOWPAN_LLTYPE_FOOBAR is a define for your 6LoWPAN linklayer type of
enum lowpan_lltypes.
-Example to evaluate the private usually you can do:
+Example to evaluate the private usually you can do::
-static inline struct lowpan_priv_foobar *
-lowpan_foobar_priv(struct net_device *dev)
-{
+ static inline struct lowpan_priv_foobar *
+ lowpan_foobar_priv(struct net_device *dev)
+ {
return (struct lowpan_priv_foobar *)lowpan_priv(dev)->priv;
-}
+ }
-switch (dev->type) {
-case ARPHRD_6LOWPAN:
+ switch (dev->type) {
+ case ARPHRD_6LOWPAN:
lowpan_priv = lowpan_priv(dev);
/* do great stuff which is ARPHRD_6LOWPAN related */
switch (lowpan_priv->lltype) {
@@ -42,8 +45,8 @@ case ARPHRD_6LOWPAN:
...
}
break;
-...
-}
+ ...
+ }
In case of generic 6lowpan branch ("net/6lowpan") you can remove the check
on ARPHRD_6LOWPAN, because you can be sure that these function are called
diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst
new file mode 100644
index 000000000000..465a8b251bfe
--- /dev/null
+++ b/Documentation/networking/bareudp.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Bare UDP Tunnelling Module Documentation
+========================================
+
+There are various L3 encapsulation standards using UDP being discussed to
+leverage the UDP based load balancing capability of different networks.
+MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them.
+
+The Bareudp tunnel module provides a generic L3 encapsulation tunnelling
+support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside
+a UDP tunnel.
+
+Special Handling
+----------------
+The bareudp device supports special handling for MPLS & IP as they can have
+multiple ethertypes.
+MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast).
+IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6).
+This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC
+with a flag called multiproto mode.
+
+Usage
+------
+
+1) Device creation & deletion
+
+ a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847.
+
+ This creates a bareudp tunnel device which tunnels L3 traffic with ethertype
+ 0x8847 (MPLS traffic). The destination port of the UDP header will be set to
+ 6635.The device will listen on UDP port 6635 to receive traffic.
+
+ b) ip link delete bareudp0
+
+2) Device creation with multiple proto mode enabled
+
+There are two ways to create a bareudp device for MPLS & IP with multiproto mode
+enabled.
+
+ a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847 multiproto
+
+ b) ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls
+
+3) Device Usage
+
+The bareudp device could be used along with OVS or flower filter in TC.
+The OVS or TC flower layer must set the tunnel information in SKB dst field before
+sending packet buffer to the bareudp device for transmission. On reception the
+bareudp device extracts and stores the tunnel information in SKB dst field before
+passing the packet buffer to the network stack.
diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst
index f575a49790e8..e9b65035cd47 100644
--- a/Documentation/networking/device_drivers/mellanox/mlx5.rst
+++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst
@@ -101,7 +101,7 @@ Enabling the driver and kconfig options
**External options** ( Choose if the corresponding mlx5 feature is required )
- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled
-- CONFIG_VXLAN: When chosen, mlx5 vxaln support will be enabled.
+- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).
Devlink info
diff --git a/Documentation/networking/device_drivers/stmicro/stmmac.rst b/Documentation/networking/device_drivers/stmicro/stmmac.rst
index c34bab3d2df0..5d46e5036129 100644
--- a/Documentation/networking/device_drivers/stmicro/stmmac.rst
+++ b/Documentation/networking/device_drivers/stmicro/stmmac.rst
@@ -32,7 +32,8 @@ is also supported.
DesignWare(R) Cores Ethernet MAC 10/100/1000 Universal version 3.70a
(and older) and DesignWare(R) Cores Ethernet Quality-of-Service version 4.0
(and upper) have been used for developing this driver as well as
-DesignWare(R) Cores XGMAC - 10G Ethernet MAC.
+DesignWare(R) Cores XGMAC - 10G Ethernet MAC and DesignWare(R) Cores
+Enterprise MAC - 100G Ethernet MAC.
This driver supports both the platform bus and PCI.
@@ -48,6 +49,8 @@ Cores Ethernet Controllers and corresponding minimum and maximum versions:
+-------------------------------+--------------+--------------+--------------+
| XGMAC - 10G Ethernet MAC | 2.10a | N/A | XGMAC2+ |
+-------------------------------+--------------+--------------+--------------+
+| XLGMAC - 100G Ethernet MAC | 2.00a | N/A | XLGMAC2+ |
++-------------------------------+--------------+--------------+--------------+
For questions related to hardware requirements, refer to the documentation
supplied with your Ethernet adapter. All hardware requirements listed apply
@@ -57,7 +60,7 @@ Feature List
============
The following features are available in this driver:
- - GMII/MII/RGMII/SGMII/RMII/XGMII Interface
+ - GMII/MII/RGMII/SGMII/RMII/XGMII/XLGMII Interface
- Half-Duplex / Full-Duplex Operation
- Energy Efficient Ethernet (EEE)
- IEEE 802.3x PAUSE Packets (Flow Control)
diff --git a/Documentation/networking/devlink/bnxt.rst b/Documentation/networking/devlink/bnxt.rst
index 82ef9ec46707..3dfd84ccb1c7 100644
--- a/Documentation/networking/devlink/bnxt.rst
+++ b/Documentation/networking/devlink/bnxt.rst
@@ -51,6 +51,9 @@ The ``bnxt_en`` driver reports the following versions
* - Name
- Type
- Description
+ * - ``board.id``
+ - fixed
+ - Part number identifying the board design
* - ``asic.id``
- fixed
- ASIC design identifier
@@ -63,12 +66,15 @@ The ``bnxt_en`` driver reports the following versions
* - ``fw``
- stored, running
- Overall board firmware version
- * - ``fw.app``
- - stored, running
- - Data path firmware version
* - ``fw.mgmt``
- stored, running
- - Management firmware version
+ - NIC hardware resource management firmware version
+ * - ``fw.mgmt.api``
+ - running
+ - Minimum firmware interface spec version supported between driver and firmware
+ * - ``fw.nsci``
+ - stored, running
+ - General platform management firmware version
* - ``fw.roce``
- stored, running
- RoCE management firmware version
diff --git a/Documentation/networking/devlink/devlink-flash.rst b/Documentation/networking/devlink/devlink-flash.rst
new file mode 100644
index 000000000000..40a87c0222cb
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-flash.rst
@@ -0,0 +1,93 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+.. _devlink_flash:
+
+=============
+Devlink Flash
+=============
+
+The ``devlink-flash`` API allows updating device firmware. It replaces the
+older ``ethtool-flash`` mechanism, and doesn't require taking any
+networking locks in the kernel to perform the flash update. Example use::
+
+ $ devlink dev flash pci/0000:05:00.0 file flash-boot.bin
+
+Note that the file name is a path relative to the firmware loading path
+(usually ``/lib/firmware/``). Drivers may send status updates to inform
+user space about the progress of the update operation.
+
+Firmware Loading
+================
+
+Devices which require firmware to operate usually store it in non-volatile
+memory on the board, e.g. flash. Some devices store only basic firmware on
+the board, and the driver loads the rest from disk during probing.
+``devlink-info`` allows users to query firmware information (loaded
+components and versions).
+
+In other cases the device can both store the image on the board, load from
+disk, or automatically flash a new image from disk. The ``fw_load_policy``
+devlink parameter can be used to control this behavior
+(:ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`).
+
+On-disk firmware files are usually stored in ``/lib/firmware/``.
+
+Firmware Version Management
+===========================
+
+Drivers are expected to implement ``devlink-flash`` and ``devlink-info``
+functionality, which together allow for implementing vendor-independent
+automated firmware update facilities.
+
+``devlink-info`` exposes the ``driver`` name and three version groups
+(``fixed``, ``running``, ``stored``).
+
+The ``driver`` attribute and ``fixed`` group identify the specific device
+design, e.g. for looking up applicable firmware updates. This is why
+``serial_number`` is not part of the ``fixed`` versions (even though it
+is fixed) - ``fixed`` versions should identify the design, not a single
+device.
+
+``running`` and ``stored`` firmware versions identify the firmware running
+on the device, and firmware which will be activated after reboot or device
+reset.
+
+The firmware update agent is supposed to be able to follow this simple
+algorithm to update firmware contents, regardless of the device vendor:
+
+.. code-block:: sh
+
+ # Get unique HW design identifier
+ $hw_id = devlink-dev-info['fixed']
+
+ # Find out which FW flash we want to use for this NIC
+ $want_flash_vers = some-db-backed.lookup($hw_id, 'flash')
+
+ # Update flash if necessary
+ if $want_flash_vers != devlink-dev-info['stored']:
+ $file = some-db-backed.download($hw_id, 'flash')
+ devlink-dev-flash($file)
+
+ # Find out the expected overall firmware versions
+ $want_fw_vers = some-db-backed.lookup($hw_id, 'all')
+
+ # Update on-disk file if necessary
+ if $want_fw_vers != devlink-dev-info['running']:
+ $file = some-db-backed.download($hw_id, 'disk')
+ write($file, '/lib/firmware/')
+
+ # Try device reset, if available
+ if $want_fw_vers != devlink-dev-info['running']:
+ devlink-reset()
+
+ # Reboot, if reset wasn't enough
+ if $want_fw_vers != devlink-dev-info['running']:
+ reboot()
+
+Note that each reference to ``devlink-dev-info`` in this pseudo-code
+is expected to fetch up-to-date information from the kernel.
+
+For the convenience of identifying firmware files some vendors add
+``bundle_id`` information to the firmware versions. This meta-version covers
+multiple per-component versions and can be used e.g. in firmware file names
+(all component versions could get rather long.)
diff --git a/Documentation/networking/devlink/devlink-info.rst b/Documentation/networking/devlink/devlink-info.rst
index 70981dd1b981..3fe11401b838 100644
--- a/Documentation/networking/devlink/devlink-info.rst
+++ b/Documentation/networking/devlink/devlink-info.rst
@@ -5,34 +5,119 @@ Devlink Info
============
The ``devlink-info`` mechanism enables device drivers to report device
-information in a generic fashion. It is extensible, and enables exporting
-even device or driver specific information.
+(hardware and firmware) information in a standard, extensible fashion.
-devlink supports representing the following types of versions
+The original motivation for the ``devlink-info`` API was twofold:
-.. list-table:: List of version types
+ - making it possible to automate device and firmware management in a fleet
+ of machines in a vendor-independent fashion (see also
+ :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`);
+ - name the per component FW versions (as opposed to the crowded ethtool
+ version string).
+
+``devlink-info`` supports reporting multiple types of objects. Reporting driver
+versions is generally discouraged - here, and via any other Linux API.
+
+.. list-table:: List of top level info objects
:widths: 5 95
- * - Type
+ * - Name
- Description
+ * - ``driver``
+ - Name of the currently used device driver, also available through sysfs.
+
+ * - ``serial_number``
+ - Serial number of the device.
+
+ This is usually the serial number of the ASIC, also often available
+ in PCI config space of the device in the *Device Serial Number*
+ capability.
+
+ The serial number should be unique per physical device.
+ Sometimes the serial number of the device is only 48 bits long (the
+ length of the Ethernet MAC address), and since PCI DSN is 64 bits long
+ devices pad or encode additional information into the serial number.
+ One example is adding port ID or PCI interface ID in the extra two bytes.
+ Drivers should make sure to strip or normalize any such padding
+ or interface ID, and report only the part of the serial number
+ which uniquely identifies the hardware. In other words serial number
+ reported for two ports of the same device or on two hosts of
+ a multi-host device should be identical.
+
+ .. note:: ``devlink-info`` API should be extended with a new field
+ if devices want to report board/product serial number (often
+ reported in PCI *Vital Product Data* capability).
+
* - ``fixed``
- - Represents fixed versions, which cannot change. For example,
+ - Group for hardware identifiers, and versions of components
+ which are not field-updatable.
+
+ Versions in this section identify the device design. For example,
component identifiers or the board version reported in the PCI VPD.
+ Data in ``devlink-info`` should be broken into the smallest logical
+ components, e.g. PCI VPD may concatenate various information
+ to form the Part Number string, while in ``devlink-info`` all parts
+ should be reported as separate items.
+
+ This group must not contain any frequently changing identifiers,
+ such as serial numbers. See
+ :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`
+ to understand why.
+
* - ``running``
- - Represents the version of the currently running component. For
- example the running version of firmware. These versions generally
- only update after a reboot.
+ - Group for information about currently running software/firmware.
+ These versions often only update after a reboot, sometimes device reset.
+
* - ``stored``
- - Represents the version of a component as stored, such as after a
- flash update. Stored values should update to reflect changes in the
- flash even if a reboot has not yet occurred.
+ - Group for software/firmware versions in device flash.
+
+ Stored values must update to reflect changes in the flash even
+ if reboot has not yet occurred. If device is not capable of updating
+ ``stored`` versions when new software is flashed, it must not report
+ them.
+
+Each version can be reported at most once in each version group. Firmware
+components stored on the flash should feature in both the ``running`` and
+``stored`` sections, if device is capable of reporting ``stored`` versions
+(see :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`).
+In case software/firmware components are loaded from the disk (e.g.
+``/lib/firmware``) only the running version should be reported via
+the kernel API.
Generic Versions
================
It is expected that drivers use the following generic names for exporting
-version information. Other information may be exposed using driver-specific
-names, but these should be documented in the driver-specific file.
+version information. If a generic name for a given component doesn't exist yet,
+driver authors should consult existing driver-specific versions and attempt
+reuse. As last resort, if a component is truly unique, using driver-specific
+names is allowed, but these should be documented in the driver-specific file.
+
+All versions should try to use the following terminology:
+
+.. list-table:: List of common version suffixes
+ :widths: 10 90
+
+ * - Name
+ - Description
+ * - ``id``, ``revision``
+ - Identifiers of designs and revision, mostly used for hardware versions.
+
+ * - ``api``
+ - Version of API between components. API items are usually of limited
+ value to the user, and can be inferred from other versions by the vendor,
+ so adding API versions is generally discouraged as noise.
+
+ * - ``bundle_id``
+ - Identifier of a distribution package which was flashed onto the device.
+ This is an attribute of a firmware package which covers multiple versions
+ for ease of managing firmware images (see
+ :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`).
+
+ ``bundle_id`` can appear in both ``running`` and ``stored`` versions,
+ but it must not be reported if any of the components covered by the
+ ``bundle_id`` was changed and no longer matches the version from
+ the bundle.
board.id
--------
@@ -52,7 +137,7 @@ ASIC design identifier.
asic.rev
--------
-ASIC design revision.
+ASIC design revision/stepping.
board.manufacture
-----------------
@@ -72,6 +157,12 @@ Control unit firmware version. This firmware is responsible for house
keeping tasks, PHY control etc. but not the packet-by-packet data path
operation.
+fw.mgmt.api
+-----------
+
+Firmware interface specification version of the software interfaces between
+driver and firmware.
+
fw.app
------
@@ -91,10 +182,31 @@ Network Controller Sideband Interface.
fw.psid
-------
-Unique identifier of the firmware parameter set.
+Unique identifier of the firmware parameter set. These are usually
+parameters of a particular board, defined at manufacturing time.
fw.roce
-------
RoCE firmware version which is responsible for handling roce
management.
+
+fw.bundle_id
+------------
+
+Unique identifier of the entire firmware bundle.
+
+Future work
+===========
+
+The following extensions could be useful:
+
+ - product serial number - NIC boards often get labeled with a board serial
+ number rather than ASIC serial number; it'd be useful to add board serial
+ numbers to the API if they can be retrieved from the device;
+
+ - on-disk firmware file names - drivers list the file names of firmware they
+ may need to load onto devices via the ``MODULE_FIRMWARE()`` macro. These,
+ however, are per module, rather than per device. It'd be useful to list
+ the names of firmware files the driver will try to load for a given device,
+ in order of priority.
diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index da2f85c0fa21..d075fd090b3d 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -41,6 +41,8 @@ In order for ``driverinit`` parameters to take effect, the driver must
support reloading via the ``devlink-reload`` command. This command will
request a reload of the device driver.
+.. _devlink_params_generic:
+
Generic configuration parameters
================================
The following is a list of generic configuration parameters that drivers may
diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst
index 8b46e8591fe0..04e04d1ff627 100644
--- a/Documentation/networking/devlink/devlink-region.rst
+++ b/Documentation/networking/devlink/devlink-region.rst
@@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user.
Regions may also be used to provide an additional way to debug complex error
states, but see also :doc:`devlink-health`
+Regions may optionally support capturing a snapshot on demand via the
+``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow
+requested snapshots must implement the ``.snapshot`` callback for the region
+in its ``devlink_region_ops`` structure.
+
example usage
-------------
@@ -29,8 +34,7 @@ example usage
$ devlink region show [ DEV/REGION ]
$ devlink region del DEV/REGION snapshot SNAPSHOT_ID
$ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ]
- $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ]
- address ADDRESS length length
+ $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length
# Show all of the exposed regions with region sizes:
$ devlink region show
@@ -40,6 +44,9 @@ example usage
# Delete a snapshot using:
$ devlink region del pci/0000:00:05.0/cr-space snapshot 1
+ # Request an immediate snapshot, if supported by the region
+ $ devlink region new pci/0000:00:05.0/cr-space snapshot 5
+
# Dump a snapshot:
$ devlink region dump pci/0000:00:05.0/fw-health snapshot 1
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
@@ -48,8 +55,7 @@ example usage
0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
# Read a specific part of a snapshot:
- $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0
- length 16
+ $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16
0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
As regions are likely very device or driver specific, no generic regions are
diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst
index 47a429bb8658..a09971c2115c 100644
--- a/Documentation/networking/devlink/devlink-trap.rst
+++ b/Documentation/networking/devlink/devlink-trap.rst
@@ -238,6 +238,12 @@ be added to the following table:
- ``drop``
- Traps NVE packets that the device decided to drop because their overlay
source MAC is multicast
+ * - ``ingress_flow_action_drop``
+ - ``drop``
+ - Traps packets dropped during processing of ingress flow action drop
+ * - ``egress_flow_action_drop``
+ - ``drop``
+ - Traps packets dropped during processing of egress flow action drop
Driver-specific Packet Traps
============================
@@ -277,6 +283,35 @@ narrow. The description of these groups must be added to the following table:
* - ``tunnel_drops``
- Contains packet traps for packets that were dropped by the device during
tunnel encapsulation / decapsulation
+ * - ``acl_drops``
+ - Contains packet traps for packets that were dropped by the device during
+ ACL processing
+
+Packet Trap Policers
+====================
+
+As previously explained, the underlying device can trap certain packets to the
+CPU for processing. In most cases, the underlying device is capable of handling
+packet rates that are several orders of magnitude higher compared to those that
+can be handled by the CPU.
+
+Therefore, in order to prevent the underlying device from overwhelming the CPU,
+devices usually include packet trap policers that are able to police the
+trapped packets to rates that can be handled by the CPU.
+
+The ``devlink-trap`` mechanism allows capable device drivers to register their
+supported packet trap policers with ``devlink``. The device driver can choose
+to associate these policers with supported packet trap groups (see
+:ref:`Generic-Packet-Trap-Groups`) during its initialization, thereby exposing
+its default control plane policy to user space.
+
+Device drivers should allow user space to change the parameters of the policers
+(e.g., rate, burst size) as well as the association between the policers and
+trap groups by implementing the relevant callbacks.
+
+If possible, device drivers should implement a callback that allows user space
+to retrieve the number of packets that were dropped by the policer because its
+configured policy was violated.
Testing
=======
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
new file mode 100644
index 000000000000..5b58fc4e1268
--- /dev/null
+++ b/Documentation/networking/devlink/ice.rst
@@ -0,0 +1,96 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+ice devlink support
+===================
+
+This document describes the devlink features implemented by the ``ice``
+device driver.
+
+Info versions
+=============
+
+The ``ice`` driver reports the following versions
+
+.. list-table:: devlink info versions implemented
+ :widths: 5 5 5 90
+
+ * - Name
+ - Type
+ - Example
+ - Description
+ * - ``board.id``
+ - fixed
+ - K65390-000
+ - The Product Board Assembly (PBA) identifier of the board.
+ * - ``fw.mgmt``
+ - running
+ - 2.1.7
+ - 3-digit version number of the management firmware that controls the
+ PHY, link, etc.
+ * - ``fw.mgmt.api``
+ - running
+ - 1.5
+ - 2-digit version number of the API exported over the AdminQ by the
+ management firmware. Used by the driver to identify what commands
+ are supported.
+ * - ``fw.mgmt.build``
+ - running
+ - 0x305d955f
+ - Unique identifier of the source for the management firmware.
+ * - ``fw.undi``
+ - running
+ - 1.2581.0
+ - Version of the Option ROM containing the UEFI driver. The version is
+ reported in ``major.minor.patch`` format. The major version is
+ incremented whenever a major breaking change occurs, or when the
+ minor version would overflow. The minor version is incremented for
+ non-breaking changes and reset to 1 when the major version is
+ incremented. The patch version is normally 0 but is incremented when
+ a fix is delivered as a patch against an older base Option ROM.
+ * - ``fw.psid.api``
+ - running
+ - 0.80
+ - Version defining the format of the flash contents.
+ * - ``fw.bundle_id``
+ - running
+ - 0x80002ec0
+ - Unique identifier of the firmware image file that was loaded onto
+ the device. Also referred to as the EETRACK identifier of the NVM.
+ * - ``fw.app.name``
+ - running
+ - ICE OS Default Package
+ - The name of the DDP package that is active in the device. The DDP
+ package is loaded by the driver during initialization. Each varation
+ of DDP package shall have a unique name.
+ * - ``fw.app``
+ - running
+ - 1.3.1.0
+ - The version of the DDP package that is active in the device. Note
+ that both the name (as reported by ``fw.app.name``) and version are
+ required to uniquely identify the package.
+
+Regions
+=======
+
+The ``ice`` driver enables access to the contents of the Non Volatile Memory
+flash chip via the ``nvm-flash`` region.
+
+Users can request an immediate capture of a snapshot via the
+``DEVLINK_CMD_REGION_NEW``
+
+.. code:: shell
+
+ $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1
+ $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
+
+ $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
+ 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
+ 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
+ 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
+ 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5
+
+ $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16
+ 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
+
+ $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 087ff54d53fc..c536db2cc0f9 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -16,6 +16,7 @@ general.
devlink-dpipe
devlink-health
devlink-info
+ devlink-flash
devlink-params
devlink-region
devlink-resource
@@ -32,6 +33,7 @@ parameters, info versions, and other features it supports.
bnxt
ionic
+ ice
mlx4
mlx5
mlxsw
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 629a6e69c036..4e4b97f7971a 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -37,6 +37,12 @@ parameters.
* ``smfs`` Software managed flow steering. In SMFS mode, the HW
steering entities are created and manage through the driver without
firmware intervention.
+ * - ``fdb_large_groups``
+ - u32
+ - driverinit
+ - Control the number of large groups (size > 1) in the FDB table.
+
+ * The default value is 15, and the range is between 1 and 1024.
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index f1f868479ceb..567326491f80 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -189,6 +189,21 @@ Userspace to kernel:
``ETHTOOL_MSG_DEBUG_SET`` set debugging settings
``ETHTOOL_MSG_WOL_GET`` get wake-on-lan settings
``ETHTOOL_MSG_WOL_SET`` set wake-on-lan settings
+ ``ETHTOOL_MSG_FEATURES_GET`` get device features
+ ``ETHTOOL_MSG_FEATURES_SET`` set device features
+ ``ETHTOOL_MSG_PRIVFLAGS_GET`` get private flags
+ ``ETHTOOL_MSG_PRIVFLAGS_SET`` set private flags
+ ``ETHTOOL_MSG_RINGS_GET`` get ring sizes
+ ``ETHTOOL_MSG_RINGS_SET`` set ring sizes
+ ``ETHTOOL_MSG_CHANNELS_GET`` get channel counts
+ ``ETHTOOL_MSG_CHANNELS_SET`` set channel counts
+ ``ETHTOOL_MSG_COALESCE_GET`` get coalescing parameters
+ ``ETHTOOL_MSG_COALESCE_SET`` set coalescing parameters
+ ``ETHTOOL_MSG_PAUSE_GET`` get pause parameters
+ ``ETHTOOL_MSG_PAUSE_SET`` set pause parameters
+ ``ETHTOOL_MSG_EEE_GET`` get EEE settings
+ ``ETHTOOL_MSG_EEE_SET`` set EEE settings
+ ``ETHTOOL_MSG_TSINFO_GET`` get timestamping info
===================================== ================================
Kernel to userspace:
@@ -204,6 +219,22 @@ Kernel to userspace:
``ETHTOOL_MSG_DEBUG_NTF`` debugging settings notification
``ETHTOOL_MSG_WOL_GET_REPLY`` wake-on-lan settings
``ETHTOOL_MSG_WOL_NTF`` wake-on-lan settings notification
+ ``ETHTOOL_MSG_FEATURES_GET_REPLY`` device features
+ ``ETHTOOL_MSG_FEATURES_SET_REPLY`` optional reply to FEATURES_SET
+ ``ETHTOOL_MSG_FEATURES_NTF`` netdev features notification
+ ``ETHTOOL_MSG_PRIVFLAGS_GET_REPLY`` private flags
+ ``ETHTOOL_MSG_PRIVFLAGS_NTF`` private flags
+ ``ETHTOOL_MSG_RINGS_GET_REPLY`` ring sizes
+ ``ETHTOOL_MSG_RINGS_NTF`` ring sizes
+ ``ETHTOOL_MSG_CHANNELS_GET_REPLY`` channel counts
+ ``ETHTOOL_MSG_CHANNELS_NTF`` channel counts
+ ``ETHTOOL_MSG_COALESCE_GET_REPLY`` coalescing parameters
+ ``ETHTOOL_MSG_COALESCE_NTF`` coalescing parameters
+ ``ETHTOOL_MSG_PAUSE_GET_REPLY`` pause parameters
+ ``ETHTOOL_MSG_PAUSE_NTF`` pause parameters
+ ``ETHTOOL_MSG_EEE_GET_REPLY`` EEE settings
+ ``ETHTOOL_MSG_EEE_NTF`` EEE settings
+ ``ETHTOOL_MSG_TSINFO_GET_REPLY`` timestamping info
===================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -521,6 +552,410 @@ Request contents:
``WAKE_MAGICSECURE`` mode.
+FEATURES_GET
+============
+
+Gets netdev features like ``ETHTOOL_GFEATURES`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested reply header
+ ``ETHTOOL_A_FEATURES_HW`` bitset dev->hw_features
+ ``ETHTOOL_A_FEATURES_WANTED`` bitset dev->wanted_features
+ ``ETHTOOL_A_FEATURES_ACTIVE`` bitset dev->features
+ ``ETHTOOL_A_FEATURES_NOCHANGE`` bitset NETIF_F_NEVER_CHANGE
+ ==================================== ====== ==========================
+
+Bitmaps in kernel response have the same meaning as bitmaps used in ioctl
+interference but attribute names are different (they are based on
+corresponding members of struct net_device). Legacy "flags" are not provided,
+if userspace needs them (most likely only ethtool for backward compatibility),
+it can calculate their values from related feature bits itself.
+ETHA_FEATURES_HW uses mask consisting of all features recognized by kernel (to
+provide all names when using verbose bitmap format), the other three use no
+mask (simple bit lists).
+
+
+FEATURES_SET
+============
+
+Request to set netdev features like ``ETHTOOL_SFEATURES`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested request header
+ ``ETHTOOL_A_FEATURES_WANTED`` bitset requested features
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_FEATURES_HEADER`` nested reply header
+ ``ETHTOOL_A_FEATURES_WANTED`` bitset diff wanted vs. result
+ ``ETHTOOL_A_FEATURES_ACTIVE`` bitset diff old vs. new active
+ ==================================== ====== ==========================
+
+Request constains only one bitset which can be either value/mask pair (request
+to change specific feature bits and leave the rest) or only a value (request
+to set all features to specified set).
+
+As request is subject to netdev_change_features() sanity checks, optional
+kernel reply (can be suppressed by ``ETHTOOL_FLAG_OMIT_REPLY`` flag in request
+header) informs client about the actual result. ``ETHTOOL_A_FEATURES_WANTED``
+reports the difference between client request and actual result: mask consists
+of bits which differ between requested features and result (dev->features
+after the operation), value consists of values of these bits in the request
+(i.e. negated values from resulting features). ``ETHTOOL_A_FEATURES_ACTIVE``
+reports the difference between old and new dev->features: mask consists of
+bits which have changed, values are their values in new dev->features (after
+the operation).
+
+``ETHTOOL_MSG_FEATURES_NTF`` notification is sent not only if device features
+are modified using ``ETHTOOL_MSG_FEATURES_SET`` request or on of ethtool ioctl
+request but also each time features are modified with netdev_update_features()
+or netdev_change_features().
+
+
+PRIVFLAGS_GET
+=============
+
+Gets private flags like ``ETHTOOL_GPFLAGS`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested reply header
+ ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags
+ ==================================== ====== ==========================
+
+``ETHTOOL_A_PRIVFLAGS_FLAGS`` is a bitset with values of device private flags.
+These flags are defined by driver, their number and names (and also meaning)
+are device dependent. For compact bitset format, names can be retrieved as
+``ETH_SS_PRIV_FLAGS`` string set. If verbose bitset format is requested,
+response uses all private flags supported by the device as mask so that client
+gets the full information without having to fetch the string set with names.
+
+
+PRIVFLAGS_SET
+=============
+
+Sets or modifies values of device private flags like ``ETHTOOL_SPFLAGS``
+ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header
+ ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags
+ ==================================== ====== ==========================
+
+``ETHTOOL_A_PRIVFLAGS_FLAGS`` can either set the whole set of private flags or
+modify only values of some of them.
+
+
+RINGS_GET
+=========
+
+Gets ring sizes like ``ETHTOOL_GRINGPARAM`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ==================================== ====== ==========================
+
+
+RINGS_SET
+=========
+
+Sets ring sizes like ``ETHTOOL_SRINGPARAM`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_RINGS_HEADER`` nested reply header
+ ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring
+ ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring
+ ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring
+ ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring
+ ==================================== ====== ==========================
+
+Kernel checks that requested ring sizes do not exceed limits reported by
+driver. Driver may impose additional constraints and may not suspport all
+attributes.
+
+
+CHANNELS_GET
+============
+
+Gets channel counts like ``ETHTOOL_GCHANNELS`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_CHANNELS_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_CHANNELS_HEADER`` nested reply header
+ ``ETHTOOL_A_CHANNELS_RX_MAX`` u32 max receive channels
+ ``ETHTOOL_A_CHANNELS_TX_MAX`` u32 max transmit channels
+ ``ETHTOOL_A_CHANNELS_OTHER_MAX`` u32 max other channels
+ ``ETHTOOL_A_CHANNELS_COMBINED_MAX`` u32 max combined channels
+ ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count
+ ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count
+ ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count
+ ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count
+ ===================================== ====== ==========================
+
+
+CHANNELS_SET
+============
+
+Sets channel counts like ``ETHTOOL_SCHANNELS`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_CHANNELS_HEADER`` nested request header
+ ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count
+ ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count
+ ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count
+ ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count
+ ===================================== ====== ==========================
+
+Kernel checks that requested channel counts do not exceed limits reported by
+driver. Driver may impose additional constraints and may not suspport all
+attributes.
+
+
+COALESCE_GET
+============
+
+Gets coalescing parameters like ``ETHTOOL_GCOALESCE`` ioctl request.
+
+Request contents:
+
+ ==================================== ====== ==========================
+ ``ETHTOOL_A_COALESCE_HEADER`` nested request header
+ ==================================== ====== ==========================
+
+Kernel response contents:
+
+ =========================================== ====== =======================
+ ``ETHTOOL_A_COALESCE_HEADER`` nested reply header
+ ``ETHTOOL_A_COALESCE_RX_USECS`` u32 delay (us), normal Rx
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES`` u32 max packets, normal Rx
+ ``ETHTOOL_A_COALESCE_RX_USECS_IRQ`` u32 delay (us), Rx in IRQ
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ`` u32 max packets, Rx in IRQ
+ ``ETHTOOL_A_COALESCE_TX_USECS`` u32 delay (us), normal Tx
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES`` u32 max packets, normal Tx
+ ``ETHTOOL_A_COALESCE_TX_USECS_IRQ`` u32 delay (us), Tx in IRQ
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ`` u32 IRQ packets, Tx in IRQ
+ ``ETHTOOL_A_COALESCE_STATS_BLOCK_USECS`` u32 delay of stats update
+ ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX`` bool adaptive Rx coalesce
+ ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX`` bool adaptive Tx coalesce
+ ``ETHTOOL_A_COALESCE_PKT_RATE_LOW`` u32 threshold for low rate
+ ``ETHTOOL_A_COALESCE_RX_USECS_LOW`` u32 delay (us), low Rx
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW`` u32 max packets, low Rx
+ ``ETHTOOL_A_COALESCE_TX_USECS_LOW`` u32 delay (us), low Tx
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW`` u32 max packets, low Tx
+ ``ETHTOOL_A_COALESCE_PKT_RATE_HIGH`` u32 threshold for high rate
+ ``ETHTOOL_A_COALESCE_RX_USECS_HIGH`` u32 delay (us), high Rx
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH`` u32 max packets, high Rx
+ ``ETHTOOL_A_COALESCE_TX_USECS_HIGH`` u32 delay (us), high Tx
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH`` u32 max packets, high Tx
+ ``ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL`` u32 rate sampling interval
+ =========================================== ====== =======================
+
+Attributes are only included in reply if their value is not zero or the
+corresponding bit in ``ethtool_ops::supported_coalesce_params`` is set (i.e.
+they are declared as supported by driver).
+
+
+COALESCE_SET
+============
+
+Sets coalescing parameters like ``ETHTOOL_SCOALESCE`` ioctl request.
+
+Request contents:
+
+ =========================================== ====== =======================
+ ``ETHTOOL_A_COALESCE_HEADER`` nested request header
+ ``ETHTOOL_A_COALESCE_RX_USECS`` u32 delay (us), normal Rx
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES`` u32 max packets, normal Rx
+ ``ETHTOOL_A_COALESCE_RX_USECS_IRQ`` u32 delay (us), Rx in IRQ
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ`` u32 max packets, Rx in IRQ
+ ``ETHTOOL_A_COALESCE_TX_USECS`` u32 delay (us), normal Tx
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES`` u32 max packets, normal Tx
+ ``ETHTOOL_A_COALESCE_TX_USECS_IRQ`` u32 delay (us), Tx in IRQ
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ`` u32 IRQ packets, Tx in IRQ
+ ``ETHTOOL_A_COALESCE_STATS_BLOCK_USECS`` u32 delay of stats update
+ ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX`` bool adaptive Rx coalesce
+ ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX`` bool adaptive Tx coalesce
+ ``ETHTOOL_A_COALESCE_PKT_RATE_LOW`` u32 threshold for low rate
+ ``ETHTOOL_A_COALESCE_RX_USECS_LOW`` u32 delay (us), low Rx
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW`` u32 max packets, low Rx
+ ``ETHTOOL_A_COALESCE_TX_USECS_LOW`` u32 delay (us), low Tx
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW`` u32 max packets, low Tx
+ ``ETHTOOL_A_COALESCE_PKT_RATE_HIGH`` u32 threshold for high rate
+ ``ETHTOOL_A_COALESCE_RX_USECS_HIGH`` u32 delay (us), high Rx
+ ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH`` u32 max packets, high Rx
+ ``ETHTOOL_A_COALESCE_TX_USECS_HIGH`` u32 delay (us), high Tx
+ ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH`` u32 max packets, high Tx
+ ``ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL`` u32 rate sampling interval
+ =========================================== ====== =======================
+
+Request is rejected if it attributes declared as unsupported by driver (i.e.
+such that the corresponding bit in ``ethtool_ops::supported_coalesce_params``
+is not set), regardless of their values. Driver may impose additional
+constraints on coalescing parameters and their values.
+
+
+PAUSE_GET
+============
+
+Gets channel counts like ``ETHTOOL_GPAUSE`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_PAUSE_HEADER`` nested request header
+ ===================================== ====== ==========================
+
+Kernel response contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_PAUSE_HEADER`` nested request header
+ ``ETHTOOL_A_PAUSE_AUTONEG`` bool pause autonegotiation
+ ``ETHTOOL_A_PAUSE_RX`` bool receive pause frames
+ ``ETHTOOL_A_PAUSE_TX`` bool transmit pause frames
+ ===================================== ====== ==========================
+
+
+PAUSE_SET
+============
+
+Sets pause parameters like ``ETHTOOL_GPAUSEPARAM`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_PAUSE_HEADER`` nested request header
+ ``ETHTOOL_A_PAUSE_AUTONEG`` bool pause autonegotiation
+ ``ETHTOOL_A_PAUSE_RX`` bool receive pause frames
+ ``ETHTOOL_A_PAUSE_TX`` bool transmit pause frames
+ ===================================== ====== ==========================
+
+
+EEE_GET
+=======
+
+Gets channel counts like ``ETHTOOL_GEEE`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_EEE_HEADER`` nested request header
+ ===================================== ====== ==========================
+
+Kernel response contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_EEE_HEADER`` nested request header
+ ``ETHTOOL_A_EEE_MODES_OURS`` bool supported/advertised modes
+ ``ETHTOOL_A_EEE_MODES_PEER`` bool peer advertised link modes
+ ``ETHTOOL_A_EEE_ACTIVE`` bool EEE is actively used
+ ``ETHTOOL_A_EEE_ENABLED`` bool EEE is enabled
+ ``ETHTOOL_A_EEE_TX_LPI_ENABLED`` bool Tx lpi enabled
+ ``ETHTOOL_A_EEE_TX_LPI_TIMER`` u32 Tx lpi timeout (in us)
+ ===================================== ====== ==========================
+
+In ``ETHTOOL_A_EEE_MODES_OURS``, mask consists of link modes for which EEE is
+enabled, value of link modes for which EEE is advertised. Link modes for which
+peer advertises EEE are listed in ``ETHTOOL_A_EEE_MODES_PEER`` (no mask). The
+netlink interface allows reporting EEE status for all link modes but only
+first 32 are provided by the ``ethtool_ops`` callback.
+
+
+EEE_SET
+=======
+
+Sets pause parameters like ``ETHTOOL_GEEEPARAM`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_EEE_HEADER`` nested request header
+ ``ETHTOOL_A_EEE_MODES_OURS`` bool advertised modes
+ ``ETHTOOL_A_EEE_ENABLED`` bool EEE is enabled
+ ``ETHTOOL_A_EEE_TX_LPI_ENABLED`` bool Tx lpi enabled
+ ``ETHTOOL_A_EEE_TX_LPI_TIMER`` u32 Tx lpi timeout (in us)
+ ===================================== ====== ==========================
+
+``ETHTOOL_A_EEE_MODES_OURS`` is used to either list link modes to advertise
+EEE for (if there is no mask) or specify changes to the list (if there is
+a mask). The netlink interface allows reporting EEE status for all link modes
+but only first 32 can be set at the moment as that is what the ``ethtool_ops``
+callback supports.
+
+
+TSINFO_GET
+==========
+
+Gets timestamping information like ``ETHTOOL_GET_TS_INFO`` ioctl request.
+
+Request contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_TSINFO_HEADER`` nested request header
+ ===================================== ====== ==========================
+
+Kernel response contents:
+
+ ===================================== ====== ==========================
+ ``ETHTOOL_A_TSINFO_HEADER`` nested request header
+ ``ETHTOOL_A_TSINFO_TIMESTAMPING`` bitset SO_TIMESTAMPING flags
+ ``ETHTOOL_A_TSINFO_TX_TYPES`` bitset supported Tx types
+ ``ETHTOOL_A_TSINFO_RX_FILTERS`` bitset supported Rx filters
+ ``ETHTOOL_A_TSINFO_PHC_INDEX`` u32 PTP hw clock index
+ ===================================== ====== ==========================
+
+``ETHTOOL_A_TSINFO_PHC_INDEX`` is absent if there is no associated PHC (there
+is no special value for this case). The bitset attributes are omitted if they
+would be empty (no bit set).
+
+
Request translation
===================
@@ -545,37 +980,37 @@ have their netlink replacement yet.
``ETHTOOL_GLINK`` ``ETHTOOL_MSG_LINKSTATE_GET``
``ETHTOOL_GEEPROM`` n/a
``ETHTOOL_SEEPROM`` n/a
- ``ETHTOOL_GCOALESCE`` n/a
- ``ETHTOOL_SCOALESCE`` n/a
- ``ETHTOOL_GRINGPARAM`` n/a
- ``ETHTOOL_SRINGPARAM`` n/a
- ``ETHTOOL_GPAUSEPARAM`` n/a
- ``ETHTOOL_SPAUSEPARAM`` n/a
- ``ETHTOOL_GRXCSUM`` n/a
- ``ETHTOOL_SRXCSUM`` n/a
- ``ETHTOOL_GTXCSUM`` n/a
- ``ETHTOOL_STXCSUM`` n/a
- ``ETHTOOL_GSG`` n/a
- ``ETHTOOL_SSG`` n/a
+ ``ETHTOOL_GCOALESCE`` ``ETHTOOL_MSG_COALESCE_GET``
+ ``ETHTOOL_SCOALESCE`` ``ETHTOOL_MSG_COALESCE_SET``
+ ``ETHTOOL_GRINGPARAM`` ``ETHTOOL_MSG_RINGS_GET``
+ ``ETHTOOL_SRINGPARAM`` ``ETHTOOL_MSG_RINGS_SET``
+ ``ETHTOOL_GPAUSEPARAM`` ``ETHTOOL_MSG_PAUSE_GET``
+ ``ETHTOOL_SPAUSEPARAM`` ``ETHTOOL_MSG_PAUSE_SET``
+ ``ETHTOOL_GRXCSUM`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SRXCSUM`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GTXCSUM`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_STXCSUM`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GSG`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SSG`` ``ETHTOOL_MSG_FEATURES_SET``
``ETHTOOL_TEST`` n/a
``ETHTOOL_GSTRINGS`` ``ETHTOOL_MSG_STRSET_GET``
``ETHTOOL_PHYS_ID`` n/a
``ETHTOOL_GSTATS`` n/a
- ``ETHTOOL_GTSO`` n/a
- ``ETHTOOL_STSO`` n/a
+ ``ETHTOOL_GTSO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_STSO`` ``ETHTOOL_MSG_FEATURES_SET``
``ETHTOOL_GPERMADDR`` rtnetlink ``RTM_GETLINK``
- ``ETHTOOL_GUFO`` n/a
- ``ETHTOOL_SUFO`` n/a
- ``ETHTOOL_GGSO`` n/a
- ``ETHTOOL_SGSO`` n/a
- ``ETHTOOL_GFLAGS`` n/a
- ``ETHTOOL_SFLAGS`` n/a
- ``ETHTOOL_GPFLAGS`` n/a
- ``ETHTOOL_SPFLAGS`` n/a
+ ``ETHTOOL_GUFO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SUFO`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GGSO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SGSO`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GFLAGS`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SFLAGS`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_GET``
+ ``ETHTOOL_SPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_SET``
``ETHTOOL_GRXFH`` n/a
``ETHTOOL_SRXFH`` n/a
- ``ETHTOOL_GGRO`` n/a
- ``ETHTOOL_SGRO`` n/a
+ ``ETHTOOL_GGRO`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SGRO`` ``ETHTOOL_MSG_FEATURES_SET``
``ETHTOOL_GRXRINGS`` n/a
``ETHTOOL_GRXCLSRLCNT`` n/a
``ETHTOOL_GRXCLSRULE`` n/a
@@ -589,18 +1024,18 @@ have their netlink replacement yet.
``ETHTOOL_GSSET_INFO`` ``ETHTOOL_MSG_STRSET_GET``
``ETHTOOL_GRXFHINDIR`` n/a
``ETHTOOL_SRXFHINDIR`` n/a
- ``ETHTOOL_GFEATURES`` n/a
- ``ETHTOOL_SFEATURES`` n/a
- ``ETHTOOL_GCHANNELS`` n/a
- ``ETHTOOL_SCHANNELS`` n/a
+ ``ETHTOOL_GFEATURES`` ``ETHTOOL_MSG_FEATURES_GET``
+ ``ETHTOOL_SFEATURES`` ``ETHTOOL_MSG_FEATURES_SET``
+ ``ETHTOOL_GCHANNELS`` ``ETHTOOL_MSG_CHANNELS_GET``
+ ``ETHTOOL_SCHANNELS`` ``ETHTOOL_MSG_CHANNELS_SET``
``ETHTOOL_SET_DUMP`` n/a
``ETHTOOL_GET_DUMP_FLAG`` n/a
``ETHTOOL_GET_DUMP_DATA`` n/a
- ``ETHTOOL_GET_TS_INFO`` n/a
+ ``ETHTOOL_GET_TS_INFO`` ``ETHTOOL_MSG_TSINFO_GET``
``ETHTOOL_GMODULEINFO`` n/a
``ETHTOOL_GMODULEEEPROM`` n/a
- ``ETHTOOL_GEEE`` n/a
- ``ETHTOOL_SEEE`` n/a
+ ``ETHTOOL_GEEE`` ``ETHTOOL_MSG_EEE_GET``
+ ``ETHTOOL_SEEE`` ``ETHTOOL_MSG_EEE_SET``
``ETHTOOL_GRSSH`` n/a
``ETHTOOL_SRSSH`` n/a
``ETHTOOL_GTUNABLE`` n/a
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index c4a328f2d57a..2f0f8b17dade 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -606,7 +606,7 @@ before a conversion to the new layout is being done behind the scenes!
Currently, the classic BPF format is being used for JITing on most
32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64,
-sparc64, arm32, riscv (RV64G) perform JIT compilation from eBPF
+sparc64, arm32, riscv64, riscv32 perform JIT compilation from eBPF
instruction set.
Some core changes of the new internal format:
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index d07d9855dcd3..50133d9761c9 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -8,6 +8,7 @@ Contents:
netdev-FAQ
af_xdp
+ bareudp
batman-adv
can
can_ucan_protocol
@@ -33,6 +34,7 @@ Contents:
tls
tls-offload
nfc
+ 6lowpan
.. only:: subproject and html
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5f53faff4e25..ee961d322d93 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -958,6 +958,15 @@ ip_nonlocal_bind - BOOLEAN
which can be quite useful - but may break some applications.
Default: 0
+ip_autobind_reuse - BOOLEAN
+ By default, bind() does not select the ports automatically even if
+ the new socket and all sockets bound to the port have SO_REUSEADDR.
+ ip_autobind_reuse allows bind() to reuse the port and this is useful
+ when you use bind()+connect(), but may break some applications.
+ The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this
+ option should only be set by experts.
+ Default: 0
+
ip_dynaddr - BOOLEAN
If set non-zero, enables support for dynamic addresses.
If set to a non-zero value larger than 1, a kernel log
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
new file mode 100644
index 000000000000..43088ddf95e4
--- /dev/null
+++ b/Documentation/networking/page_pool.rst
@@ -0,0 +1,159 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Page Pool API
+=============
+
+The page_pool allocator is optimized for the XDP mode that uses one frame
+per-page, but it can fallback on the regular page allocator APIs.
+
+Basic use involves replacing alloc_pages() calls with the
+page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages()
+replacing dev_alloc_pages().
+
+API keeps track of inflight pages, in order to let API user know
+when it is safe to free a page_pool object. Thus, API users
+must run page_pool_release_page() when a page is leaving the page_pool or
+call page_pool_put_page() where appropriate in order to maintain correct
+accounting.
+
+API user must call page_pool_put_page() once on a page, as it
+will either recycle the page, or in case of refcnt > 1, it will
+release the DMA mapping and inflight state accounting.
+
+Architecture overview
+=====================
+
+.. code-block:: none
+
+ +------------------+
+ | Driver |
+ +------------------+
+ ^
+ |
+ |
+ |
+ v
+ +--------------------------------------------+
+ | request memory |
+ +--------------------------------------------+
+ ^ ^
+ | |
+ | Pool empty | Pool has entries
+ | |
+ v v
+ +-----------------------+ +------------------------+
+ | alloc (and map) pages | | get page from cache |
+ +-----------------------+ +------------------------+
+ ^ ^
+ | |
+ | cache available | No entries, refill
+ | | from ptr-ring
+ | |
+ v v
+ +-----------------+ +------------------+
+ | Fast cache | | ptr-ring cache |
+ +-----------------+ +------------------+
+
+API interface
+=============
+The number of pools created **must** match the number of hardware queues
+unless hardware restrictions make that impossible. This would otherwise beat the
+purpose of page pool, which is allocate pages fast from cache without locking.
+This lockless guarantee naturally comes from running under a NAPI softirq.
+The protection doesn't strictly have to be NAPI, any guarantee that allocating
+a page will cause no race conditions is enough.
+
+* page_pool_create(): Create a pool.
+ * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
+ * order: 2^order pages on allocation
+ * pool_size: size of the ptr_ring
+ * nid: preferred NUMA node for allocation
+ * dev: struct device. Used on DMA operations
+ * dma_dir: DMA direction
+ * max_len: max DMA sync memory size
+ * offset: DMA address offset
+
+* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
+ driver bumps the refcnt > 1 this will unmap the page. If the page refcnt is 1
+ the allocator owns the page and will try to recycle it in one of the pool
+ caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
+ using dma_sync_single_range_for_device().
+
+* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
+ for the entire memory area configured in area pool->max_len.
+
+* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller
+ must guarantee safe context (e.g NAPI), since it will recycle the page
+ directly into the pool fast cache.
+
+* page_pool_release_page(): Unmap the page (if mapped) and account for it on
+ inflight counters.
+
+* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
+ caches.
+
+* page_pool_get_dma_addr(): Retrieve the stored DMA address.
+
+* page_pool_get_dma_dir(): Retrieve the stored DMA direction.
+
+Coding examples
+===============
+
+Registration
+------------
+
+.. code-block:: c
+
+ /* Page pool registration */
+ struct page_pool_params pp_params = { 0 };
+ struct xdp_rxq_info xdp_rxq;
+ int err;
+
+ pp_params.order = 0;
+ /* internal DMA mapping in page_pool */
+ pp_params.flags = PP_FLAG_DMA_MAP;
+ pp_params.pool_size = DESC_NUM;
+ pp_params.nid = NUMA_NO_NODE;
+ pp_params.dev = priv->dev;
+ pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
+ page_pool = page_pool_create(&pp_params);
+
+ err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0);
+ if (err)
+ goto err_out;
+
+ err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
+ if (err)
+ goto err_out;
+
+NAPI poller
+-----------
+
+
+.. code-block:: c
+
+ /* NAPI Rx poller */
+ enum dma_data_direction dma_dir;
+
+ dma_dir = page_pool_get_dma_dir(dring->page_pool);
+ while (done < budget) {
+ if (some error)
+ page_pool_recycle_direct(page_pool, page);
+ if (packet_is_xdp) {
+ if XDP_DROP:
+ page_pool_recycle_direct(page_pool, page);
+ } else (packet_is_skb) {
+ page_pool_release_page(page_pool, page);
+ new_page = page_pool_dev_alloc_pages(page_pool);
+ }
+ }
+
+Driver unload
+-------------
+
+.. code-block:: c
+
+ /* Driver unload */
+ page_pool_put_full_page(page_pool, page, false);
+ xdp_rxq_info_unreg(&xdp_rxq);
diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
index d753a309f9d1..5aec7c8857d0 100644
--- a/Documentation/networking/sfp-phylink.rst
+++ b/Documentation/networking/sfp-phylink.rst
@@ -74,10 +74,13 @@ phylib to the sfp/phylink support. Please send patches to improve
this documentation.
1. Optionally split the network driver's phylib update function into
- three parts dealing with link-down, link-up and reconfiguring the
- MAC settings. This can be done as a separate preparation commit.
+ two parts dealing with link-down and link-up. This can be done as
+ a separate preparation commit.
- An example of this preparation can be found in git commit fc548b991fb0.
+ An older example of this preparation can be found in git commit
+ fc548b991fb0, although this was splitting into three parts; the
+ link-up part now includes configuring the MAC for the link settings.
+ Please see :c:func:`mac_link_up` for more information on this.
2. Replace::
@@ -135,27 +138,27 @@ this documentation.
.. code-block:: c
- static int foo_ethtool_set_link_ksettings(struct net_device *dev,
- const struct ethtool_link_ksettings *cmd)
- {
- struct foo_priv *priv = netdev_priv(dev);
-
- return phylink_ethtool_ksettings_set(priv->phylink, cmd);
- }
-
- static int foo_ethtool_get_link_ksettings(struct net_device *dev,
- struct ethtool_link_ksettings *cmd)
- {
- struct foo_priv *priv = netdev_priv(dev);
+ static int foo_ethtool_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_set(priv->phylink, cmd);
+ }
- return phylink_ethtool_ksettings_get(priv->phylink, cmd);
- }
+ static int foo_ethtool_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
+ {
+ struct foo_priv *priv = netdev_priv(dev);
+
+ return phylink_ethtool_ksettings_get(priv->phylink, cmd);
+ }
-7. Replace the call to:
+7. Replace the call to::
phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface);
- and associated code with a call to:
+ and associated code with a call to::
err = phylink_of_phy_connect(priv->phylink, node, flags);
@@ -207,6 +210,14 @@ this documentation.
using. This is particularly important for in-band negotiation
methods such as 1000base-X and SGMII.
+ The :c:func:`mac_link_up` method is used to inform the MAC that the
+ link has come up. The call includes the negotiation mode and interface
+ for reference only. The finalised link parameters are also supplied
+ (speed, duplex and flow control/pause enablement settings) which
+ should be used to configure the MAC when the MAC and PCS are not
+ tightly integrated, or when the settings are not coming from in-band
+ negotiation.
+
The :c:func:`mac_config` method is used to update the MAC with the
requested state, and must avoid unnecessarily taking the link down
when making changes to the MAC configuration. This means the
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index 38a4edc4522b..10e11099e74a 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -908,8 +908,8 @@ A TLP probe packet is sent.
A packet loss is detected and recovered by TLP.
-TCP Fast Open
-=============
+TCP Fast Open description
+=========================
TCP Fast Open is a technology which allows data transfer before the
3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
general description.