diff options
Diffstat (limited to 'Documentation/networking/devlink')
33 files changed, 2623 insertions, 78 deletions
diff --git a/Documentation/networking/devlink/am65-nuss-cpsw-switch.rst b/Documentation/networking/devlink/am65-nuss-cpsw-switch.rst new file mode 100644 index 000000000000..1e589c26abff --- /dev/null +++ b/Documentation/networking/devlink/am65-nuss-cpsw-switch.rst @@ -0,0 +1,26 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +am65-cpsw-nuss devlink support +============================== + +This document describes the devlink features implemented by the ``am65-cpsw-nuss`` +device driver. + +Parameters +========== + +The ``am65-cpsw-nuss`` driver implements the following driver-specific +parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``switch_mode`` + - Boolean + - runtime + - Enable switch mode diff --git a/Documentation/networking/devlink/bnxt.rst b/Documentation/networking/devlink/bnxt.rst index 3dfd84ccb1c7..9a8b3d76d11f 100644 --- a/Documentation/networking/devlink/bnxt.rst +++ b/Documentation/networking/devlink/bnxt.rst @@ -22,6 +22,10 @@ Parameters - Permanent * - ``msix_vec_per_pf_min`` - Permanent + * - ``enable_remote_dev_reset`` + - Runtime + * - ``enable_roce`` + - Permanent The ``bnxt`` driver also implements the following driver-specific parameters. diff --git a/Documentation/networking/devlink/devlink-dpipe.rst b/Documentation/networking/devlink/devlink-dpipe.rst index 468fe1001b74..af37f250df43 100644 --- a/Documentation/networking/devlink/devlink-dpipe.rst +++ b/Documentation/networking/devlink/devlink-dpipe.rst @@ -52,7 +52,7 @@ purposes as a standard complementary tool. The system's view from ``devlink-dpipe`` should change according to the changes done by the standard configuration tools. -For example, it’s quiet common to implement Access Control Lists (ACL) +For example, it’s quite common to implement Access Control Lists (ACL) using Ternary Content Addressable Memory (TCAM). The TCAM memory can be divided into TCAM regions. Complex TC filters can have multiple rules with different priorities and different lookup keys. On the other hand hardware diff --git a/Documentation/networking/devlink/devlink-eswitch-attr.rst b/Documentation/networking/devlink/devlink-eswitch-attr.rst new file mode 100644 index 000000000000..08bb39ab1528 --- /dev/null +++ b/Documentation/networking/devlink/devlink-eswitch-attr.rst @@ -0,0 +1,76 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +Devlink E-Switch Attribute +========================== + +Devlink E-Switch supports two modes of operation: legacy and switchdev. +Legacy mode operates based on traditional MAC/VLAN steering rules. Switching +decisions are made based on MAC addresses, VLANs, etc. There is limited ability +to offload switching rules to hardware. + +On the other hand, switchdev mode allows for more advanced offloading +capabilities of the E-Switch to hardware. In switchdev mode, more switching +rules and logic can be offloaded to the hardware switch ASIC. It enables +representor netdevices that represent the slow path of virtual functions (VFs) +or scalable-functions (SFs) of the device. See more information about +:ref:`Documentation/networking/switchdev.rst <switchdev>` and +:ref:`Documentation/networking/representors.rst <representors>`. + +In addition, the devlink E-Switch also comes with other attributes listed +in the following section. + +Attributes Description +====================== + +The following is a list of E-Switch attributes. + +.. list-table:: E-Switch attributes + :widths: 8 5 45 + + * - Name + - Type + - Description + * - ``mode`` + - enum + - The mode of the device. The mode can be one of the following: + + * ``legacy`` operates based on traditional MAC/VLAN steering + rules. + * ``switchdev`` allows for more advanced offloading capabilities of + the E-Switch to hardware. + * - ``inline-mode`` + - enum + - Some HWs need the VF driver to put part of the packet + headers on the TX descriptor so the e-switch can do proper + matching and steering. Support for both switchdev mode and legacy mode. + + * ``none`` none. + * ``link`` L2 mode. + * ``network`` L3 mode. + * ``transport`` L4 mode. + * - ``encap-mode`` + - enum + - The encapsulation mode of the device. Support for both switchdev mode + and legacy mode. The mode can be one of the following: + + * ``none`` Disable encapsulation support. + * ``basic`` Enable encapsulation support. + +Example Usage +============= + +.. code:: shell + + # enable switchdev mode + $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev + + # set inline-mode and encap-mode + $ devlink dev eswitch set pci/0000:08:00.0 inline-mode none encap-mode basic + + # display devlink device eswitch attributes + $ devlink dev eswitch show pci/0000:08:00.0 + pci/0000:08:00.0: mode switchdev inline-mode none encap-mode basic + + # enable encap-mode with legacy mode + $ devlink dev eswitch set pci/0000:08:00.0 mode legacy inline-mode none encap-mode basic diff --git a/Documentation/networking/devlink/devlink-flash.rst b/Documentation/networking/devlink/devlink-flash.rst index 40a87c0222cb..603e732f00cc 100644 --- a/Documentation/networking/devlink/devlink-flash.rst +++ b/Documentation/networking/devlink/devlink-flash.rst @@ -16,6 +16,34 @@ Note that the file name is a path relative to the firmware loading path (usually ``/lib/firmware/``). Drivers may send status updates to inform user space about the progress of the update operation. +Overwrite Mask +============== + +The ``devlink-flash`` command allows optionally specifying a mask indicating +how the device should handle subsections of flash components when updating. +This mask indicates the set of sections which are allowed to be overwritten. + +.. list-table:: List of overwrite mask bits + :widths: 5 95 + + * - Name + - Description + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` + - Indicates that the device should overwrite settings in the components + being updated with the settings found in the provided image. + * - ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` + - Indicates that the device should overwrite identifiers in the + components being updated with the identifiers found in the provided + image. This includes MAC addresses, serial IDs, and similar device + identifiers. + +Multiple overwrite bits may be combined and requested together. If no bits +are provided, it is expected that the device only update firmware binaries +in the components being updated. Settings and identifiers are expected to be +preserved across the update. A device may not support every combination and +the driver for such a device must reject any combination which cannot be +faithfully implemented. + Firmware Loading ================ diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst index 0c99b11f05f9..e0b8cfed610a 100644 --- a/Documentation/networking/devlink/devlink-health.rst +++ b/Documentation/networking/devlink/devlink-health.rst @@ -24,7 +24,7 @@ attributes of the health reporting and recovery procedures. The ``devlink`` health reporter: Device driver creates a "health reporter" per each error/health type. -Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) +Error/Health type can be a known/generic (e.g. PCI error, fw error, rx/tx error) or unknown (driver specific). For each registered health reporter a driver can issue error/health reports asynchronously. All health reports handling is done by ``devlink``. @@ -33,7 +33,7 @@ Device driver can provide specific callbacks for each "health reporter", e.g.: * Recovery procedures * Diagnostics procedures * Object dump procedures - * OOB initial parameters + * Out Of Box initial parameters Different parts of the driver can register different types of health reporters with different handlers. @@ -46,11 +46,31 @@ Once an error is reported, devlink health will perform the following actions: * A log is being send to the kernel trace events buffer * Health status and statistics are being updated for the reporter instance * Object dump is being taken and saved at the reporter instance (as long as - there is no other dump which is already stored) + auto-dump is set and there is no other dump which is already stored) * Auto recovery attempt is being done. Depends on: + - Auto-recovery configuration - Grace period vs. time passed since last recover +Devlink formatted message +========================= + +To handle devlink health diagnose and health dump requests, devlink creates a +formatted message structure ``devlink_fmsg`` and send it to the driver's callback +to fill the data in using the devlink fmsg API. + +Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in +json-like format. The API allows the driver to add nested attributes such as +object, object pair and value array, in addition to attributes such as name and +value. + +Driver should use this API to fill the fmsg context in a format which will be +translated by the devlink to the netlink message later. When it needs to send +the data using SKBs to the netlink layer, it fragments the data between +different SKBs. In order to do this fragmentation, it uses virtual nests +attributes, to avoid actual nesting use which cannot be divided between +different SKBs. + User Interface ============== @@ -72,14 +92,18 @@ via ``devlink``, e.g per error type (per health reporter): * - ``DEVLINK_CMD_HEALTH_REPORTER_SET`` - Allows reporter-related configuration setting. * - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER`` - - Triggers a reporter's recovery procedure. + - Triggers reporter's recovery procedure. + * - ``DEVLINK_CMD_HEALTH_REPORTER_TEST`` + - Triggers a fake health event on the reporter. The effects of the test + event in terms of recovery flow should follow closely that of a real + event. * - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE`` - - Retrieves diagnostics data from a reporter on a device. + - Retrieves current device state related to the reporter. * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET`` - Retrieves the last stored dump. Devlink health - saves a single dump. If an dump is not already stored by the devlink + saves a single dump. If an dump is not already stored by devlink for this reporter, devlink generates a new dump. - dump output is defined by the reporter. + Dump output is defined by the reporter. * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR`` - Clears the last saved dump file for the specified reporter. @@ -93,7 +117,7 @@ The following diagram provides a general overview of ``devlink-health``:: +--------------------------+ |request for ops |(diagnose, - mlx5_core devlink |recover, + driver devlink |recover, |dump) +--------+ +--------------------------+ | | | reporter| | diff --git a/Documentation/networking/devlink/devlink-info.rst b/Documentation/networking/devlink/devlink-info.rst index 7572bf6de5c1..dd6adc4d0559 100644 --- a/Documentation/networking/devlink/devlink-info.rst +++ b/Documentation/networking/devlink/devlink-info.rst @@ -86,6 +86,10 @@ In case software/firmware components are loaded from the disk (e.g. ``/lib/firmware``) only the running version should be reported via the kernel API. +Please note that any security versions reported via devlink are purely +informational. Devlink does not use a secure channel to communicate with +the device. + Generic Versions ================ @@ -146,6 +150,11 @@ board.manufacture An identifier of the company or the facility which produced the part. +board.part_number +----------------- + +Part number of the board and its components. + fw -- @@ -198,6 +207,11 @@ fw.bundle_id Unique identifier of the entire firmware bundle. +fw.bootloader +------------- + +Version of the bootloader. + Future work =========== diff --git a/Documentation/networking/devlink/devlink-linecard.rst b/Documentation/networking/devlink/devlink-linecard.rst new file mode 100644 index 000000000000..6c0b8928bc13 --- /dev/null +++ b/Documentation/networking/devlink/devlink-linecard.rst @@ -0,0 +1,122 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================= +Devlink Line card +================= + +Background +========== + +The ``devlink-linecard`` mechanism is targeted for manipulation of +line cards that serve as a detachable PHY modules for modular switch +system. Following operations are provided: + + * Get a list of supported line card types. + * Provision of a slot with specific line card type. + * Get and monitor of line card state and its change. + +Line card according to the type may contain one or more gearboxes +to mux the lanes with certain speed to multiple ports with lanes +of different speed. Line card ensures N:M mapping between +the switch ASIC modules and physical front panel ports. + +Overview +======== + +Each line card devlink object is created by device driver, +according to the physical line card slots available on the device. + +Similar to splitter cable, where the device might have no way +of detection of the splitter cable geometry, the device +might not have a way to detect line card type. For that devices, +concept of provisioning is introduced. It allows the user to: + + * Provision a line card slot with certain line card type + + - Device driver would instruct the ASIC to prepare all + resources accordingly. The device driver would + create all instances, namely devlink port and netdevices + that reside on the line card, according to the line card type + * Manipulate of line card entities even without line card + being physically connected or powered-up + * Setup splitter cable on line card ports + + - As on the ordinary ports, user may provision a splitter + cable of a certain type, without the need to + be physically connected to the port + * Configure devlink ports and netdevices + +Netdevice carrier is decided as follows: + + * Line card is not inserted or powered-down + + - The carrier is always down + * Line card is inserted and powered up + + - The carrier is decided as for ordinary port netdevice + +Line card state +=============== + +The ``devlink-linecard`` mechanism supports the following line card states: + + * ``unprovisioned``: Line card is not provisioned on the slot. + * ``unprovisioning``: Line card slot is currently being unprovisioned. + * ``provisioning``: Line card slot is currently in a process of being provisioned + with a line card type. + * ``provisioning_failed``: Provisioning was not successful. + * ``provisioned``: Line card slot is provisioned with a type. + * ``active``: Line card is powered-up and active. + +The following diagram provides a general overview of ``devlink-linecard`` +state transitions:: + + +-------------------------+ + | | + +----------------------------------> unprovisioned | + | | | + | +--------|-------^--------+ + | | | + | | | + | +--------v-------|--------+ + | | | + | | provisioning | + | | | + | +------------|------------+ + | | + | +-----------------------------+ + | | | + | +------------v------------+ +------------v------------+ +-------------------------+ + | | | | ----> | + +----- provisioning_failed | | provisioned | | active | + | | | | <---- | + | +------------^------------+ +------------|------------+ +-------------------------+ + | | | + | | | + | | +------------v------------+ + | | | | + | | | unprovisioning | + | | | | + | | +------------|------------+ + | | | + | +-----------------------------+ + | | + +-----------------------------------------------+ + + +Example usage +============= + +.. code:: shell + + $ devlink lc show [ DEV [ lc LC_INDEX ] ] + $ devlink lc set DEV lc LC_INDEX [ { type LC_TYPE | notype } ] + + # Show current line card configuration and status for all slots: + $ devlink lc + + # Set slot 8 to be provisioned with type "16x100G": + $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G + + # Set slot 8 to be unprovisioned: + $ devlink lc set pci/0000:01:00.0 lc 8 notype diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst index d075fd090b3d..211b58177e12 100644 --- a/Documentation/networking/devlink/devlink-params.rst +++ b/Documentation/networking/devlink/devlink-params.rst @@ -97,14 +97,49 @@ own name. * - ``enable_roce`` - Boolean - Enable handling of RoCE traffic in the device. + * - ``enable_eth`` + - Boolean + - When enabled, the device driver will instantiate Ethernet specific + auxiliary device of the devlink device. + * - ``enable_rdma`` + - Boolean + - When enabled, the device driver will instantiate RDMA specific + auxiliary device of the devlink device. + * - ``enable_vnet`` + - Boolean + - When enabled, the device driver will instantiate VDPA networking + specific auxiliary device of the devlink device. + * - ``enable_iwarp`` + - Boolean + - Enable handling of iWARP traffic in the device. * - ``internal_err_reset`` - Boolean - When enabled, the device driver will reset the device on internal errors. * - ``max_macs`` - u32 - - Specifies the maximum number of MAC addresses per ethernet port of - this device. + - Typically macvlan, vlan net devices mac are also programmed in their + parent netdevice's Function rx filter. This parameter limit the + maximum number of unicast mac address filters to receive traffic from + per ethernet port of this device. * - ``region_snapshot_enable`` - Boolean - Enable capture of ``devlink-region`` snapshots. + * - ``enable_remote_dev_reset`` + - Boolean + - Enable device reset by remote host. When cleared, the device driver + will NACK any attempt of other host to reset the device. This parameter + is useful for setups where a device is shared by different hosts, such + as multi-host setup. + * - ``io_eq_size`` + - u32 + - Control the size of I/O completion EQs. + * - ``event_eq_size`` + - u32 + - Control the size of asynchronous control events EQ. + * - ``enable_phc`` + - Boolean + - Enable PHC (PTP Hardware Clock) functionality in the device. + * - ``clock_id`` + - u64 + - Clock ID used by the device for registering DPLL devices and pins. diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst new file mode 100644 index 000000000000..5e397798a402 --- /dev/null +++ b/Documentation/networking/devlink/devlink-port.rst @@ -0,0 +1,484 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _devlink_port: + +============ +Devlink Port +============ + +``devlink-port`` is a port that exists on the device. It has a logically +separate ingress/egress point of the device. A devlink port can be any one +of many flavours. A devlink port flavour along with port attributes +describe what a port represents. + +A device driver that intends to publish a devlink port sets the +devlink port attributes and registers the devlink port. + +Devlink port flavours are described below. + +.. list-table:: List of devlink port flavours + :widths: 33 90 + + * - Flavour + - Description + * - ``DEVLINK_PORT_FLAVOUR_PHYSICAL`` + - Any kind of physical port. This can be an eswitch physical port or any + other physical port on the device. + * - ``DEVLINK_PORT_FLAVOUR_DSA`` + - This indicates a DSA interconnect port. + * - ``DEVLINK_PORT_FLAVOUR_CPU`` + - This indicates a CPU port applicable only to DSA. + * - ``DEVLINK_PORT_FLAVOUR_PCI_PF`` + - This indicates an eswitch port representing a port of PCI + physical function (PF). + * - ``DEVLINK_PORT_FLAVOUR_PCI_VF`` + - This indicates an eswitch port representing a port of PCI + virtual function (VF). + * - ``DEVLINK_PORT_FLAVOUR_PCI_SF`` + - This indicates an eswitch port representing a port of PCI + subfunction (SF). + * - ``DEVLINK_PORT_FLAVOUR_VIRTUAL`` + - This indicates a virtual port for the PCI virtual function. + +Devlink port can have a different type based on the link layer described below. + +.. list-table:: List of devlink port types + :widths: 23 90 + + * - Type + - Description + * - ``DEVLINK_PORT_TYPE_ETH`` + - Driver should set this port type when a link layer of the port is + Ethernet. + * - ``DEVLINK_PORT_TYPE_IB`` + - Driver should set this port type when a link layer of the port is + InfiniBand. + * - ``DEVLINK_PORT_TYPE_AUTO`` + - This type is indicated by the user when driver should detect the port + type automatically. + +PCI controllers +--------------- +In most cases a PCI device has only one controller. A controller consists of +potentially multiple physical, virtual functions and subfunctions. A function +consists of one or more ports. This port is represented by the devlink eswitch +port. + +A PCI device connected to multiple CPUs or multiple PCI root complexes or a +SmartNIC, however, may have multiple controllers. For a device with multiple +controllers, each controller is distinguished by a unique controller number. +An eswitch is on the PCI device which supports ports of multiple controllers. + +An example view of a system with two controllers:: + + --------------------------------------------------------- + | | + | --------- --------- ------- ------- | + ----------- | | vf(s) | | sf(s) | |vf(s)| |sf(s)| | + | server | | ------- ----/---- ---/----- ------- ---/--- ---/--- | + | pci rc |=== | pf0 |______/________/ | pf1 |___/_______/ | + | connect | | ------- ------- | + ----------- | | controller_num=1 (no eswitch) | + ------|-------------------------------------------------- + (internal wire) + | + --------------------------------------------------------- + | devlink eswitch ports and reps | + | ----------------------------------------------------- | + | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | | + | |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | | + | ----------------------------------------------------- | + | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | | + | |pf0 | pf0vfN | pf0sfN | pf1 | pf1vfN |pf1sfN | | + | ----------------------------------------------------- | + | | + | | + ----------- | --------- --------- ------- ------- | + | smartNIC| | | vf(s) | | sf(s) | |vf(s)| |sf(s)| | + | pci rc |==| ------- ----/---- ---/----- ------- ---/--- ---/--- | + | connect | | | pf0 |______/________/ | pf1 |___/_______/ | + ----------- | ------- ------- | + | | + | local controller_num=0 (eswitch) | + --------------------------------------------------------- + +In the above example, the external controller (identified by controller number = 1) +doesn't have the eswitch. Local controller (identified by controller number = 0) +has the eswitch. The Devlink instance on the local controller has eswitch +devlink ports for both the controllers. + +Function configuration +====================== + +Users can configure one or more function attributes before enumerating the PCI +function. Usually it means, user should configure function attribute +before a bus specific device for the function is created. However, when +SRIOV is enabled, virtual function devices are created on the PCI bus. +Hence, function attribute should be configured before binding virtual +function device to the driver. For subfunctions, this means user should +configure port function attribute before activating the port function. + +A user may set the hardware address of the function using +`devlink port function set hw_addr` command. For Ethernet port function +this means a MAC address. + +Users may also set the RoCE capability of the function using +`devlink port function set roce` command. + +Users may also set the function as migratable using +`devlink port function set migratable` command. + +Users may also set the IPsec crypto capability of the function using +`devlink port function set ipsec_crypto` command. + +Users may also set the IPsec packet capability of the function using +`devlink port function set ipsec_packet` command. + +Users may also set the maximum IO event queues of the function +using `devlink port function set max_io_eqs` command. + +Function attributes +=================== + +MAC address setup +----------------- +The configured MAC address of the PCI VF/SF will be used by netdevice and rdma +device created for the PCI VF/SF. + +- Get the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 + +- Set the MAC address of the VF identified by its unique devlink port index:: + + $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55 + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:11:22:33:44:55 + +- Get the MAC address of the SF identified by its unique devlink port index:: + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:00:00 + +- Set the MAC address of the SF identified by its unique devlink port index:: + + $ devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88 + + $ devlink port show pci/0000:06:00.0/32768 + pci/0000:06:00.0/32768: type eth netdev enp6s0pf0sf88 flavour pcisf pfnum 0 sfnum 88 + function: + hw_addr 00:00:00:00:88:88 + +RoCE capability setup +--------------------- +Not all PCI VFs/SFs require RoCE capability. + +When RoCE capability is disabled, it saves system memory per PCI VF/SF. + +When user disables RoCE capability for a VF/SF, user application cannot send or +receive any RoCE packets through this VF/SF and RoCE GID table for this PCI +will be empty. + +When RoCE capability is disabled in the device using port function attribute, +VF/SF driver cannot override it. + +- Get RoCE capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 roce enable + +- Set RoCE capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 roce disable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 roce disable + +migratable capability setup +--------------------------- +Live migration is the process of transferring a live virtual machine +from one physical host to another without disrupting its normal +operation. + +User who want PCI VFs to be able to perform live migration need to +explicitly enable the VF migratable capability. + +When user enables migratable capability for a VF, and the HV binds the VF to VFIO driver +with migration support, the user can migrate the VM with this VF from one HV to a +different one. + +However, when migratable capability is enable, device will disable features which cannot +be migrated. Thus migratable cap can impose limitations on a VF so let the user decide. + +Example of LM with migratable function configuration: +- Get migratable capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 migratable disable + +- Set migratable capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 migratable enable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 migratable enable + +- Bind VF to VFIO driver with migration support:: + + $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind + $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override + $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind + +Attach VF to the VM. +Start the VM. +Perform live migration. + +IPsec crypto capability setup +----------------------------- +When user enables IPsec crypto capability for a VF, user application can offload +XFRM state crypto operation (Encrypt/Decrypt) to this VF. + +When IPsec crypto capability is disabled (default) for a VF, the XFRM state is +processed in software by the kernel. + +- Get IPsec crypto capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_crypto disabled + +- Set IPsec crypto capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 ipsec_crypto enable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_crypto enabled + +IPsec packet capability setup +----------------------------- +When user enables IPsec packet capability for a VF, user application can offload +XFRM state and policy crypto operation (Encrypt/Decrypt) to this VF, as well as +IPsec encapsulation. + +When IPsec packet capability is disabled (default) for a VF, the XFRM state and +policy is processed in software by the kernel. + +- Get IPsec packet capability of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_packet disabled + +- Set IPsec packet capability of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 ipsec_packet enable + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_packet enabled + +Maximum IO events queues setup +------------------------------ +When user sets maximum number of IO event queues for a SF or +a VF, such function driver is limited to consume only enforced +number of IO event queues. + +IO event queues deliver events related to IO queues, including network +device transmit and receive queues (txq and rxq) and RDMA Queue Pairs (QPs). +For example, the number of netdevice channels and RDMA device completion +vectors are derived from the function's IO event queues. Usually, the number +of interrupt vectors consumed by the driver is limited by the number of IO +event queues per device, as each of the IO event queues is connected to an +interrupt vector. + +- Get maximum IO event queues of the VF device:: + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 + +- Set maximum IO event queues of the VF device:: + + $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 + + $ devlink port show pci/0000:06:00.0/2 + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 + function: + hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 + +Subfunction +============ + +Subfunction is a lightweight function that has a parent PCI function on which +it is deployed. Subfunction is created and deployed in unit of 1. Unlike +SRIOV VFs, a subfunction doesn't require its own PCI virtual function. +A subfunction communicates with the hardware through the parent PCI function. + +To use a subfunction, 3 steps setup sequence is followed: + +1) create - create a subfunction; +2) configure - configure subfunction attributes; +3) deploy - deploy the subfunction; + +Subfunction management is done using devlink port user interface. +User performs setup on the subfunction management device. + +(1) Create +---------- +A subfunction is created using a devlink port interface. A user adds the +subfunction by adding a devlink port of subfunction flavour. The devlink +kernel code calls down to subfunction management driver (devlink ops) and asks +it to create a subfunction devlink port. Driver then instantiates the +subfunction port and any associated objects such as health reporters and +representor netdevice. + +(2) Configure +------------- +A subfunction devlink port is created but it is not active yet. That means the +entities are created on devlink side, the e-switch port representor is created, +but the subfunction device itself is not created. A user might use e-switch port +representor to do settings, putting it into bridge, adding TC rules, etc. A user +might as well configure the hardware address (such as MAC address) of the +subfunction while subfunction is inactive. + +(3) Deploy +---------- +Once a subfunction is configured, user must activate it to use it. Upon +activation, subfunction management driver asks the subfunction management +device to instantiate the subfunction device on particular PCI function. +A subfunction device is created on the :ref:`Documentation/driver-api/auxiliary_bus.rst <auxiliary_bus>`. +At this point a matching subfunction driver binds to the subfunction's auxiliary device. + +Rate object management +====================== + +Devlink provides API to manage tx rates of single devlink port or a group. +This is done through rate objects, which can be one of the two types: + +``leaf`` + Represents a single devlink port; created/destroyed by the driver. Since leaf + have 1to1 mapping to its devlink port, in user space it is referred as + ``pci/<bus_addr>/<port_index>``; + +``node`` + Represents a group of rate objects (leafs and/or nodes); created/deleted by + request from the userspace; initially empty (no rate objects added). In + userspace it is referred as ``pci/<bus_addr>/<node_name>``, where + ``node_name`` can be any identifier, except decimal number, to avoid + collisions with leafs. + +API allows to configure following rate object's parameters: + +``tx_share`` + Minimum TX rate value shared among all other rate objects, or rate objects + that parts of the parent group, if it is a part of the same group. + +``tx_max`` + Maximum TX rate value. + +``tx_priority`` + Allows for usage of strict priority arbiter among siblings. This + arbitration scheme attempts to schedule nodes based on their priority + as long as the nodes remain within their bandwidth limit. The higher the + priority the higher the probability that the node will get selected for + scheduling. + +``tx_weight`` + Allows for usage of Weighted Fair Queuing arbitration scheme among + siblings. This arbitration scheme can be used simultaneously with the + strict priority. As a node is configured with a higher rate it gets more + BW relative to its siblings. Values are relative like a percentage + points, they basically tell how much BW should node take relative to + its siblings. + +``parent`` + Parent node name. Parent node rate limits are considered as additional limits + to all node children limits. ``tx_max`` is an upper limit for children. + ``tx_share`` is a total bandwidth distributed among children. + +``tc_bw`` + Allow users to set the bandwidth allocation per traffic class on rate + objects. This enables fine-grained QoS configurations by assigning a relative + share value to each traffic class. The bandwidth is distributed in proportion + to the share value for each class, relative to the sum of all shares. + When applied to a non-leaf node, tc_bw determines how bandwidth is shared + among its child elements. + +``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case +nodes with the same priority form a WFQ subgroup in the sibling group +and arbitration among them is based on assigned weights. + +Arbitration flow from the high level: + +#. Choose a node, or group of nodes with the highest priority that stays + within the BW limit and are not blocked. Use ``tx_priority`` as a + parameter for this arbitration. + +#. If group of nodes have the same priority perform WFQ arbitration on + that subgroup. Use ``tx_weight`` as a parameter for this arbitration. + +#. Select the winner node, and continue arbitration flow among its children, + until leaf node is reached, and the winner is established. + +#. If all the nodes from the highest priority sub-group are satisfied, or + overused their assigned BW, move to the lower priority nodes. + +Driver implementations are allowed to support both or either rate object types +and setting methods of their parameters. Additionally driver implementation +may export nodes/leafs and their child-parent relationships. + +Terms and Definitions +===================== + +.. list-table:: Terms and Definitions + :widths: 22 90 + + * - Term + - Definitions + * - ``PCI device`` + - A physical PCI device having one or more PCI buses consists of one or + more PCI controllers. + * - ``PCI controller`` + - A controller consists of potentially multiple physical functions, + virtual functions and subfunctions. + * - ``Port function`` + - An object to manage the function of a port. + * - ``Subfunction`` + - A lightweight function that has parent PCI function on which it is + deployed. + * - ``Subfunction device`` + - A bus device of the subfunction, usually on a auxiliary bus. + * - ``Subfunction driver`` + - A device driver for the subfunction auxiliary device. + * - ``Subfunction management device`` + - A PCI physical function that supports subfunction management. + * - ``Subfunction management driver`` + - A device driver for PCI physical function that supports + subfunction management using devlink port interface. + * - ``Subfunction host driver`` + - A device driver for PCI physical function that hosts subfunction + devices. In most cases it is same as subfunction management driver. When + subfunction is used on external controller, subfunction management and + host drivers are different. diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst index 3654c3e9658f..5d0b68f752c0 100644 --- a/Documentation/networking/devlink/devlink-region.rst +++ b/Documentation/networking/devlink/devlink-region.rst @@ -22,7 +22,7 @@ The major benefit to creating a region is to provide access to internal address regions that are otherwise inaccessible to the user. Regions may also be used to provide an additional way to debug complex error -states, but see also :doc:`devlink-health` +states, but see also Documentation/networking/devlink/devlink-health.rst Regions may optionally support capturing a snapshot on demand via the ``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow @@ -31,6 +31,15 @@ in its ``devlink_region_ops`` structure. If snapshot id is not set in the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send the snapshot information to user space. +Regions may optionally allow directly reading from their contents without a +snapshot. Direct read requests are not atomic. In particular a read request +of size 256 bytes or larger will be split into multiple chunks. If atomic +access is required, use a snapshot. A driver wishing to enable this for a +region should implement the ``.read`` callback in the ``devlink_region_ops`` +structure. User space can request a direct read by using the +``DEVLINK_ATTR_REGION_DIRECT`` attribute instead of specifying a snapshot +id. + example usage ------------- @@ -40,12 +49,12 @@ example usage $ devlink region show [ DEV/REGION ] $ devlink region del DEV/REGION snapshot SNAPSHOT_ID $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ] - $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length + $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length LENGTH # Show all of the exposed regions with region sizes: $ devlink region show - pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2] - pci/0000:00:05.0/fw-health: size 64 snapshot [1 2] + pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2] max 8 + pci/0000:00:05.0/fw-health: size 64 snapshot [1 2] max 8 # Delete a snapshot using: $ devlink region del pci/0000:00:05.0/cr-space snapshot 1 @@ -65,6 +74,10 @@ example usage $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + # Read from the region without a snapshot + $ devlink region read pci/0000:00:05.0/fw-health address 16 length 16 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + As regions are likely very device or driver specific, no generic regions are defined. See the driver-specific documentation files for information on the specific regions a driver supports. diff --git a/Documentation/networking/devlink/devlink-reload.rst b/Documentation/networking/devlink/devlink-reload.rst new file mode 100644 index 000000000000..2fb0269b2054 --- /dev/null +++ b/Documentation/networking/devlink/devlink-reload.rst @@ -0,0 +1,90 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +Devlink Reload +============== + +``devlink-reload`` provides mechanism to reinit driver entities, applying +``devlink-params`` and ``devlink-resources`` new values. It also provides +mechanism to activate firmware. + +Reload Actions +============== + +User may select a reload action. +By default ``driver_reinit`` action is selected. + +.. list-table:: Possible reload actions + :widths: 5 90 + + * - Name + - Description + * - ``driver-reinit`` + - Devlink driver entities re-initialization, including applying + new values to devlink entities which are used during driver + load which are: + + * ``devlink-params`` in configuration mode ``driverinit`` + * ``devlink-resources`` + + Other devlink entities may stay over the re-initialization: + + * ``devlink-health-reporter`` + * ``devlink-region`` + + The rest of the devlink entities have to be removed and readded. + * - ``fw_activate`` + - Firmware activate. Activates new firmware if such image is stored and + pending activation. If no limitation specified this action may involve + firmware reset. If no new image pending this action will reload current + firmware image. + +Note that even though user asks for a specific action, the driver +implementation might require to perform another action alongside with +it. For example, some driver do not support driver reinitialization +being performed without fw activation. Therefore, the devlink reload +command returns the list of actions which were actrually performed. + +Reload Limits +============= + +By default reload actions are not limited and driver implementation may +include reset or downtime as needed to perform the actions. + +However, some drivers support action limits, which limit the action +implementation to specific constraints. + +.. list-table:: Possible reload limits + :widths: 5 90 + + * - Name + - Description + * - ``no_reset`` + - No reset allowed, no down time allowed, no link flap and no + configuration is lost. + +Change Namespace +================ + +The netns option allows user to be able to move devlink instances into +namespaces during devlink reload operation. +By default all devlink instances are created in init_net and stay there. + +example usage +------------- + +.. code:: shell + + $ devlink dev reload help + $ devlink dev reload DEV [ netns { PID | NAME | ID } ] [ action { driver_reinit | fw_activate } ] [ limit no_reset ] + + # Run reload command for devlink driver entities re-initialization: + $ devlink dev reload pci/0000:82:00.0 action driver_reinit + reload_actions_performed: + driver_reinit + + # Run reload command to activate firmware: + # Note that mlx5 driver reloads the driver while activating firmware + $ devlink dev reload pci/0000:82:00.0 action fw_activate + reload_actions_performed: + driver_reinit fw_activate diff --git a/Documentation/networking/devlink/devlink-resource.rst b/Documentation/networking/devlink/devlink-resource.rst index 93e92d2f0752..3d5ae51e65a2 100644 --- a/Documentation/networking/devlink/devlink-resource.rst +++ b/Documentation/networking/devlink/devlink-resource.rst @@ -23,6 +23,20 @@ current size and related sub resources. To access a sub resource, you specify the path of the resource. For example ``/IPv4/fib`` is the id for the ``fib`` sub-resource under the ``IPv4`` resource. +Generic Resources +================= + +Generic resources are used to describe resources that can be shared by multiple +device drivers and their description must be added to the following table: + +.. list-table:: List of Generic Resources + :widths: 10 90 + + * - Name + - Description + * - ``physical_ports`` + - A limited capacity of physical ports that the switch ASIC can support + example usage ------------- diff --git a/Documentation/networking/devlink/devlink-selftests.rst b/Documentation/networking/devlink/devlink-selftests.rst new file mode 100644 index 000000000000..c0aa1f3aef0d --- /dev/null +++ b/Documentation/networking/devlink/devlink-selftests.rst @@ -0,0 +1,38 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +================= +Devlink Selftests +================= + +The ``devlink-selftests`` API allows executing selftests on the device. + +Tests Mask +========== +The ``devlink-selftests`` command should be run with a mask indicating +the tests to be executed. + +Tests Description +================= +The following is a list of tests that drivers may execute. + +.. list-table:: List of tests + :widths: 5 90 + + * - Name + - Description + * - ``DEVLINK_SELFTEST_FLASH`` + - Devices may have the firmware on non-volatile memory on the board, e.g. + flash. This particular test helps to run a flash selftest on the device. + Implementation of the test is left to the driver/firmware. + +example usage +------------- + +.. code:: shell + + # Query selftests supported on the devlink device + $ devlink dev selftests show DEV + # Query selftests supported on all devlink devices + $ devlink dev selftests show + # Executes selftests on the device + $ devlink dev selftests run DEV id flash diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 7a798352b45d..5885e21e2212 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -409,6 +409,92 @@ be added to the following table: - ``drop`` - Traps packets dropped due to the RED (Random Early Detection) algorithm (i.e., early drops) + * - ``vxlan_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the VXLAN header parsing which + might be because of packet truncation or the I flag is not set. + * - ``llc_snap_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the LLC+SNAP header parsing + * - ``vlan_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the VLAN header parsing. Could + include unexpected packet truncation. + * - ``pppoe_ppp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the PPPoE+PPP header parsing. + This could include finding a session ID of 0xFFFF (which is reserved and + not for use), a PPPoE length which is larger than the frame received or + any common error on this type of header + * - ``mpls_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the MPLS header parsing which + could include unexpected header truncation + * - ``arp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the ARP header parsing + * - ``ip_1_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the first IP header parsing. + This packet trap could include packets which do not pass an IP checksum + check, a header length check (a minimum of 20 bytes), which might suffer + from packet truncation thus the total length field exceeds the received + packet length etc + * - ``ip_n_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the parsing of the last IP + header (the inner one in case of an IP over IP tunnel). The same common + error checking is performed here as for the ip_1_parsing trap + * - ``gre_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the GRE header parsing + * - ``udp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the UDP header parsing. + This packet trap could include checksum errors, an improper UDP + length detected (smaller than 8 bytes) or detection of header + truncation. + * - ``tcp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the TCP header parsing. + This could include TCP checksum errors, improper combination of SYN, FIN + and/or RESET etc. + * - ``ipsec_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the IPSEC header parsing + * - ``sctp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the SCTP header parsing. + This would mean that port number 0 was used or that the header is + truncated. + * - ``dccp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the DCCP header parsing + * - ``gtp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the GTP header parsing + * - ``esp_parsing`` + - ``drop`` + - Traps packets dropped due to an error in the ESP header parsing + * - ``blackhole_nexthop`` + - ``drop`` + - Traps packets that the device decided to drop in case they hit a + blackhole nexthop + * - ``dmac_filter`` + - ``drop`` + - Traps incoming packets that the device decided to drop because + the destination MAC is not configured in the MAC table and + the interface is not in promiscuous mode + * - ``eapol`` + - ``control`` + - Traps "Extensible Authentication Protocol over LAN" (EAPOL) packets + specified in IEEE 802.1X + * - ``locked_port`` + - ``drop`` + - Traps packets that the device decided to drop because they failed the + locked bridge port check. That is, packets that were received via a + locked port and whose {SMAC, VID} does not correspond to an FDB entry + pointing to the port Driver-specific Packet Traps ============================ @@ -419,8 +505,9 @@ help debug packet drops caused by these exceptions. The following list includes links to the description of driver-specific traps registered by various device drivers: - * :doc:`netdevsim` - * :doc:`mlxsw` + * Documentation/networking/devlink/netdevsim.rst + * Documentation/networking/devlink/mlxsw.rst + * Documentation/networking/devlink/prestera.rst .. _Generic-Packet-Trap-Groups: @@ -509,6 +596,12 @@ narrow. The description of these groups must be added to the following table: * - ``acl_trap`` - Contains packet traps for packets that were trapped (logged) by the device during ACL processing + * - ``parser_error_drops`` + - Contains packet traps for packets that were marked by the device during + parsing as erroneous + * - ``eapol`` + - Contains packet traps for "Extensible Authentication Protocol over LAN" + (EAPOL) packets specified in IEEE 802.1X Packet Trap Policers ==================== diff --git a/Documentation/networking/devlink/etas_es58x.rst b/Documentation/networking/devlink/etas_es58x.rst new file mode 100644 index 000000000000..3b857d82a44c --- /dev/null +++ b/Documentation/networking/devlink/etas_es58x.rst @@ -0,0 +1,36 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +etas_es58x devlink support +========================== + +This document describes the devlink features implemented by the +``etas_es58x`` device driver. + +Info versions +============= + +The ``etas_es58x`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fw`` + - running + - Version of the firmware running on the device. Also available + through ``ethtool -i`` as the first member of the + ``firmware-version``. + * - ``fw.bootloader`` + - running + - Version of the bootloader running on the device. Also available + through ``ethtool -i`` as the second member of the + ``firmware-version``. + * - ``board.rev`` + - fixed + - The hardware revision of the device. + * - ``serial_number`` + - fixed + - The USB serial number. Also available through ``lsusb -v``. diff --git a/Documentation/networking/devlink/hns3.rst b/Documentation/networking/devlink/hns3.rst new file mode 100644 index 000000000000..72bc1b9f3785 --- /dev/null +++ b/Documentation/networking/devlink/hns3.rst @@ -0,0 +1,30 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +hns3 devlink support +==================== + +This document describes the devlink features implemented by the ``hns3`` +device driver. + +The ``hns3`` driver supports reloading via ``DEVLINK_CMD_RELOAD``. + +Info versions +============= + +The ``hns3`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 10 10 80 + + * - Name + - Type + - Description + * - ``fw`` + - running + - Used to represent the firmware version. + * - ``fw.scc`` + - running + - Used to represent the Soft Congestion Control (SSC) firmware version. + SCC is a firmware component which provides multiple RDMA congestion + control algorithms, including DCQCN. diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst new file mode 100644 index 000000000000..d3cb5bb5197e --- /dev/null +++ b/Documentation/networking/devlink/i40e.rst @@ -0,0 +1,59 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +i40e devlink support +==================== + +This document describes the devlink features implemented by the ``i40e`` +device driver. + +Info versions +============= + +The ``i40e`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 5 90 + + * - Name + - Type + - Example + - Description + * - ``board.id`` + - fixed + - K15190-000 + - The Product Board Assembly (PBA) identifier of the board. + * - ``fw.mgmt`` + - running + - 9.130 + - 2-digit version number of the management firmware that controls the + PHY, link, etc. + * - ``fw.mgmt.api`` + - running + - 1.15 + - 2-digit version number of the API exported over the AdminQ by the + management firmware. Used by the driver to identify what commands + are supported. + * - ``fw.mgmt.build`` + - running + - 73618 + - Build number of the source for the management firmware. + * - ``fw.undi`` + - running + - 1.3429.0 + - Version of the Option ROM containing the UEFI driver. The version is + reported in ``major.minor.patch`` format. The major version is + incremented whenever a major breaking change occurs, or when the + minor version would overflow. The minor version is incremented for + non-breaking changes and reset to 1 when the major version is + incremented. The patch version is normally 0 but is incremented when + a fix is delivered as a patch against an older base Option ROM. + * - ``fw.psid.api`` + - running + - 9.30 + - Version defining the format of the flash contents. + * - ``fw.bundle_id`` + - running + - 0x8000e5f3 + - Unique identifier of the firmware image file that was loaded onto + the device. Also referred to as the EETRACK identifier of the NVM. diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst index 237848d56f9b..792e9f8c846a 100644 --- a/Documentation/networking/devlink/ice.rst +++ b/Documentation/networking/devlink/ice.rst @@ -7,6 +7,104 @@ ice devlink support This document describes the devlink features implemented by the ``ice`` device driver. +Parameters +========== + +.. list-table:: Generic parameters implemented + :widths: 5 5 90 + + * - Name + - Mode + - Notes + * - ``enable_roce`` + - runtime + - mutually exclusive with ``enable_iwarp`` + * - ``enable_iwarp`` + - runtime + - mutually exclusive with ``enable_roce`` + * - ``tx_scheduling_layers`` + - permanent + - The ice hardware uses hierarchical scheduling for Tx with a fixed + number of layers in the scheduling tree. Each of them are decision + points. Root node represents a port, while all the leaves represent + the queues. This way of configuring the Tx scheduler allows features + like DCB or devlink-rate (documented below) to configure how much + bandwidth is given to any given queue or group of queues, enabling + fine-grained control because scheduling parameters can be configured + at any given layer of the tree. + + The default 9-layer tree topology was deemed best for most workloads, + as it gives an optimal ratio of performance to configurability. However, + for some specific cases, this 9-layer topology might not be desired. + One example would be sending traffic to queues that are not a multiple + of 8. Because the maximum radix is limited to 8 in 9-layer topology, + the 9th queue has a different parent than the rest, and it's given + more bandwidth credits. This causes a problem when the system is + sending traffic to 9 queues: + + | tx_queue_0_packets: 24163396 + | tx_queue_1_packets: 24164623 + | tx_queue_2_packets: 24163188 + | tx_queue_3_packets: 24163701 + | tx_queue_4_packets: 24163683 + | tx_queue_5_packets: 24164668 + | tx_queue_6_packets: 23327200 + | tx_queue_7_packets: 24163853 + | tx_queue_8_packets: 91101417 < Too much traffic is sent from 9th + + To address this need, you can switch to a 5-layer topology, which + changes the maximum topology radix to 512. With this enhancement, + the performance characteristic is equal as all queues can be assigned + to the same parent in the tree. The obvious drawback of this solution + is a lower configuration depth of the tree. + + Use the ``tx_scheduling_layer`` parameter with the devlink command + to change the transmit scheduler topology. To use 5-layer topology, + use a value of 5. For example: + $ devlink dev param set pci/0000:16:00.0 name tx_scheduling_layers + value 5 cmode permanent + Use a value of 9 to set it back to the default value. + + You must do PCI slot powercycle for the selected topology to take effect. + + To verify that value has been set: + $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers + * - ``msix_vec_per_pf_max`` + - driverinit + - Set the max MSI-X that can be used by the PF, rest can be utilized for + SRIOV. The range is from min value set in msix_vec_per_pf_min to + 2k/number of ports. + * - ``msix_vec_per_pf_min`` + - driverinit + - Set the min MSI-X that will be used by the PF. This value inform how many + MSI-X will be allocated statically. The range is from 2 to value set + in msix_vec_per_pf_max. + +.. list-table:: Driver specific parameters implemented + :widths: 5 5 90 + + * - Name + - Mode + - Description + * - ``local_forwarding`` + - runtime + - Controls loopback behavior by tuning scheduler bandwidth. + It impacts all kinds of functions: physical, virtual and + subfunctions. + Supported values are: + + ``enabled`` - loopback traffic is allowed on port + + ``disabled`` - loopback traffic is not allowed on this port + + ``prioritized`` - loopback traffic is prioritized on this port + + Default value of ``local_forwarding`` parameter is ``enabled``. + ``prioritized`` provides ability to adjust loopback traffic rate to increase + one port capacity at cost of the another. User needs to disable + local forwarding on one of the ports in order have increased capacity + on the ``prioritized`` port. + Info versions ============= @@ -23,17 +121,24 @@ The ``ice`` driver reports the following versions - fixed - K65390-000 - The Product Board Assembly (PBA) identifier of the board. + * - ``cgu.id`` + - fixed + - 36 + - The Clock Generation Unit (CGU) hardware revision identifier. * - ``fw.mgmt`` - running - 2.1.7 - - 3-digit version number of the management firmware that controls the - PHY, link, etc. + - 3-digit version number of the management firmware running on the + Embedded Management Processor of the device. It controls the PHY, + link, access to device resources, etc. Intel documentation refers to + this as the EMP firmware. * - ``fw.mgmt.api`` - running - - 1.5 - - 2-digit version number of the API exported over the AdminQ by the - management firmware. Used by the driver to identify what commands - are supported. + - 1.5.1 + - 3-digit version number (major.minor.patch) of the API exported over + the AdminQ by the management firmware. Used by the driver to + identify what commands are supported. Historical versions of the + kernel only displayed a 2-digit version number (major.minor). * - ``fw.mgmt.build`` - running - 0x305d955f @@ -69,6 +174,12 @@ The ``ice`` driver reports the following versions - The version of the DDP package that is active in the device. Note that both the name (as reported by ``fw.app.name``) and version are required to uniquely identify the package. + * - ``fw.app.bundle_id`` + - running + - 0xc0000001 + - Unique identifier for the DDP package loaded in the device. Also + referred to as the DDP Track ID. Can be used to uniquely identify + the specific DDP package. * - ``fw.netlist`` - running - 1.1.2000-6.7.0 @@ -80,6 +191,96 @@ The ``ice`` driver reports the following versions - running - 0xee16ced7 - The first 4 bytes of the hash of the netlist module contents. + * - ``fw.cgu`` + - running + - 8032.16973825.6021 + - The version of Clock Generation Unit (CGU). Format: + <CGU type>.<configuration version>.<firmware version>. + +Flash Update +============ + +The ``ice`` driver implements support for flash update using the +``devlink-flash`` interface. It supports updating the device flash using a +combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and +``fw.netlist`` components. + +.. list-table:: List of supported overwrite modes + :widths: 5 95 + + * - Bits + - Behavior + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` + - Do not preserve settings stored in the flash components being + updated. This includes overwriting the port configuration that + determines the number of physical functions the device will + initialize with. + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` + - Do not preserve either settings or identifiers. Overwrite everything + in the flash with the contents from the provided image, without + performing any preservation. This includes overwriting device + identifying fields such as the MAC address, VPD area, and device + serial number. It is expected that this combination be used with an + image customized for the specific device. + +The ice hardware does not support overwriting only identifiers while +preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its +own will be rejected. If no overwrite mask is provided, the firmware will be +instructed to preserve all settings and identifying fields when updating. + +Reload +====== + +The ``ice`` driver supports activating new firmware after a flash update +using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE`` +action. + +.. code:: shell + + $ devlink dev reload pci/0000:01:00.0 reload action fw_activate + +The new firmware is activated by issuing a device specific Embedded +Management Processor reset which requests the device to reset and reload the +EMP firmware image. + +The driver does not currently support reloading the driver via +``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``. + +Port split +========== + +The ``ice`` driver supports port splitting only for port 0, as the FW has +a predefined set of available port split options for the whole device. + +A system reboot is required for port split to be applied. + +The following command will select the port split option with 4 ports: + +.. code:: shell + + $ devlink port split pci/0000:16:00.0/0 count 4 + +The list of all available port options will be printed to dynamic debug after +each ``split`` and ``unsplit`` command. The first option is the default. + +.. code:: shell + + ice 0000:16:00.0: Available port split options and max port speeds (Gbps): + ice 0000:16:00.0: Status Split Quad 0 Quad 1 + ice 0000:16:00.0: count L0 L1 L2 L3 L4 L5 L6 L7 + ice 0000:16:00.0: Active 2 100 - - - 100 - - - + ice 0000:16:00.0: 2 50 - 50 - - - - - + ice 0000:16:00.0: Pending 4 25 25 25 25 - - - - + ice 0000:16:00.0: 4 25 25 - - 25 25 - - + ice 0000:16:00.0: 8 10 10 10 10 10 10 10 10 + ice 0000:16:00.0: 1 100 - - - - - - - + +There could be multiple FW port options with the same port split count. When +the same port split count request is issued again, the next FW port option with +the same port split count will be selected. + +``devlink port unsplit`` will select the option with a split count of 1. If +there is no FW option available with split count 1, you will receive an error. Regions ======= @@ -95,15 +296,28 @@ device data. * - ``nvm-flash`` - The contents of the entire flash chip, sometimes referred to as the device's Non Volatile Memory. + * - ``shadow-ram`` + - The contents of the Shadow RAM, which is loaded from the beginning + of the flash. Although the contents are primarily from the flash, + this area also contains data generated during device boot which is + not stored in flash. * - ``device-caps`` - The contents of the device firmware's capabilities buffer. Useful to determine the current state and configuration of the device. -Users can request an immediate capture of a snapshot via the -``DEVLINK_CMD_REGION_NEW`` +Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a +snapshot. The ``device-caps`` region requires a snapshot as the contents are +sent by firmware and can't be split into separate reads. + +Users can request an immediate capture of a snapshot for all three regions +via the ``DEVLINK_CMD_REGION_NEW`` command. .. code:: shell + $ devlink region show + pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1 + pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10 + $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1 $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 @@ -156,3 +370,118 @@ Users can request an immediate capture of a snapshot via the 0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1 + +Devlink Rate +============ + +The ``ice`` driver implements devlink-rate API. It allows for offload of +the Hierarchical QoS to the hardware. It enables user to group Virtual +Functions in a tree structure and assign supported parameters: tx_share, +tx_max, tx_priority and tx_weight to each node in a tree. So effectively +user gains an ability to control how much bandwidth is allocated for each +VF group. This is later enforced by the HW. + +It is assumed that this feature is mutually exclusive with DCB performed +in FW and ADQ, or any driver feature that would trigger changes in QoS, +for example creation of the new traffic class. The driver will prevent DCB +or ADQ configuration if user started making any changes to the nodes using +devlink-rate API. To configure those features a driver reload is necessary. +Correspondingly if ADQ or DCB will get configured the driver won't export +hierarchy at all, or will remove the untouched hierarchy if those +features are enabled after the hierarchy is exported, but before any +changes are made. + +This feature is also dependent on switchdev being enabled in the system. +It's required because devlink-rate requires devlink-port objects to be +present, and those objects are only created in switchdev mode. + +If the driver is set to the switchdev mode, it will export internal +hierarchy the moment VF's are created. Root of the tree is always +represented by the node_0. This node can't be deleted by the user. Leaf +nodes and nodes with children also can't be deleted. + +.. list-table:: Attributes supported + :widths: 15 85 + + * - Name + - Description + * - ``tx_max`` + - maximum bandwidth to be consumed by the tree Node. Rate Limit is + an absolute number specifying a maximum amount of bytes a Node may + consume during the course of one second. Rate limit guarantees + that a link will not oversaturate the receiver on the remote end + and also enforces an SLA between the subscriber and network + provider. + * - ``tx_share`` + - minimum bandwidth allocated to a tree node when it is not blocked. + It specifies an absolute BW. While tx_max defines the maximum + bandwidth the node may consume, the tx_share marks committed BW + for the Node. + * - ``tx_priority`` + - allows for usage of strict priority arbiter among siblings. This + arbitration scheme attempts to schedule nodes based on their + priority as long as the nodes remain within their bandwidth limit. + Range 0-7. Nodes with priority 7 have the highest priority and are + selected first, while nodes with priority 0 have the lowest + priority. Nodes that have the same priority are treated equally. + * - ``tx_weight`` + - allows for usage of Weighted Fair Queuing arbitration scheme among + siblings. This arbitration scheme can be used simultaneously with + the strict priority. Range 1-200. Only relative values matter for + arbitration. + +``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case +nodes with the same priority form a WFQ subgroup in the sibling group +and arbitration among them is based on assigned weights. + +.. code:: shell + + # enable switchdev + $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev + + # at this point driver should export internal hierarchy + $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs + + $ devlink port function rate show + pci/0000:4b:00.0/node_25: type node parent node_24 + pci/0000:4b:00.0/node_24: type node parent node_0 + pci/0000:4b:00.0/node_32: type node parent node_31 + pci/0000:4b:00.0/node_31: type node parent node_30 + pci/0000:4b:00.0/node_30: type node parent node_16 + pci/0000:4b:00.0/node_19: type node parent node_18 + pci/0000:4b:00.0/node_18: type node parent node_17 + pci/0000:4b:00.0/node_17: type node parent node_16 + pci/0000:4b:00.0/node_14: type node parent node_5 + pci/0000:4b:00.0/node_5: type node parent node_3 + pci/0000:4b:00.0/node_13: type node parent node_4 + pci/0000:4b:00.0/node_12: type node parent node_4 + pci/0000:4b:00.0/node_11: type node parent node_4 + pci/0000:4b:00.0/node_10: type node parent node_4 + pci/0000:4b:00.0/node_9: type node parent node_4 + pci/0000:4b:00.0/node_8: type node parent node_4 + pci/0000:4b:00.0/node_7: type node parent node_4 + pci/0000:4b:00.0/node_6: type node parent node_4 + pci/0000:4b:00.0/node_4: type node parent node_3 + pci/0000:4b:00.0/node_3: type node parent node_16 + pci/0000:4b:00.0/node_16: type node parent node_15 + pci/0000:4b:00.0/node_15: type node parent node_0 + pci/0000:4b:00.0/node_2: type node parent node_1 + pci/0000:4b:00.0/node_1: type node parent node_0 + pci/0000:4b:00.0/node_0: type node + pci/0000:4b:00.0/1: type leaf parent node_25 + pci/0000:4b:00.0/2: type leaf parent node_25 + + # let's create some custom node + $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0 + + # second custom node + $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom + + # reassign second VF to newly created branch + $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1 + + # assign tx_weight to the VF + $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5 + + # assign tx_share to the VF + $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst index 7684ae5c4a4a..270a65a01411 100644 --- a/Documentation/networking/devlink/index.rst +++ b/Documentation/networking/devlink/index.rst @@ -4,6 +4,48 @@ Linux Devlink Documentation devlink is an API to expose device information and resources not directly related to any device class, such as chip-wide/switch-ASIC-wide configuration. +Locking +------- + +Driver facing APIs are currently transitioning to allow more explicit +locking. Drivers can use the existing ``devlink_*`` set of APIs, or +new APIs prefixed by ``devl_*``. The older APIs handle all the locking +in devlink core, but don't allow registration of most sub-objects once +the main devlink object is itself registered. The newer ``devl_*`` APIs assume +the devlink instance lock is already held. Drivers can take the instance +lock by calling ``devl_lock()``. It is also held all callbacks of devlink +netlink commands. + +Drivers are encouraged to use the devlink instance lock for their own needs. + +Drivers need to be cautious when taking devlink instance lock and +taking RTNL lock at the same time. Devlink instance lock needs to be taken +first, only after that RTNL lock could be taken. + +Nested instances +---------------- + +Some objects, like linecards or port functions, could have another +devlink instances created underneath. In that case, drivers should make +sure to respect following rules: + + - Lock ordering should be maintained. If driver needs to take instance + lock of both nested and parent instances at the same time, devlink + instance lock of the parent instance should be taken first, only then + instance lock of the nested instance could be taken. + - Driver should use object-specific helpers to setup the + nested relationship: + + - ``devl_nested_devlink_set()`` - called to setup devlink -> nested + devlink relationship (could be user for multiple nested instances. + - ``devl_port_fn_devlink_set()`` - called to setup port function -> + nested devlink relationship. + - ``devlink_linecard_nested_dl_set()`` - called to setup linecard -> + nested devlink relationship. + +The nested devlink info is exposed to the userspace over object-specific +attributes of devlink netlink. + Interface documentation ----------------------- @@ -18,9 +60,14 @@ general. devlink-info devlink-flash devlink-params + devlink-port devlink-region devlink-resource + devlink-reload + devlink-selftests devlink-trap + devlink-linecard + devlink-eswitch-attr Driver-specific documentation ----------------------------- @@ -32,14 +79,25 @@ parameters, info versions, and other features it supports. :maxdepth: 1 bnxt + etas_es58x + hns3 + i40e ionic ice + ixgbe + kvaser_pciefd + kvaser_usb mlx4 mlx5 mlxsw mv88e6xxx netdevsim nfp - sja1105 qed ti-cpsw-switch + am65-nuss-cpsw-switch + prestera + iosm + octeontx2 + sfc + zl3073x diff --git a/Documentation/networking/devlink/iosm.rst b/Documentation/networking/devlink/iosm.rst new file mode 100644 index 000000000000..6136181339aa --- /dev/null +++ b/Documentation/networking/devlink/iosm.rst @@ -0,0 +1,162 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================== +iosm devlink support +==================== + +This document describes the devlink features implemented by the ``iosm`` +device driver. + +Parameters +========== + +The ``iosm`` driver implements the following driver-specific parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``erase_full_flash`` + - u8 + - runtime + - erase_full_flash parameter is used to check if full erase is required for + the device during firmware flashing. + If set, Full nand erase command will be sent to the device. By default, + only conditional erase support is enabled. + + +Flash Update +============ + +The ``iosm`` driver implements support for flash update using the +``devlink-flash`` interface. + +It supports updating the device flash using a combined flash image which contains +the Bootloader images and other modem software images. + +The driver uses DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT to identify type of +firmware image that need to be flashed as requested by user space application. +Supported firmware image types. + +.. list-table:: Firmware Image types + :widths: 15 85 + + * - Name + - Description + * - ``PSI RAM`` + - Primary Signed Image + * - ``EBL`` + - External Bootloader + * - ``FLS`` + - Modem Software Image + +PSI RAM and EBL are the RAM images which are injected to the device when the +device is in BOOT ROM stage. Once this is successful, the actual modem firmware +image is flashed to the device. The modem software image contains multiple files +each having one secure bin file and at least one Loadmap/Region file. For flashing +these files, appropriate commands are sent to the modem device along with the +data required for flashing. The data like region count and address of each region +has to be passed to the driver using the devlink param command. + +If the device has to be fully erased before firmware flashing, user application +need to set the erase_full_flash parameter using devlink param command. +By default, conditional erase feature is supported. + +Flash Commands: +=============== +1) When modem is in Boot ROM stage, user can use below command to inject PSI RAM +image using devlink flash command. + +$ devlink dev flash pci/0000:02:00.0 file <PSI_RAM_File_name> + +2) If user want to do a full erase, below command need to be issued to set the +erase full flash param (To be set only if full erase required). + +$ devlink dev param set pci/0000:02:00.0 name erase_full_flash value true cmode runtime + +3) Inject EBL after the modem is in PSI stage. + +$ devlink dev flash pci/0000:02:00.0 file <EBL_File_name> + +4) Once EBL is injected successfully, then the actual firmware flashing takes +place. Below is the sequence of commands used for each of the firmware images. + +a) Flash secure bin file. + +$ devlink dev flash pci/0000:02:00.0 file <Secure_bin_file_name> + +b) Flashing the Loadmap/Region file + +$ devlink dev flash pci/0000:02:00.0 file <Load_map_file_name> + +Regions +======= + +The ``iosm`` driver supports dumping the coredump logs. + +In case a firmware encounters an exception, a snapshot will be taken by the +driver. Following regions are accessed for device internal data. + +.. list-table:: Regions implemented + :widths: 15 85 + + * - Name + - Description + * - ``report.json`` + - The summary of exception details logged as part of this region. + * - ``coredump.fcd`` + - This region contains the details related to the exception occurred in the + device (RAM dump). + * - ``cdd.log`` + - This region contains the logs related to the modem CDD driver. + * - ``eeprom.bin`` + - This region contains the eeprom logs. + * - ``bootcore_trace.bin`` + - This region contains the current instance of bootloader logs. + * - ``bootcore_prev_trace.bin`` + - This region contains the previous instance of bootloader logs. + + +Region commands +=============== + +$ devlink region show + +$ devlink region new pci/0000:02:00.0/report.json + +$ devlink region dump pci/0000:02:00.0/report.json snapshot 0 + +$ devlink region del pci/0000:02:00.0/report.json snapshot 0 + +$ devlink region new pci/0000:02:00.0/coredump.fcd + +$ devlink region dump pci/0000:02:00.0/coredump.fcd snapshot 1 + +$ devlink region del pci/0000:02:00.0/coredump.fcd snapshot 1 + +$ devlink region new pci/0000:02:00.0/cdd.log + +$ devlink region dump pci/0000:02:00.0/cdd.log snapshot 2 + +$ devlink region del pci/0000:02:00.0/cdd.log snapshot 2 + +$ devlink region new pci/0000:02:00.0/eeprom.bin + +$ devlink region dump pci/0000:02:00.0/eeprom.bin snapshot 3 + +$ devlink region del pci/0000:02:00.0/eeprom.bin snapshot 3 + +$ devlink region new pci/0000:02:00.0/bootcore_trace.bin + +$ devlink region dump pci/0000:02:00.0/bootcore_trace.bin snapshot 4 + +$ devlink region del pci/0000:02:00.0/bootcore_trace.bin snapshot 4 + +$ devlink region new pci/0000:02:00.0/bootcore_prev_trace.bin + +$ devlink region dump pci/0000:02:00.0/bootcore_prev_trace.bin snapshot 5 + +$ devlink region del pci/0000:02:00.0/bootcore_prev_trace.bin snapshot 5 diff --git a/Documentation/networking/devlink/ixgbe.rst b/Documentation/networking/devlink/ixgbe.rst new file mode 100644 index 000000000000..c27d1436c70e --- /dev/null +++ b/Documentation/networking/devlink/ixgbe.rst @@ -0,0 +1,171 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================== +ixgbe devlink support +===================== + +This document describes the devlink features implemented by the ``ixgbe`` +device driver. + +Info versions +============= + +Any of the versions dealing with the security presented by ``devlink-info`` +is purely informational. Devlink does not use a secure channel to communicate +with the device. + +The ``ixgbe`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 5 90 + + * - Name + - Type + - Example + - Description + * - ``board.id`` + - fixed + - H49289-000 + - The Product Board Assembly (PBA) identifier of the board. + * - ``fw.undi`` + - running + - 1.1937.0 + - Version of the Option ROM containing the UEFI driver. The version is + reported in ``major.minor.patch`` format. The major version is + incremented whenever a major breaking change occurs, or when the + minor version would overflow. The minor version is incremented for + non-breaking changes and reset to 1 when the major version is + incremented. The patch version is normally 0 but is incremented when + a fix is delivered as a patch against an older base Option ROM. + * - ``fw.undi.srev`` + - running + - 4 + - Number indicating the security revision of the Option ROM. + * - ``fw.bundle_id`` + - running + - 0x80000d0d + - Unique identifier of the firmware image file that was loaded onto + the device. Also referred to as the EETRACK identifier of the NVM. + * - ``fw.mgmt.api`` + - running + - 1.5.1 + - 3-digit version number (major.minor.patch) of the API exported over + the AdminQ by the management firmware. Used by the driver to + identify what commands are supported. Historical versions of the + kernel only displayed a 2-digit version number (major.minor). + * - ``fw.mgmt.build`` + - running + - 0x305d955f + - Unique identifier of the source for the management firmware. + * - ``fw.mgmt.srev`` + - running + - 3 + - Number indicating the security revision of the firmware. + * - ``fw.psid.api`` + - running + - 0.80 + - Version defining the format of the flash contents. + * - ``fw.netlist`` + - running + - 1.1.2000-6.7.0 + - The version of the netlist module. This module defines the device's + Ethernet capabilities and default settings, and is used by the + management firmware as part of managing link and device + connectivity. + * - ``fw.netlist.build`` + - running + - 0xee16ced7 + - The first 4 bytes of the hash of the netlist module contents. + +Flash Update +============ + +The ``ixgbe`` driver implements support for flash update using the +``devlink-flash`` interface. It supports updating the device flash using a +combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and +``fw.netlist`` components. + +.. list-table:: List of supported overwrite modes + :widths: 5 95 + + * - Bits + - Behavior + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` + - Do not preserve settings stored in the flash components being + updated. This includes overwriting the port configuration that + determines the number of physical functions the device will + initialize with. + * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` + - Do not preserve either settings or identifiers. Overwrite everything + in the flash with the contents from the provided image, without + performing any preservation. This includes overwriting device + identifying fields such as the MAC address, Vital product Data (VPD) area, + and device serial number. It is expected that this combination be used with an + image customized for the specific device. + +Reload +====== + +The ``ixgbe`` driver supports activating new firmware after a flash update +using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE`` +action. + +.. code:: shell + + $ devlink dev reload pci/0000:01:00.0 reload action fw_activate + +The new firmware is activated by issuing a device specific Embedded +Management Processor reset which requests the device to reset and reload the +EMP firmware image. + +The driver does not currently support reloading the driver via +``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``. + +Regions +======= + +The ``ixgbe`` driver implements the following regions for accessing internal +device data. + +.. list-table:: regions implemented + :widths: 15 85 + + * - Name + - Description + * - ``nvm-flash`` + - The contents of the entire flash chip, sometimes referred to as + the device's Non Volatile Memory. + * - ``shadow-ram`` + - The contents of the Shadow RAM, which is loaded from the beginning + of the flash. Although the contents are primarily from the flash, + this area also contains data generated during device boot which is + not stored in flash. + * - ``device-caps`` + - The contents of the device firmware's capabilities buffer. Useful to + determine the current state and configuration of the device. + +Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a +snapshot. The ``device-caps`` region requires a snapshot as the contents are +sent by firmware and can't be split into separate reads. + +Users can request an immediate capture of a snapshot for all three regions +via the ``DEVLINK_CMD_REGION_NEW`` command. + +.. code:: shell + + $ devlink region show + pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1 + pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10 + + $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1 + + $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc + 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 + + $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + + $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1 diff --git a/Documentation/networking/devlink/kvaser_pciefd.rst b/Documentation/networking/devlink/kvaser_pciefd.rst new file mode 100644 index 000000000000..075edd2a508a --- /dev/null +++ b/Documentation/networking/devlink/kvaser_pciefd.rst @@ -0,0 +1,24 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================= +kvaser_pciefd devlink support +============================= + +This document describes the devlink features implemented by the +``kvaser_pciefd`` device driver. + +Info versions +============= + +The ``kvaser_pciefd`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fw`` + - running + - Version of the firmware running on the device. Also available + through ``ethtool -i`` as ``firmware-version``. diff --git a/Documentation/networking/devlink/kvaser_usb.rst b/Documentation/networking/devlink/kvaser_usb.rst new file mode 100644 index 000000000000..403db3766cb4 --- /dev/null +++ b/Documentation/networking/devlink/kvaser_usb.rst @@ -0,0 +1,33 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +kvaser_usb devlink support +========================== + +This document describes the devlink features implemented by the +``kvaser_usb`` device driver. + +Info versions +============= + +The ``kvaser_usb`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fw`` + - running + - Version of the firmware running on the device. Also available + through ``ethtool -i`` as ``firmware-version``. + * - ``board.rev`` + - fixed + - The device hardware revision. + * - ``board.id`` + - fixed + - The device EAN (product number). + * - ``serial_number`` + - fixed + - The device serial number. diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 4e4b97f7971a..7febe0aecd53 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -14,8 +14,24 @@ Parameters * - Name - Mode + - Validation * - ``enable_roce`` - driverinit + - Type: Boolean + + If the device supports RoCE disablement, RoCE enablement state controls + device support for RoCE capability. Otherwise, the control occurs in the + driver stack. When RoCE is disabled at the driver level, only raw + ethernet QPs are supported. + * - ``io_eq_size`` + - driverinit + - The range is between 64 and 4096. + * - ``event_eq_size`` + - driverinit + - The range is between 64 and 4096. + * - ``max_macs`` + - driverinit + - The range is between 1 and 2^31. Only power of 2 values are supported. The ``mlx5`` driver also implements the following driver-specific parameters. @@ -37,12 +53,69 @@ parameters. * ``smfs`` Software managed flow steering. In SMFS mode, the HW steering entities are created and manage through the driver without firmware intervention. + * ``hmfs`` Hardware managed flow steering. In HMFS mode, the driver + is configuring steering rules directly to the HW using Work Queues with + a special new type of WQE (Work Queue Element). + + SMFS mode is faster and provides better rule insertion rate compared to + default DMFS mode. * - ``fdb_large_groups`` - u32 - driverinit - Control the number of large groups (size > 1) in the FDB table. * The default value is 15, and the range is between 1 and 1024. + * - ``esw_multiport`` + - Boolean + - runtime + - Control MultiPort E-Switch shared fdb mode. + + An experimental mode where a single E-Switch is used and all the vports + and physical ports on the NIC are connected to it. + + An example is to send traffic from a VF that is created on PF0 to an + uplink that is natively associated with the uplink of PF1 + + Note: Future devices, ConnectX-8 and onward, will eventually have this + as the default to allow forwarding between all NIC ports in a single + E-switch environment and the dual E-switch mode will likely get + deprecated. + + Default: disabled + * - ``esw_port_metadata`` + - Boolean + - runtime + - When applicable, disabling eswitch metadata can increase packet rate up + to 20% depending on the use case and packet sizes. + + Eswitch port metadata state controls whether to internally tag packets + with metadata. Metadata tagging must be enabled for multi-port RoCE, + failover between representors and stacked devices. By default metadata is + enabled on the supported devices in E-switch. Metadata is applicable only + for E-switch in switchdev mode and users may disable it when NONE of the + below use cases will be in use: + 1. HCA is in Dual/multi-port RoCE mode. + 2. VF/SF representor bonding (Usually used for Live migration) + 3. Stacked devices + + When metadata is disabled, the above use cases will fail to initialize if + users try to enable them. + + Note: Setting this parameter does not take effect immediately. Setting + must happen in legacy mode and eswitch port metadata takes effect after + enabling switchdev mode. + * - ``hairpin_num_queues`` + - u32 + - driverinit + - We refer to a TC NIC rule that involves forwarding as "hairpin". + Hairpin queues are mlx5 hardware specific implementation for hardware + forwarding of such packets. + + Control the number of hairpin queues. + * - ``hairpin_queue_size`` + - u32 + - driverinit + - Control the size (in packets) of the hairpin queues. The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` @@ -63,3 +136,165 @@ The ``mlx5`` driver reports the following versions * - ``fw.version`` - stored, running - Three digit major.minor.subminor firmware version number. + +Health reporters +================ + +tx reporter +----------- +The tx reporter is responsible for reporting and recovering of the following three error scenarios: + +- tx timeout + Report on kernel tx timeout detection. + Recover by searching lost interrupts. +- tx error completion + Report on error tx completion. + Recover by flushing the tx queue and reset it. +- tx PTP port timestamping CQ unhealthy + Report too many CQEs never delivered on port ts CQ. + Recover by flushing and re-creating all PTP channels. + +tx reporter also support on demand diagnose callback, on which it provides +real time information of its send queues status. + +User commands examples: + +- Diagnose send queues status:: + + $ devlink health diagnose pci/0000:82:00.0 reporter tx + +.. note:: + This command has valid output only when interface is up, otherwise the command has empty output. + +- Show number of tx errors indicated, number of recover flows ended successfully, + is autorecover enabled and graceful period from last recover:: + + $ devlink health show pci/0000:82:00.0 reporter tx + +rx reporter +----------- +The rx reporter is responsible for reporting and recovering of the following two error scenarios: + +- rx queues' initialization (population) timeout + Population of rx queues' descriptors on ring initialization is done + in napi context via triggering an irq. In case of a failure to get + the minimum amount of descriptors, a timeout would occur, and + descriptors could be recovered by polling the EQ (Event Queue). +- rx completions with errors (reported by HW on interrupt context) + Report on rx completion error. + Recover (if needed) by flushing the related queue and reset it. + +rx reporter also supports on demand diagnose callback, on which it +provides real time information of its receive queues' status. + +- Diagnose rx queues' status and corresponding completion queue:: + + $ devlink health diagnose pci/0000:82:00.0 reporter rx + +.. note:: + This command has valid output only when interface is up. Otherwise, the command has empty output. + +- Show number of rx errors indicated, number of recover flows ended successfully, + is autorecover enabled, and graceful period from last recover:: + + $ devlink health show pci/0000:82:00.0 reporter rx + +fw reporter +----------- +The fw reporter implements `diagnose` and `dump` callbacks. +It follows symptoms of fw error such as fw syndrome by triggering +fw core dump and storing it into the dump buffer. +The fw reporter diagnose command can be triggered any time by the user to check +current fw status. + +User commands examples: + +- Check fw heath status:: + + $ devlink health diagnose pci/0000:82:00.0 reporter fw + +- Read FW core dump if already stored or trigger new one:: + + $ devlink health dump show pci/0000:82:00.0 reporter fw + +.. note:: + This command can run only on the PF which has fw tracer ownership, + running it on other PF or any VF will return "Operation not permitted". + +fw fatal reporter +----------------- +The fw fatal reporter implements `dump` and `recover` callbacks. +It follows fatal errors indications by CR-space dump and recover flow. +The CR-space dump uses vsc interface which is valid even if the FW command +interface is not functional, which is the case in most FW fatal errors. +The recover function runs recover flow which reloads the driver and triggers fw +reset if needed. +On firmware error, the health buffer is dumped into the dmesg. The log +level is derived from the error's severity (given in health buffer). + +User commands examples: + +- Run fw recover flow manually:: + + $ devlink health recover pci/0000:82:00.0 reporter fw_fatal + +- Read FW CR-space dump if already stored or trigger new one:: + + $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal + +.. note:: + This command can run only on PF. + +vnic reporter +------------- +The vnic reporter implements only the `diagnose` callback. +It is responsible for querying the vnic diagnostic counters from fw and displaying +them in realtime. + +Description of the vnic counters: + +- total_error_queues + number of queues in an error state due to + an async error or errored command. +- send_queue_priority_update_flow + number of QP/SQ priority/SL update events. +- cq_overrun + number of times CQ entered an error state due to an overflow. +- async_eq_overrun + number of times an EQ mapped to async events was overrun. +- comp_eq_overrun + number of times an EQ mapped to completion events was + overrun. +- quota_exceeded_command + number of commands issued and failed due to quota exceeded. +- invalid_command + number of commands issued and failed dues to any reason other than quota + exceeded. +- nic_receive_steering_discard + number of packets that completed RX flow + steering but were discarded due to a mismatch in flow table. +- generated_pkt_steering_fail + number of packets generated by the VNIC experiencing unexpected steering + failure (at any point in steering flow). +- handled_pkt_steering_fail + number of packets handled by the VNIC experiencing unexpected steering + failure (at any point in steering flow owned by the VNIC, including the FDB + for the eswitch owner). +- icm_consumption + amount of Interconnect Host Memory (ICM) consumed by the vnic in + granularity of 4KB. ICM is host memory allocated by SW upon HCA request + and is used for storing data structures that control HCA operation. + +User commands examples: + +- Diagnose PF/VF vnic counters:: + + $ devlink health diagnose pci/0000:82:00.1 reporter vnic + +- Diagnose representor vnic counters (performed by supplying devlink port of the + representor, which can be obtained via devlink port command):: + + $ devlink health diagnose pci/0000:82:00.1/65537 reporter vnic + +.. note:: + This command can run over all interfaces such as PF/VF and representor ports. diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst index cf857cb4ba8f..433962225bd4 100644 --- a/Documentation/networking/devlink/mlxsw.rst +++ b/Documentation/networking/devlink/mlxsw.rst @@ -58,6 +58,30 @@ The ``mlxsw`` driver reports the following versions - running - Three digit firmware version +Line card auxiliary device info versions +======================================== + +The ``mlxsw`` driver reports the following versions for line card auxiliary device + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``hw.revision`` + - fixed + - The hardware revision for this line card + * - ``ini.version`` + - running + - Version of line card INI loaded + * - ``fw.psid`` + - fixed + - Line card device PSID + * - ``fw.version`` + - running + - Three digit firmware version of line card device + Driver-specific Traps ===================== diff --git a/Documentation/networking/devlink/netdevsim.rst b/Documentation/networking/devlink/netdevsim.rst index 2a266b7e7b38..3932004eae82 100644 --- a/Documentation/networking/devlink/netdevsim.rst +++ b/Documentation/networking/devlink/netdevsim.rst @@ -46,7 +46,7 @@ Resources ========= The ``netdevsim`` driver exposes resources to control the number of FIB -entries and FIB rule entries that the driver will allow. +entries, FIB rule entries and nexthops that the driver will allow. .. code:: shell @@ -54,8 +54,35 @@ entries and FIB rule entries that the driver will allow. $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib-rules size 16 $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib size 64 $ devlink resource set netdevsim/netdevsim0 path /IPv6/fib-rules size 16 + $ devlink resource set netdevsim/netdevsim0 path /nexthops size 16 $ devlink dev reload netdevsim/netdevsim0 +Rate objects +============ + +The ``netdevsim`` driver supports rate objects management, which includes: + +- registering/unregistering leaf rate objects per VF devlink port; +- creation/deletion node rate objects; +- setting tx_share and tx_max rate values for any rate object type; +- setting parent node for any rate object type. + +Rate nodes and their parameters are exposed in ``netdevsim`` debugfs in RO mode. +For example created rate node with name ``some_group``: + +.. code:: shell + + $ ls /sys/kernel/debug/netdevsim/netdevsim0/rate_groups/some_group + rate_parent tx_max tx_share + +Same parameters are exposed for leaf objects in corresponding ports directories. +For ex.: + +.. code:: shell + + $ ls /sys/kernel/debug/netdevsim/netdevsim0/ports/1 + dev ethtool rate_parent tx_max tx_share + Driver-specific Traps ===================== @@ -68,5 +95,5 @@ Driver-specific Traps * - ``fid_miss`` - ``exception`` - When a packet enters the device it is classified to a filtering - indentifier (FID) based on the ingress port and VLAN. This trap is used + identifier (FID) based on the ingress port and VLAN. This trap is used to trap packets for which a FID could not be found diff --git a/Documentation/networking/devlink/nfp.rst b/Documentation/networking/devlink/nfp.rst index a1717db0dfcc..3093642bdae4 100644 --- a/Documentation/networking/devlink/nfp.rst +++ b/Documentation/networking/devlink/nfp.rst @@ -32,7 +32,7 @@ The ``nfp`` driver reports the following versions - Description * - ``board.id`` - fixed - - Part number identifying the board design + - Identifier of the board design * - ``board.rev`` - fixed - Revision of the board design @@ -42,6 +42,9 @@ The ``nfp`` driver reports the following versions * - ``board.model`` - fixed - Model name of the board design + * - ``board.part_number`` + - fixed + - Part number of the board and its components * - ``fw.bundle_id`` - stored, running - Firmware bundle id diff --git a/Documentation/networking/devlink/octeontx2.rst b/Documentation/networking/devlink/octeontx2.rst new file mode 100644 index 000000000000..84206537aedb --- /dev/null +++ b/Documentation/networking/devlink/octeontx2.rst @@ -0,0 +1,79 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================= +octeontx2 devlink support +========================= + +This document describes the devlink features implemented by the ``octeontx2 AF, PF and VF`` +device drivers. + +Parameters +========== + +The ``octeontx2 PF and VF`` drivers implement the following driver-specific parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``mcam_count`` + - u16 + - runtime + - Select number of match CAM entries to be allocated for an interface. + The same is used for ntuple filters of the interface. Supported by + PF and VF drivers. + +The ``octeontx2 AF`` driver implements the following driver-specific parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``dwrr_mtu`` + - u32 + - runtime + - Use to set the quantum which hardware uses for scheduling among transmit queues. + Hardware uses weighted DWRR algorithm to schedule among all transmit queues. + * - ``npc_mcam_high_zone_percent`` + - u8 + - runtime + - Use to set the number of high priority zone entries in NPC MCAM that can be allocated + by a user, out of the three priority zone categories high, mid and low. + * - ``npc_def_rule_cntr`` + - bool + - runtime + - Use to enable or disable hit counters for the default rules in NPC MCAM. + Its not guaranteed that counters gets enabled and mapped to all the default rules, + since the counters are scarce and driver follows a best effort approach. + The default rule serves as the primary packet steering rule for a specific PF or VF, + based on its DMAC address which is installed by AF driver as part of its initialization. + Sample command to read hit counters for default rule from debugfs is as follows, + cat /sys/kernel/debug/cn10k/npc/mcam_rules + * - ``nix_maxlf`` + - u16 + - runtime + - Use to set the maximum number of LFs in NIX hardware block. This would be useful + to increase the availability of default resources allocated to enabled LFs like + MCAM entries for example. + +The ``octeontx2 PF`` driver implements the following driver-specific parameters. + +.. list-table:: Driver-specific parameters implemented + :widths: 5 5 5 85 + + * - Name + - Type + - Mode + - Description + * - ``unicast_filter_count`` + - u8 + - runtime + - Set the maximum number of unicast filters that can be programmed for + the device. This can be used to achieve better device resource + utilization, avoiding over consumption of unused MCAM table entries. diff --git a/Documentation/networking/devlink/prestera.rst b/Documentation/networking/devlink/prestera.rst new file mode 100644 index 000000000000..96b1124e614b --- /dev/null +++ b/Documentation/networking/devlink/prestera.rst @@ -0,0 +1,141 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +prestera devlink support +======================== + +This document describes the devlink features implemented by the ``prestera`` +device driver. + +Driver-specific Traps +===================== + +.. list-table:: List of Driver-specific Traps Registered by ``prestera`` + :widths: 5 5 90 + + * - Name + - Type + - Description +.. list-table:: List of Driver-specific Traps Registered by ``prestera`` + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``arp_bc`` + - ``trap`` + - Traps ARP broadcast packets (both requests/responses) + * - ``is_is`` + - ``trap`` + - Traps IS-IS packets + * - ``ospf`` + - ``trap`` + - Traps OSPF packets + * - ``ip_bc_mac`` + - ``trap`` + - Traps IPv4 packets with broadcast DA Mac address + * - ``stp`` + - ``trap`` + - Traps STP BPDU + * - ``lacp`` + - ``trap`` + - Traps LACP packets + * - ``lldp`` + - ``trap`` + - Traps LLDP packets + * - ``router_mc`` + - ``trap`` + - Traps multicast packets + * - ``vrrp`` + - ``trap`` + - Traps VRRP packets + * - ``dhcp`` + - ``trap`` + - Traps DHCP packets + * - ``mtu_error`` + - ``trap`` + - Traps (exception) packets that exceeded port's MTU + * - ``mac_to_me`` + - ``trap`` + - Traps packets with switch-port's DA Mac address + * - ``ttl_error`` + - ``trap`` + - Traps (exception) IPv4 packets whose TTL exceeded + * - ``ipv4_options`` + - ``trap`` + - Traps (exception) packets due to the malformed IPV4 header options + * - ``ip_default_route`` + - ``trap`` + - Traps packets that have no specific IP interface (IP to me) and no forwarding prefix + * - ``local_route`` + - ``trap`` + - Traps packets that have been send to one of switch IP interfaces addresses + * - ``ipv4_icmp_redirect`` + - ``trap`` + - Traps (exception) IPV4 ICMP redirect packets + * - ``arp_response`` + - ``trap`` + - Traps ARP replies packets that have switch-port's DA Mac address + * - ``acl_code_0`` + - ``trap`` + - Traps packets that have ACL priority set to 0 (tc pref 0) + * - ``acl_code_1`` + - ``trap`` + - Traps packets that have ACL priority set to 1 (tc pref 1) + * - ``acl_code_2`` + - ``trap`` + - Traps packets that have ACL priority set to 2 (tc pref 2) + * - ``acl_code_3`` + - ``trap`` + - Traps packets that have ACL priority set to 3 (tc pref 3) + * - ``acl_code_4`` + - ``trap`` + - Traps packets that have ACL priority set to 4 (tc pref 4) + * - ``acl_code_5`` + - ``trap`` + - Traps packets that have ACL priority set to 5 (tc pref 5) + * - ``acl_code_6`` + - ``trap`` + - Traps packets that have ACL priority set to 6 (tc pref 6) + * - ``acl_code_7`` + - ``trap`` + - Traps packets that have ACL priority set to 7 (tc pref 7) + * - ``ipv4_bgp`` + - ``trap`` + - Traps IPv4 BGP packets + * - ``ssh`` + - ``trap`` + - Traps SSH packets + * - ``telnet`` + - ``trap`` + - Traps Telnet packets + * - ``icmp`` + - ``trap`` + - Traps ICMP packets + * - ``rxdma_drop`` + - ``drop`` + - Drops packets (RxDMA) due to the lack of ingress buffers etc. + * - ``port_no_vlan`` + - ``drop`` + - Drops packets due to faulty-configured network or due to internal bug (config issue). + * - ``local_port`` + - ``drop`` + - Drops packets whose decision (FDB entry) is to bridge packet back to the incoming port/trunk. + * - ``invalid_sa`` + - ``drop`` + - Drops packets with multicast source MAC address. + * - ``illegal_ip_addr`` + - ``drop`` + - Drops packets with illegal SIP/DIP multicast/unicast addresses. + * - ``illegal_ipv4_hdr`` + - ``drop`` + - Drops packets with illegal IPV4 header. + * - ``ip_uc_dip_da_mismatch`` + - ``drop`` + - Drops packets with destination MAC being unicast, but destination IP address being multicast. + * - ``ip_sip_is_zero`` + - ``drop`` + - Drops packets with zero (0) IPV4 source address. + * - ``met_red`` + - ``drop`` + - Drops non-conforming packets (dropped by Ingress policer, metering drop), e.g. packet rate exceeded configured bandwidth. diff --git a/Documentation/networking/devlink/sfc.rst b/Documentation/networking/devlink/sfc.rst new file mode 100644 index 000000000000..0398d59ea184 --- /dev/null +++ b/Documentation/networking/devlink/sfc.rst @@ -0,0 +1,71 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================== +sfc devlink support +=================== + +This document describes the devlink features implemented by the ``sfc`` +device driver for the ef10 and ef100 devices. + +Info versions +============= + +The ``sfc`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 90 + + * - Name + - Type + - Description + * - ``fw.bundle_id`` + - stored + - Version of the firmware "bundle" image that was last used to update + multiple components. + * - ``fw.mgmt.suc`` + - running + - For boards where the management function is split between multiple + control units, this is the SUC control unit's firmware version. + * - ``fw.mgmt.cmc`` + - running + - For boards where the management function is split between multiple + control units, this is the CMC control unit's firmware version. + * - ``fpga.rev`` + - running + - FPGA design revision. + * - ``fpga.app`` + - running + - Datapath programmable logic version. + * - ``fw.app`` + - running + - Datapath software/microcode/firmware version. + * - ``coproc.boot`` + - running + - SmartNIC application co-processor (APU) first stage boot loader version. + * - ``coproc.uboot`` + - running + - SmartNIC application co-processor (APU) co-operating system loader version. + * - ``coproc.main`` + - running + - SmartNIC application co-processor (APU) main operating system version. + * - ``coproc.recovery`` + - running + - SmartNIC application co-processor (APU) recovery operating system version. + * - ``fw.exprom`` + - running + - Expansion ROM version. For boards where the expansion ROM is split between + multiple images (e.g. PXE and UEFI), this is the specifically the PXE boot + ROM version. + * - ``fw.uefi`` + - running + - UEFI driver version (No UNDI support). + +Flash Update +============ + +The ``sfc`` driver implements support for flash update using the +``devlink-flash`` interface. It supports updating the device flash using a +combined flash image ("bundle") that contains multiple components (on ef10, +typically ``fw.mgmt``, ``fw.app``, ``fw.exprom`` and ``fw.uefi``). + +The driver does not support any overwrite mask flags. diff --git a/Documentation/networking/devlink/sja1105.rst b/Documentation/networking/devlink/sja1105.rst deleted file mode 100644 index e2679c274085..000000000000 --- a/Documentation/networking/devlink/sja1105.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -======================= -sja1105 devlink support -======================= - -This document describes the devlink features implemented -by the ``sja1105`` device driver. - -Parameters -========== - -.. list-table:: Driver-specific parameters implemented - :widths: 5 5 5 85 - - * - Name - - Type - - Mode - - Description - * - ``best_effort_vlan_filtering`` - - Boolean - - runtime - - Allow plain ETH_P_8021Q headers to be used as DSA tags. - - Benefits: - - - Can terminate untagged traffic over switch net - devices even when enslaved to a bridge with - vlan_filtering=1. - - Can terminate VLAN-tagged traffic over switch net - devices even when enslaved to a bridge with - vlan_filtering=1, with some constraints (no more than - 7 non-pvid VLANs per user port). - - Can do QoS based on VLAN PCP and VLAN membership - admission control for autonomously forwarded frames - (regardless of whether they can be terminated on the - CPU or not). - - Drawbacks: - - - User cannot use VLANs in range 1024-3071. If the - switch receives frames with such VIDs, it will - misinterpret them as DSA tags. - - Switch uses Shared VLAN Learning (FDB lookup uses - only DMAC as key). - - When VLANs span cross-chip topologies, the total - number of permitted VLANs may be less than 7 per - port, due to a maximum number of 32 VLAN retagging - rules per switch. diff --git a/Documentation/networking/devlink/zl3073x.rst b/Documentation/networking/devlink/zl3073x.rst new file mode 100644 index 000000000000..4b6cfaf38643 --- /dev/null +++ b/Documentation/networking/devlink/zl3073x.rst @@ -0,0 +1,51 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================= +zl3073x devlink support +======================= + +This document describes the devlink features implemented by the ``zl3073x`` +device driver. + +Parameters +========== + +.. list-table:: Generic parameters implemented + :widths: 5 5 90 + + * - Name + - Mode + - Notes + * - ``clock_id`` + - driverinit + - Set the clock ID that is used by the driver for registering DPLL devices + and pins. + +Info versions +============= + +The ``zl3073x`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 5 90 + + * - Name + - Type + - Example + - Description + * - ``asic.id`` + - fixed + - 1E94 + - Chip identification number + * - ``asic.rev`` + - fixed + - 300 + - Chip revision number + * - ``fw`` + - running + - 7006 + - Firmware version number + * - ``custom_cfg`` + - running + - 1.3.0.1 + - Device configuration version customized by OEM |