diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-01-28 16:02:33 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-01-28 16:02:33 -0800 |
commit | bd2463ac7d7ec51d432f23bf0e893fb371a908cd (patch) | |
tree | 3da32c23be83adb9d9bda7e51b51fa39f69f2447 /Documentation/networking/devlink/devlink-health.rst | |
parent | Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 (diff) | |
parent | net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC (diff) | |
download | linux-bd2463ac7d7ec51d432f23bf0e893fb371a908cd.tar.xz linux-bd2463ac7d7ec51d432f23bf0e893fb371a908cd.zip |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
1) Add WireGuard
2) Add HE and TWT support to ath11k driver, from John Crispin.
3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.
4) Add variable window congestion control to TIPC, from Jon Maloy.
5) Add BCM84881 PHY driver, from Russell King.
6) Start adding netlink support for ethtool operations, from Michal
Kubecek.
7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.
8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.
9) Add BPF dynamic program extensions, from Alexei Starovoitov.
10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.
11) Add support for macsec HW offloading, from Antoine Tenart.
12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.
13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...
Diffstat (limited to 'Documentation/networking/devlink/devlink-health.rst')
-rw-r--r-- | Documentation/networking/devlink/devlink-health.rst | 114 |
1 files changed, 114 insertions, 0 deletions
diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst new file mode 100644 index 000000000000..0c99b11f05f9 --- /dev/null +++ b/Documentation/networking/devlink/devlink-health.rst @@ -0,0 +1,114 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +Devlink Health +============== + +Background +========== + +The ``devlink`` health mechanism is targeted for Real Time Alerting, in +order to know when something bad happened to a PCI device. + + * Provide alert debug information. + * Self healing. + * If problem needs vendor support, provide a way to gather all needed + debugging information. + +Overview +======== + +The main idea is to unify and centralize driver health reports in the +generic ``devlink`` instance and allow the user to set different +attributes of the health reporting and recovery procedures. + +The ``devlink`` health reporter: +Device driver creates a "health reporter" per each error/health type. +Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error) +or unknown (driver specific). +For each registered health reporter a driver can issue error/health reports +asynchronously. All health reports handling is done by ``devlink``. +Device driver can provide specific callbacks for each "health reporter", e.g.: + + * Recovery procedures + * Diagnostics procedures + * Object dump procedures + * OOB initial parameters + +Different parts of the driver can register different types of health reporters +with different handlers. + +Actions +======= + +Once an error is reported, devlink health will perform the following actions: + + * A log is being send to the kernel trace events buffer + * Health status and statistics are being updated for the reporter instance + * Object dump is being taken and saved at the reporter instance (as long as + there is no other dump which is already stored) + * Auto recovery attempt is being done. Depends on: + - Auto-recovery configuration + - Grace period vs. time passed since last recover + +User Interface +============== + +User can access/change each reporter's parameters and driver specific callbacks +via ``devlink``, e.g per error type (per health reporter): + + * Configure reporter's generic parameters (like: disable/enable auto recovery) + * Invoke recovery procedure + * Run diagnostics + * Object dump + +.. list-table:: List of devlink health interfaces + :widths: 10 90 + + * - Name + - Description + * - ``DEVLINK_CMD_HEALTH_REPORTER_GET`` + - Retrieves status and configuration info per DEV and reporter. + * - ``DEVLINK_CMD_HEALTH_REPORTER_SET`` + - Allows reporter-related configuration setting. + * - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER`` + - Triggers a reporter's recovery procedure. + * - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE`` + - Retrieves diagnostics data from a reporter on a device. + * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET`` + - Retrieves the last stored dump. Devlink health + saves a single dump. If an dump is not already stored by the devlink + for this reporter, devlink generates a new dump. + dump output is defined by the reporter. + * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR`` + - Clears the last saved dump file for the specified reporter. + +The following diagram provides a general overview of ``devlink-health``:: + + netlink + +--------------------------+ + | | + | + | + | | | + +--------------------------+ + |request for ops + |(diagnose, + mlx5_core devlink |recover, + |dump) + +--------+ +--------------------------+ + | | | reporter| | + | | | +---------v----------+ | + | | ops execution | | | | + | <----------------------------------+ | | + | | | | | | + | | | + ^------------------+ | + | | | | request for ops | + | | | | (recover, dump) | + | | | | | + | | | +-+------------------+ | + | | health report | | health handler | | + | +-------------------------------> | | + | | | +--------------------+ | + | | health reporter create | | + | +----------------------------> | + +--------+ +--------------------------+ |