Diffstat (limited to 'Documentation/networking/devlink/devlink-health.rst')
-rw-r--r-- Documentation/networking/devlink/devlink-health.rst | 40
1 files changed, 32 insertions, 8 deletions
diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst
index 0c99b11f05f9..e0b8cfed610a 100644
--- a/Documentation/networking/devlink/devlink-health.rst
+++ b/Documentation/networking/devlink/devlink-health.rst
@@ -24,7 +24,7 @@ attributes of the health reporting and recovery procedures.
The ``devlink`` health reporter:
Device driver creates a "health reporter" per each error/health type.
-Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+Error/Health type can be a known/generic (e.g. PCI error, fw error, rx/tx error)
or unknown (driver specific).
For each registered health reporter a driver can issue error/health reports
asynchronously. All health reports handling is done by ``devlink``.
@@ -33,7 +33,7 @@ Device driver can provide specific callbacks for each "health reporter", e.g.:
* Recovery procedures
* Diagnostics procedures
* Object dump procedures
- * OOB initial parameters
+ * Out Of Box initial parameters
Different parts of the driver can register different types of health reporters
with different handlers.
@@ -46,11 +46,31 @@ Once an error is reported, devlink health will perform the following actions:
* A log is being send to the kernel trace events buffer
* Health status and statistics are being updated for the reporter instance
* Object dump is being taken and saved at the reporter instance (as long as
- there is no other dump which is already stored)
+ auto-dump is set and there is no other dump which is already stored)
* Auto recovery attempt is being done. Depends on:
+
- Auto-recovery configuration
- Grace period vs. time passed since last recover
+Devlink formatted message
+=========================
+
+To handle devlink health diagnose and health dump requests, devlink creates a
+formatted message structure ``devlink_fmsg`` and sends it to the driver's callback
+to fill the data in using the devlink fmsg API.
+
+Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in
+json-like format. The API allows the driver to add nested attributes such as
+object, object pair and value array, in addition to attributes such as name and
+value.
+
+Driver should use this API to fill the fmsg context in a format which will be
+translated by the devlink to the netlink message later. When it needs to send
+the data using SKBs to the netlink layer, it fragments the data between
+different SKBs. In order to do this fragmentation, it uses virtual nest
+attributes, to avoid actual nesting, which cannot be divided between
+different SKBs.
+
User Interface
==============
@@ -72,14 +92,18 @@ via ``devlink``, e.g per error type (per health reporter):
* - ``DEVLINK_CMD_HEALTH_REPORTER_SET``
- Allows reporter-related configuration setting.
* - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER``
- - Triggers a reporter's recovery procedure.
+ - Triggers reporter's recovery procedure.
+ * - ``DEVLINK_CMD_HEALTH_REPORTER_TEST``
+ - Triggers a fake health event on the reporter. The effects of the test
+ event in terms of recovery flow should follow closely that of a real
+ event.
* - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE``
- - Retrieves diagnostics data from a reporter on a device.
+ - Retrieves current device state related to the reporter.
* - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
- Retrieves the last stored dump. Devlink health
- saves a single dump. If an dump is not already stored by the devlink
+ saves a single dump. If a dump is not already stored by devlink
for this reporter, devlink generates a new dump.
- dump output is defined by the reporter.
+ Dump output is defined by the reporter.
* - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR``
- Clears the last saved dump file for the specified reporter.
@@ -93,7 +117,7 @@ The following diagram provides a general overview of ``devlink-health``::
+--------------------------+
|request for ops
|(diagnose,
- mlx5_core devlink |recover,
+ driver devlink |recover,
|dump)
+--------+ +--------------------------+
| | | reporter| |