Diffstat (limited to 'Documentation/networking/devlink/devlink-health.rst')
-rw-r--r-- | Documentation/networking/devlink/devlink-health.rst | 40 |
1 files changed, 32 insertions, 8 deletions
diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst
index 0c99b11f05f9..e0b8cfed610a 100644
--- a/Documentation/networking/devlink/devlink-health.rst
+++ b/Documentation/networking/devlink/devlink-health.rst
@@ -24,7 +24,7 @@ attributes of the health reporting and recovery procedures.
 
 The ``devlink`` health reporter:
 Device driver creates a "health reporter" per each error/health type.
-Error/Health type can be a known/generic (eg pci error, fw error, rx/tx error)
+Error/Health type can be a known/generic (e.g. PCI error, fw error, rx/tx error)
 or unknown (driver specific).
 For each registered health reporter a driver can issue error/health reports
 asynchronously. All health reports handling is done by ``devlink``.
@@ -33,7 +33,7 @@ Device driver can provide specific callbacks for each "health reporter", e.g.:
   * Recovery procedures
   * Diagnostics procedures
   * Object dump procedures
-  * OOB initial parameters
+  * Out Of Box initial parameters
 
 Different parts of the driver can register different types of health reporters
 with different handlers.
@@ -46,11 +46,31 @@ Once an error is reported, devlink health will perform the following actions:
   * A log is being send to the kernel trace events buffer
   * Health status and statistics are being updated for the reporter instance
   * Object dump is being taken and saved at the reporter instance (as long as
-    there is no other dump which is already stored)
+    auto-dump is set and there is no other dump which is already stored)
   * Auto recovery attempt is being done. Depends on:
 
+    - Auto-recovery configuration
     - Grace period vs. time passed since last recover
 
+Devlink formatted message
+=========================
+
+To handle devlink health diagnose and health dump requests, devlink creates a
+formatted message structure ``devlink_fmsg`` and send it to the driver's callback
+to fill the data in using the devlink fmsg API.
+
+Devlink fmsg is a mechanism to pass descriptors between drivers and devlink, in
+json-like format. The API allows the driver to add nested attributes such as
+object, object pair and value array, in addition to attributes such as name and
+value.
+
+Driver should use this API to fill the fmsg context in a format which will be
+translated by the devlink to the netlink message later. When it needs to send
+the data using SKBs to the netlink layer, it fragments the data between
+different SKBs. In order to do this fragmentation, it uses virtual nests
+attributes, to avoid actual nesting use which cannot be divided between
+different SKBs.
+
 User Interface
 ==============
 
@@ -72,14 +92,18 @@ via ``devlink``, e.g per error type (per health reporter):
    * - ``DEVLINK_CMD_HEALTH_REPORTER_SET``
      - Allows reporter-related configuration setting.
    * - ``DEVLINK_CMD_HEALTH_REPORTER_RECOVER``
-     - Triggers a reporter's recovery procedure.
+     - Triggers reporter's recovery procedure.
+   * - ``DEVLINK_CMD_HEALTH_REPORTER_TEST``
+     - Triggers a fake health event on the reporter. The effects of the test
+       event in terms of recovery flow should follow closely that of a real
+       event.
    * - ``DEVLINK_CMD_HEALTH_REPORTER_DIAGNOSE``
-     - Retrieves diagnostics data from a reporter on a device.
+     - Retrieves current device state related to the reporter.
    * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET``
      - Retrieves the last stored dump. Devlink health
-       saves a single dump. If an dump is not already stored by the devlink
+       saves a single dump. If an dump is not already stored by devlink
        for this reporter, devlink generates a new dump.
-       dump output is defined by the reporter.
+       Dump output is defined by the reporter.
    * - ``DEVLINK_CMD_HEALTH_REPORTER_DUMP_CLEAR``
      - Clears the last saved dump file for the specified reporter.
 
@@ -93,7 +117,7 @@ The following diagram provides a general overview of ``devlink-health``::
                                           +--------------------------+
                                                        |request for ops
                                                        |(diagnose,
-      mlx5_core                             devlink    |recover,
+      driver                                devlink    |recover,
                                                        |dump)
      +--------+                           +--------------------------+
      |        |                           |    reporter|             |