aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-03-01net/mlx5e: Declare mlx5e_tx_reporter_recover_from_ctx as staticEran Ben Elisha1-1/+1
Function mlx5e_tx_reporter_recover_from_ctx is only used within mlx5e tx reporter, move it to be statically declared in en/reporter_tx.c. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22net/mlx5e: Fix mlx5e_tx_reporter_create return valueEran Ben Elisha1-1/+1
If reporter is ERR_PTR or NULL, error code shall be returned. At all other cases it shall return success. Fix that. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22net/mlx5e: Fix return status of TX reporter timeout recoverEran Ben Elisha1-1/+1
In case of lost interrupt recover, we shall return success. Fix that. Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout support for mlx5e tx reporter") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Maria Pasechnik <mariap@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22net/mlx5e: Re-add support for TX timeout when TX reporter is not validEran Ben Elisha1-2/+14
When TX reporter was introduced, it took ownership over TX timeout error handling. this introduced a regression in case TX reporter is not valid (NET_DEVLINK is not set, or devlink_health_reporter_create failure). Fix mlx5e_tx_reporter_timeout function so it can be called at all times. In addition, remove a warning print that indicates that a TX timeout won't be handled in case of no valid TX reporter. Fixes: 7d91126b1aea ("net/mlx5e: Add tx timeout support for mlx5e tx reporter") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-22net/mlx5e: Fix warn print in case of TX reporter creation failureEran Ben Elisha1-1/+1
Print warning message in case of TX reporter creation failure, only if the return value is ERR_PTR type. NULL pointer return indicates that NET_DEVLINK is not set, and the warning print can be skipped. Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-07net/mlx5e: Add tx timeout support for mlx5e tx reporterEran Ben Elisha1-0/+38
With this patch, ndo_tx_timeout callback will be redirected to the tx reporter in order to detect a tx timeout error and report it to the devlink health. (The watchdog detects tx timeouts, but the driver verify the issue still exists before launching any recover method). In addition, recover from tx timeout in case of lost interrupt was added to the tx reporter recover method. The tx timeout recover from lost interrupt is not a new feature in the driver, this patch re-organize the functionality and move it to the tx reporter recovery flow. tx timeout example: (with auto_recover set to false, if set to true, the manual recover and diagnose sections are irrelevant) $cat /sys/kernel/debug/tracing/trace ... devlink_health_report: bus_name=pci dev_name=0000:00:09.0 driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a, CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000 $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 0 last_dump_ts N/A parameters: grace_period 500 auto_recover false $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": true },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: true sqn: 142 HW state: 1 stopped: false $devlink health recover pci/0000:00:09 reporter tx $devlink health show pci/0000:00:09.0: name tx state healthy #err 1 #recover 1 last_dump_ts N/A parameters: grace_period 500 auto_recover false Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07net/mlx5e: Add tx reporter supportEran Ben Elisha1-0/+259
Add mlx5e tx reporter to devlink health reporters. This reporter will be responsible for diagnosing, reporting and recovering of tx errors. This patch declares the TX reporter operations and creates it using the devlink health API. Currently, this reporter supports reporting and recovering from send error CQE only. In addition, it adds diagnose information for the open SQs. For a local SQ recover (due to driver error report), in case of SQ recover failure, the recover operation will be considered as a failure. For a full tx recover, an attempt to close and open the channels will be done. If this one passed successfully, it will be considered as a successful recover. The SQ recover from error CQE flow is not a new feature in the driver, this patch re-organize the functions and adapt them for the devlink health API. For this purpose, move code from en_main.c to a new file named reporter_tx.c. Diagnose output: $devlink health diagnose pci/0000:00:09.0 reporter tx -j -p { "SQs": [ { "sqn": 138, "HW state": 1, "stopped": false },{ "sqn": 142, "HW state": 1, "stopped": false } ] } $devlink health diagnose pci/0000:00:09.0 reporter tx SQs: sqn: 138 HW state: 1 stopped: false sqn: 142 HW state: 1 stopped: false Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net: Revert devlink health changes.David S. Miller1-356/+0
This reverts the devlink health changes from 9/17/2019, Jiri wants things to be designed differently and it was agreed that the easiest way to do this is start from the beginning again. Commits reverted: cb5ccfbe73b389470e1dc11061bb185ef4bc9aec 880ee82f0313453ec5a6cb122866ac057263066b c7af343b4e33578b7de91786a3f639c8cfa0d97b ff253fedab961b22117a73ab808fcfa9e6852b50 6f9d56132eb6d2603d4273cfc65bed914ec47acb fcd852c69d776c0f46c8f79e8e431e5cc6ddc7b7 8a66704a13d9713593342e29b4f0c19762f5746b 12bd0dcefe88782ac1c9fff632958dd1b71d27e5 aba25279c10094c5c97d09c3491ca86d00b4ad5e ce019faa70f81555fa17ebc1d5a03651f2e7e15a b8c45a033acc607201588f7665ba84207e5149e0 And the follow-on build fix: o33a0efa4baecd689da9474ce0e8b673eb6931c60 Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net/mlx5e: Add TX timeout support for mlx5e TX reporterEran Ben Elisha1-0/+35
With this patch, ndo_tx_timeout callback will be redirected to the TX reporter in order to detect a TX timeout error and report it to the devlink health. (The watchdog detects TX timeouts, but the driver verify the issue still exists before launching any recover method). In addition, recover from TX timeout in case of lost interrupt was added to the TX reporter recover method. The TX timeout recover from lost interrupt is not a new feature in the driver, this patch re-organize the functionality and move it to the TX reporter recovery flow. TX timeout example: (with auto_recover set to false, if set to true, the manual recover and diagnose sections are irrelevant) $cat /sys/kernel/debug/tracing/trace ... devlink_health_report: bus_name=pci dev_name=0000:00:09.0 driver_name=mlx5_core reporter_name=TX: TX timeout on queue: 0, SQ: 0xd8a, CQ: 0x406, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 13972000 $devlink health diagnose pci/0000:00:09 reporter TX SQ 0xd8a: HW state: 1, stopped: 1 SQ 0xe44: HW state: 1, stopped: 0 SQ 0xeb4: HW state: 1, stopped: 0 SQ 0xf1f: HW state: 1, stopped: 0 SQ 0xf80: HW state: 1, stopped: 0 SQ 0xfe5: HW state: 1, stopped: 0 $devlink health recover pci/0000:00:09 reporter TX $devlink health show pci/0000:00:09.0: name TX state healthy #err 1 #recover 1 last_dump_ts N/A dump_available false attributes: grace_period 500 auto_recover false Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net/mlx5e: Add TX reporter supportEran Ben Elisha1-0/+321
Add mlx5e tx reporter to devlink health reporters. This reporter will be responsible for diagnosing, reporting and recovering of TX errors. This patch declares the TX reporter operations and allocate it using the devlink health API. Currently, this reporter supports reporting and recovering from send error CQE only. In addition, it adds diagnose information for the open SQs. For a local SQ recover (due to driver error report), in case of SQ recover failure, the recover operation will be considered as a failure. For a full TX recover, an attempt to close and open the channels will be done. If this one passed successfully, it will be considered as a successful recover. The SQ recover from error CQE flow is not a new feature in the driver, this patch re-organize the functions and adapt them for the devlink health API. For this purpose, move code from en_main.c to a new file named reporter_tx.c. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>