aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/en/health.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-10-15net/mlx5: Read timeout values from DTORAmir Tzin1-1/+0
Replace hard coded timeouts with values stored by firmware in default timeouts register (DTOR). Timeouts are read during driver load. If DTOR is not supported by firmware then fallback to hard coded defaults instead. Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Move MLX5E_RX_ERR_CQE macroEran Ben Elisha1-2/+0
MLX5E_RX_ERR_CQE Macro is used only in data-path, move it to the appropriate header file. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Allow SQ outside of channel contextEran Ben Elisha1-1/+1
In order to be able to create an SQ outside of a channel context, remove sq->channel direct pointer. This requires adding a direct pointer to: netdevice, priv and mlx5_core in order to support SQs that are part of mlx5e_channel. Use channel_stats from the corresponding CQ. Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-12-08net/mlx5e: Allow RQ outside of channel contextAya Levin1-1/+2
In order to be able to create an RQ outside of a channel context, remove rq->channel direct pointer. This requires adding a direct pointer to: ICOSQ and priv in order to support RQs that are part of mlx5e_channel. Use channel_stats from the corresponding CQ. Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2020-07-02net/mlx5e: Add EQ info to TX/RX reporter's diagnoseAya Levin1-0/+1
Enhance TX/RX reporter's diagnose to include info about the corresponding EQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 1713 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 ci: 0 size: 1024 EQ: eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048 channel ix: 1 rqn: 1718 HW state: 1 SW state: 5 WQE counter: 7 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 ci: 0 size: 1024 EQ: eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048 $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common Config: SQ: stride size: 64 size: 1024 CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 1712 HW state: 1 stopped: false cc: 91 pc: 91 CQ: cqn: 1030 HW status: 0 ci: 91 size: 1024 EQ: eqn: 7 irqn: 42 vecidx: 1 ci: 93 size: 2048 channel ix: 1 tc: 0 txq ix: 1 sqn: 1717 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0 ci: 0 size: 1024 EQ: eqn: 8 irqn: 43 vecidx: 2 ci: 2 size: 2048 Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-02net/mlx5e: Rename reporter's helpersAya Levin1-4/+4
Change prefix to match resident file: %s/mlx5e_reporter_cq_diagnose/mlx5e_health_cq_diag_fmsg %s/mlx5e_reporter_cq_common_diagnose/mlx5e_health_cq_common_diag_fmsg %s/mlx5e_reporter_named_obj_nest_start/mlx5e_health_fmsg_named_obj_nest_start %s/mlx5e_reporter_named_obj_nest_end/mlx5e_health_fmsg_named_obj_nest_end Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-02net/mlx5e: Add a flush timeout defineAya Levin1-0/+1
During queue's recovery, driver waits for flush. The flush timeout is set to 2 seconds. Add a define for this value for the benefit of RX and TX reporters. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-02net/mlx5e: Change reporters create functions to return voidEran Ben Elisha1-3/+3
Creation of devlink health reporters is not fatal for mlx5e instance load. In case of error in reporter's creation, the return value is ignored. Change all reporters creation functions to return void. In addition, with this change, a failure in creating a reporter, will not prevent the driver from trying to create the next reporter in the list. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-2/+1
Overlapping header include additions in macsec.c A bug fix in 'net' overlapping with the removal of 'version' string in ena_netdev.c Overlapping test additions in selftests Makefile Overlapping PCI ID table adjustments in iwlwifi driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24net/mlx5e: Do not recover from a non-fatal syndromeAya Levin1-2/+1
For non-fatal syndromes like LOCAL_LENGTH_ERR, recovery shouldn't be triggered. In these scenarios, the RQ is not actually in ERR state. This misleads the recovery flow which assumes that the RQ is really in error state and no more completions arrive, causing crashes on bad page state. Fixes: 8276ea1353a4 ("net/mlx5e: Report and recover from CQE with error on RQ") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-18net/mlx5e: Support dump callback in TX reporterAya Levin1-2/+6
Add support for SQ's FW dump on TX reporter's events. Use Resource dump API to retrieve the relevant data: SX slice, SQ dump and SQ buffer. Wrap it in formatted messages and store the binary output in devlink core. Example: $ devlink health dump show pci/0000:00:0b.0 reporter tx SX Slice: data: 00 00 00 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 02 01 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 20 40 90 81 88 ff ff 00 00 00 00 00 00 00 00 15 00 15 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 81 ae 41 06 00 ea ff ff SQs: SQ: index: 1511 data: 00 00 00 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 02 01 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 20 40 90 81 88 ff ff 00 00 00 00 00 00 00 00 15 00 15 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 81 ae 41 06 00 ea ff ff SQ: index: 1516 data: 00 00 00 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 02 01 00 00 00 00 80 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de 00 20 40 90 81 88 ff ff 00 00 00 00 00 00 00 00 15 00 15 00 00 00 00 00 ff ff ff ff 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 81 ae 41 06 00 ea ff ff $ devlink health dump show pci/0000:00:0b.0 reporter tx -jp { "SX Slice": { "data": [ 0,0,0,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,32,64,144,129,136,255,255,0,0,0,0,0,0,0,0,21,0,21,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,129,174,65,6,0,234,255,255], }, "SQs": [ { "SQ": { "index": 1511, "data": [ 0,0,0,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,32,64,144,129,136,255,255,0,0,0,0,0,0,0,0,21,0,21,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,129,174,65,6,0,234,255,255] } },{ "SQ": { "index": 1516, "data": [ 0,0,0,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,128,0,1,0,0,0,0,173,222,34,1,0,0,0,0,173,222,0,32,64,144,129,136,255,255,0,0,0,0,0,0,0,0,21,0,21,0,0,0,0,0,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,129,174,65,6,0,234,255,255] } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Report and recover from CQE with error on RQAya Levin1-0/+9
Add support for report and recovery from error on completion on RQ by setting the queue back to ready state. Handle only errors with a syndrome indicating the RQ might enter error state and could be recovered. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: RX, Handle CQE with error at the earliest stageSaeed Mahameed1-0/+2
Just to be aligned with the MPWQE handlers, handle RX WQE with error for legacy RQs in the top RX handlers, just before calling skb_from_cqe(). CQE error handling will now be called at the same stage regardless of the RQ type or netdev mode NIC, Representor, IPoIB, etc .. This will be useful for down stream patch to improve error CQE handling. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Report and recover from rx timeoutAya Levin1-0/+1
Add support for report and recovery from rx timeout. On driver open we post NOP work request on the rx channels to trigger napi in order to fillup the rx rings. In case napi wasn't scheduled due to a lost interrupt, perform EQ recovery. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Report and recover from CQE error on ICOSQAya Levin1-0/+1
Add support for report and recovery from error on completion on ICOSQ. Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to ready state (firmware) and reset the ICOSQ and the RQ (software resources). Finally, activate the ICOSQ and the RQ. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add support to rx reporter diagnoseAya Levin1-0/+3
Add rx reporter, which supports diagnose call-back. Diagnostics output include: information common to all RQs: RQ type, RQ size, RQ stride size, CQ size and CQ stride size. In addition advertise information per RQ and its related icosq and attached CQ. $ devlink health diagnose pci/0000:00:0b.0 reporter rx Common config: RQ: type: 2 stride size: 2048 size: 8 CQ: stride size: 64 size: 1024 RQs: channel ix: 0 rqn: 4308 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1032 HW status: 0 channel ix: 1 rqn: 4313 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1036 HW status: 0 channel ix: 2 rqn: 4318 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1040 HW status: 0 channel ix: 3 rqn: 4323 HW state: 1 SW state: 3 posted WQEs: 7 cc: 7 ICOSQ HW state: 1 CQ: cqn: 1044 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter rx -jp { "Common config": { "RQ": { "type": 2, "stride size": 2048, "size": 8 }, "CQ": { "stride size": 64, "size": 1024 } }, "RQs": [ { "channel ix": 0, "rqn": 4308, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1032, "HW status": 0 } },{ "channel ix": 1, "rqn": 4313, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1036, "HW status": 0 } },{ "channel ix": 2, "rqn": 4318, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1040, "HW status": 0 } },{ "channel ix": 3, "rqn": 4323, "HW state": 1, "SW state": 3, "posted WQEs": 7, "cc": 7, "ICOSQ HW state": 1, "CQ": { "cqn": 1044, "HW status": 0 } } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add helper functions for reporter's basicsAya Levin1-0/+4
Introduce helper functions for create and destroy reporters and update channels. In the following patch, rx reporter is added and it will use these helpers too. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Add cq info to tx reporter diagnoseAya Levin1-0/+2
Add cq information to general diagnose output: CQ size and stride size. Per SQ add information about the related CQ: cqn and CQ's HW status. $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common Config: SQ: stride size: 64 size: 1024 CQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4307 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1030 HW status: 0 channel ix: 1 tc: 0 txq ix: 1 sqn: 4312 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1034 HW status: 0 channel ix: 2 tc: 0 txq ix: 2 sqn: 4317 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1038 HW status: 0 channel ix: 3 tc: 0 txq ix: 3 sqn: 4322 HW state: 1 stopped: false cc: 0 pc: 0 CQ: cqn: 1042 HW status: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp { "Common Config": { "SQ": { "stride size": 64, "size": 1024 }, "CQ": { "stride size": 64, "size": 1024 } }, "SQs": [ { "channel ix": 0, "tc": 0, "txq ix": 0, "sqn": 4307, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1030, "HW status": 0 } },{ "channel ix": 1, "tc": 0, "txq ix": 1, "sqn": 4312, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1034, "HW status": 0 } },{ "channel ix": 2, "tc": 0, "txq ix": 2, "sqn": 4317, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1038, "HW status": 0 } },{ "channel ix": 3, "tc": 0, "txq ix": 3, "sqn": 4322, "HW state": 1, "stopped": false, "cc": 0, "pc": 0, "CQ": { "cqn": 1042, "HW status": 0 } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Extend tx reporter diagnostics outputAya Levin1-0/+3
Enhance tx reporter's diagnostics output to include: information common to all SQs: SQ size, SQ stride size. In addition add channel ix, tc, txq ix, cc and pc. $ devlink health diagnose pci/0000:00:0b.0 reporter tx Common config: SQ: stride size: 64 size: 1024 SQs: channel ix: 0 tc: 0 txq ix: 0 sqn: 4307 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 1 tc: 0 txq ix: 1 sqn: 4312 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 2 tc: 0 txq ix: 2 sqn: 4317 HW state: 1 stopped: false cc: 0 pc: 0 channel ix: 3 tc: 0 txq ix: 3 sqn: 4322 HW state: 1 stopped: false cc: 0 pc: 0 $ devlink health diagnose pci/0000:00:0b.0 reporter tx -jp { "Common config": { "SQ": { "stride size": 64, "size": 1024 } }, "SQs": [ { "channel ix": 0, "tc": 0, "txq ix": 0, "sqn": 4307, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 1, "tc": 0, "txq ix": 1, "sqn": 4312, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 2, "tc": 0, "txq ix": 2, "sqn": 4317, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 },{ "channel ix": 3, "tc": 0, "txq ix": 3, "sqn": 4322, "HW state": 1, "stopped": false, "cc": 0, "pc": 0 } ] } Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Generalize tx reporter's functionalityAya Levin1-0/+14
Prepare for code sharing with rx reporter, which is added in the following patches in the set. Introduce a generic error_ctx for agnostic recovery despatch. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Change naming convention for reporter's functionsAya Levin1-4/+4
Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following patches in the set rx reporter is added, the new naming convention is more uniformed. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-20net/mlx5e: Rename reporter header fileAya Levin1-0/+14
Rename reporter.h -> health.h so patches in the set can use it for health related functionality. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>