Age | Commit message (Collapse) | Author | Files | Lines |
|
Add the rx_oversize_pkts_buffer counter to ethtool statistics.
This counter exposes the number of dropped received packets due to
length which arrived to RQ and exceed software buffer size allocated by
the device for incoming traffic. It might imply that the device MTU is
larger than the software buffers size.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the following statistics:
RX successfully decrypted MACsec packets:
macsec_rx_pkts : Number of packets decrypted successfully
macsec_rx_bytes : Number of bytes decrypted successfully
Rx dropped MACsec packets:
macsec_rx_pkts_drop : Number of MACsec packets dropped
macsec_rx_bytes_drop : Number of MACsec bytes dropped
TX successfully encrypted MACsec packets:
macsec_tx_pkts : Number of packets encrypted/authenticated successfully
macsec_tx_bytes : Number of bytes encrypted/authenticated successfully
Tx dropped MACsec packets:
macsec_tx_pkts_drop : Number of MACsec packets dropped
macsec_tx_bytes_drop : Number of MACsec bytes dropped
The above can be seen using:
ethtool -S <ifc> |grep macsec
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PTP TX port timestamp relies on receiving 2 CQEs for each outgoing
packet (WQE). The regular CQE has a less accurate timestamp than the
wire CQE. On link change, the wire CQE may get lost. Let the driver
detect and restore the relation between the CQEs, and re-sync after
timeout.
Add resiliency for this as follows: add id (producer counter)
into the WQE's metadata. This id will be received in the wire
CQE (in wqe_counter field). On handling the wire CQE, if there is no
match, replay the PTP application with the time-stamp from the regular
CQE and restore the sync between the CQEs and their SKBs. This patch
adds 2 ptp counters:
1) ptp_cq0_resync_event: number of times a mismatch was detected between
the regular CQE and the wire CQE.
2) ptp_cq0_resync_cqe: total amount of missing wire CQEs.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Preparation for dynamic allocation of the HTB struct.
The statistics should be preserved even when the struct is de-allocated.
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The existing capability check for vnic env counters only checks for
receive steering discards, although we need the counters update for the
exposed internal queue oob counter as well. This could result in the
latter counter not being updated correctly when the receive steering
discards counter is not supported.
Fix that by checking whether any counter is supported instead of only
the steering counter capability.
Fixes: 0cfafd4b4ddf ("net/mlx5e: Add device out of buffer counter")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Delete the statistics that is not used anymore.
Link: https://lore.kernel.org/r/3f194752881e095910c887dd5cede1dcba6acaf3.1649232994.git.leonro@nvidia.com
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After removal FPGA TLS, we can remove tls->ktls indirection too,
as it is the same thing.
Link: https://lore.kernel.org/r/67e596599edcffb0de43f26551208dfd34ac777e.1649073691.git.leonro@nvidia.com
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
We need to sync page pool stats only for active channels. Reading ethtool
stats on a down netdev or a netdev with modified number of channels will
result in a user-after-free, trying to access page pools that are freed
already.
BUG: KASAN: use-after-free in mlx5e_stats_grp_sw_update_stats+0x465/0xf80
Read of size 8 at addr ffff888004835e40 by task ethtool/720
Fixes: cc10e84b2ec3 ("mlx5: add support for page_pool_get_stats")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Joe Damato <jdamato@fastly.com>
Link: https://lore.kernel.org/r/20220312005353.786255-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This change adds support for the page_pool_get_stats API to mlx5. If the
user has enabled CONFIG_PAGE_POOL_STATS in their kernel, ethtool will
output page pool stats.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Acked-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tools/testing/selftests/net/mptcp/mptcp_join.sh
34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers")
857898eb4b28 ("selftests: mptcp: add missing join check")
6ef84b1517e0 ("selftests: mptcp: more robust signal race test")
https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c
fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions")
c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information")
09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr")
84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr")
efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow")
3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fec counters support is checked via the PCAM feature_cap_mask,
bit 0: PPCNT_counter_group_Phy_statistical_counter_group.
Add feature check to avoid faulty behavior.
Fixes: 0a1498ebfa55 ("net/mlx5e: Expose FEC counters via ethtool")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
There is a configuration where the uplink interface is the synchronizer.
Add PTP counters for this interface for monitoring.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Build bot reports:
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function 'fec_set_block_stats':
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:1235:48: error: 'outl' undeclared (first use in this function); did you mean 'out'?
1235 | if (mlx5_core_access_reg(mdev, in, sz, outl, sz, MLX5_REG_PPCNT, 0, 0))
| ^~~~
| out
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220109213321.2292830-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add FEC counters' statistics of corrected_blocks and
uncorrectable_blocks, along with their lanes via ethtool.
HW supports corrected_blocks and uncorrectable_blocks counters both for
RS-FEC mode and FC-FEC mode. In FC mode these counters are accumulated
per lane, while in RS mode the correction method crosses lanes, thus
only total corrected_blocks and uncorrectable_blocks are reported in
this mode.
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Make stats array an array of pointer. This patch comes in to prepare for
the next patch where allocations of the stats are to be performed
dynamically on first usage.
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Treat the string as an argument to avoid this.
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:482:5:
error: format string is not a string literal (potentially insecure)
name);
^~~~
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:2079:4:
error: format string is not a string literal (potentially insecure)
ptp_ch_stats_desc[i].format);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
|
|
This patch adds HW_GRO counters to RX packets statistics:
- gro_match_packets: counter of received packets with set match flag.
- gro_packets: counter of received packets over the HW_GRO feature,
this counter is increased by one for every received
HW_GRO cqe.
- gro_bytes: counter of received bytes over the HW_GRO feature,
this counter is increased by the received bytes for every
received HW_GRO cqe.
- gro_skbs: counter of built HW_GRO skbs,
increased by one when we flush HW_GRO skb
(when we call a napi_gro_receive with hw_gro skb).
- gro_large_hds: counter of received packets with large headers size,
in case the packet needs new SKB, the driver will allocate
new one and will not use the headers entry to build it.
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.
Before applying the patch:
ethtool -S eth3 | grep ptp_rq
ptp_rq15_packets: 0
ptp_rq8_bytes: 0
ptp_rq6_csum_complete: 0
ptp_rq14_csum_complete_tail: 0
ptp_rq3_csum_complete_tail_slow : 0
ptp_rq9_csum_unnecessary: 0
ptp_rq1_csum_unnecessary_inner: 0
ptp_rq7_csum_none: 0
ptp_rq10_xdp_drop: 0
ptp_rq9_xdp_redirect: 0
ptp_rq13_lro_packets: 0
ptp_rq12_lro_bytes: 0
ptp_rq10_ecn_mark: 0
ptp_rq9_removed_vlan_packets: 0
ptp_rq5_wqe_err: 0
ptp_rq8_mpwqe_filler_cqes: 0
ptp_rq2_mpwqe_filler_strides: 0
ptp_rq5_oversize_pkts_sw_drop: 0
ptp_rq6_buff_alloc_err: 0
ptp_rq15_cqe_compress_blks: 0
ptp_rq2_cqe_compress_pkts: 0
ptp_rq2_cache_reuse: 0
ptp_rq12_cache_full: 0
ptp_rq11_cache_empty: 256
ptp_rq12_cache_busy: 0
ptp_rq11_cache_waive: 0
ptp_rq12_congst_umr: 0
ptp_rq11_arfs_err: 0
ptp_rq9_recover: 0
After applying the patch:
ethtool -S eth3 | grep ptp_rq
ptp_rq0_packets: 0
ptp_rq0_bytes: 0
ptp_rq0_csum_complete: 0
ptp_rq0_csum_complete_tail: 0
ptp_rq0_csum_complete_tail_slow : 0
ptp_rq0_csum_unnecessary: 0
ptp_rq0_csum_unnecessary_inner: 0
ptp_rq0_csum_none: 0
ptp_rq0_xdp_drop: 0
ptp_rq0_xdp_redirect: 0
ptp_rq0_lro_packets: 0
ptp_rq0_lro_bytes: 0
ptp_rq0_ecn_mark: 0
ptp_rq0_removed_vlan_packets: 0
ptp_rq0_wqe_err: 0
ptp_rq0_mpwqe_filler_cqes: 0
ptp_rq0_mpwqe_filler_strides: 0
ptp_rq0_oversize_pkts_sw_drop: 0
ptp_rq0_buff_alloc_err: 0
ptp_rq0_cqe_compress_blks: 0
ptp_rq0_cqe_compress_pkts: 0
ptp_rq0_cache_reuse: 0
ptp_rq0_cache_full: 0
ptp_rq0_cache_empty: 256
ptp_rq0_cache_busy: 0
ptp_rq0_cache_waive: 0
ptp_rq0_congst_umr: 0
ptp_rq0_arfs_err: 0
ptp_rq0_recover: 0
Fixes: a28359e922c6 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.
To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.
Use mlx5e_get_max_num_channels() for calculating the number of netdev's
rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
coinsiders core device resources, profile, and netdev) to init or
update priv->max_nch.
Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.
Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.
There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.
Fixes: a1f240f18017 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add support for PHY/MAC/Ctrl/RMON stats.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the TLS logic finds a tcp seq match for a kTLS RX resync
request, it calls the driver callback function mlx5e_ktls_resync()
to handle it and communicate it to the device.
Errors might occur during mlx5e_ktls_resync(), however, they are not
reported to the stack. Moreover, there is no error handling in the
stack for these errors.
In this patch, the driver obtains responsibility on errors handling,
adding queue and retry mechanisms to these resyncs.
We maintain a linked list of resync matches, and try posting them
to the async ICOSQ in the NAPI context.
Only possible failure that demands driver handling is ICOSQ being full.
By relying on the NAPI mechanism, we make sure that the entries in list
will be handled when ICOSQ completions arrive and make some room
available.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Report corrected bits.
v2: catch reg access errors (Saeed)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some TLS RX counters increment per socket/connection, and are not
protected against parallel modifications from several cores.
Switch them to atomic counters by taking them out of the RQ stats into
the global atomic TLS stats.
In this patch, we touch 'rx_tls_ctx/del' that count the number of
device-offloaded RX TLS connections added/deleted.
These counters are updated in the add/del callbacks, out of the fast
data-path.
This change is not needed for counters that increment only in NAPI
context, as they are protected by the NAPI mechanism.
Keep them as tls_* counters under 'struct mlx5e_rq_stats'.
Fixes: 76c1e1ac2aae ("net/mlx5e: kTLS, Add kTLS RX stats")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Some TLS TX counters increment per socket/connection, and are not
protected against parallel modifications from several cores.
Switch them to atomic counters by taking them out of the SQ stats into
the global atomic TLS stats.
In this patch, we touch a single counter 'tx_tls_ctx' that counts the
number of device-offloaded TX TLS connections added.
Now that this counter can be increased without the for having the SQ
context in hand, move it to the mlx5e_ktls_add_tx() callback where it
really belongs, out of the fast data-path.
This change is not needed for counters that increment only in NAPI
context or under the TX lock, as they are already protected.
Keep them as tls_* counters under 'struct mlx5e_sq_stats'.
Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Like PTP-TX, once the PTP-RX is opened, corresponding statistics appear.
Add indication that PTP-RX was ever opened: rx_ptp_opened. If any of the
PTP RX or TX were opened, display the PTP channel's statistics.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Following patches in the set add support for RX PTP. Rename PTP prefix
from %s/port_ptp/ptp/g to include RX PTP too.
In addition rename indication (used in statistics context) that PTP-SQ
was opened: %s/port_ptp_opened/tx_ptp_opened/g. This will simplify adding
indication that PTP-RQ was opened.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This commit adds support for HTB offload in the mlx5e driver.
Performance:
NIC: Mellanox ConnectX-6 Dx
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (24 cores with HT)
100 Gbit/s line rate, 500 UDP streams @ ~200 Mbit/s each
48 traffic classes, flower used for steering
No shaping (rate limits set to 4 Gbit/s per TC) - checking for max
throughput.
Baseline: 98.7 Gbps, 8.25 Mpps
HTB: 6.7 Gbps, 0.56 Mpps
HTB offload: 95.6 Gbps, 8.00 Mpps
Limitations:
1. 256 leaf nodes, 3 levels of depth.
2. Granularity for ceil is 1 Mbit/s. Rates are converted to weights, and
the bandwidth is split among the siblings according to these weights.
Other parameters for classes are not supported.
Ethtool statistics support for QoS SQs are also added. The counters are
called qos_txN_*, where N is the QoS queue number (starting from 0, the
numeration is separate from the normal SQs), and * is the counter name
(the counters are the same as for the normal SQs).
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Transmitted packet timestamping accuracy can be improved when using
timestamp from the port, instead of packet CQE creation timestamp, as
it better reflects the actual time of a packet's transmit.
TX port timestamping is supported starting from ConnectX6-DX hardware.
Although at the original completion, only CQE timestamp can be attached,
we are able to get TX port timestamping via an additional completion over
a special CQ associated with the SQ (in addition to the regular CQ).
Driver to ignore the original packet completion timestamp, and report
back the timestamp of the special CQ completion. If the absolute timestamp
diff between the two completions is greater than 1 / 128 second, ignore
the TX port timestamp as it has a jitter which is too big.
No skb will be generate out of the extra completion.
Allocate additional CQ per ptpsq, to receive the TX port timestamp.
Driver to hold an skb FIFO in order to map between transmitted skb to
the two expected completions. When using ptpsq, hold double refcount on
the skb, to gaurantee it will not get released before both completions
arrive.
Expose dedicated counters of the ptp additional CQ and connect it to the
TX health reporter.
This patch improves TX Hardware timestamping offset to be less than 40ns
at a 100Gbps line rate, compared to 600ns before.
With that, making our HW compliant with G.8273.2 class C, and allow Linux
systems to be deployed in the 5G telco edge, where this standard is a
must.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add TX PTP port object support for better TX timestamping accuracy.
Currently, driver supports CQE based TX port timestamp. Device
also offers TX port timestamp, which has less jitter and better
reflects the actual time of a packet's transmit.
Define new driver layout called ptpsq, on which driver will create
SQs that will support TX port timestamp for their transmitted packets.
Driver to identify PTP TX skbs and steer them to these dedicated SQs
as part of the select queue ndo.
Driver to hold ptpsq per TC and report them at
netif_set_real_num_tx_queues().
Add support for all needed functionality in order to xmit and poll
completions received via ptpsq.
Add ptpsq to the TX reporter recover, diagnose and dump methods.
Creation of ptpsqs is disabled by default, and can be enabled via
tx_port_ts private flag.
This patch steer all timestamp related packets to a ptpsq, but it
does not open the port timestamp support for it. The support will
be added in the following patch.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
SW group counter update function aggregates sw stats out of many
mlx5e_*_stats resides in a given mlx5e_channel_stats struct.
Split the function into a few helper functions.
This will be used later in the series to calculate specific
mlx5e_*_stats which are not defined inside mlx5e_channel_stats.
Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Saeed Mahameed says:
====================
mlx5-updates-2020-09-21
Multi packet TX descriptor support for SKBs.
This series introduces some refactoring of the regular TX data path in
mlx5 and adds the Enhanced TX MPWQE feature support. MPWQE stands for
multi-packet work queue element, and it can serve multiple packets,
reducing the PCI bandwidth spent on control traffic. It should improve
performance in scenarios where PCI is the bottleneck, and xmit_more is
signaled by the kernel. The refactoring done in this series also
improves the packet rate on its own.
MPWQE is already implemented in the XDP tx path, this series adds the
support of MPWQE for regular kernel SKB tx path.
MPWQE is supported from ConnectX-5 and onward, for legacy devices we need
to keep backward compatibility for regular (Single packet) WQE descriptor.
MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE per SKB.
Prior to the final patch "net/mlx5e: Enhanced TX MPWQE for SKBs" that adds
the actual support, Maxim did some refactoring to the tx data path to
split it into stages and smaller helper functions that can be utilized and
reused for both legacy and new MPWQE feature.
Performance testing:
UDP performance is improved in a single stream pktgen test:
Packet rate: 16.86 Mpps (±0.15 Mpps) -> 20.94 Mpps (±0.33 Mpps)
Instructions per packet: 434 -> 329
Cycles per packet: 158 -> 123
Instructions per cycle: 2.75 -> 2.67
TCP and XDP_TX single stream tests show no performance difference.
MPWQE can reduce PCI bandwidth:
PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 80.3%
Inbound PCI utilization with MPWQE on: 59.0%
PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 65.4%
Inbound PCI utilization with MPWQE on: 49.3%
MPWQE can also reduce CPU load, increasing the packet rate in case of
CPU bottleneck:
PCI Gen2, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 37.5 Mpps
Packet rate with MPWQE on: 49.0 Mpps
PCI Gen3, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 57.0 Mpps
Packet rate with MPWQE on: 66.8 Mpps
Burst size in all pktgen tests is 32.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.
2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit adds support for Enhanced TX MPWQE feature in the regular
(SKB) data path. A MPWQE (multi-packet work queue element) can serve
multiple packets, reducing the PCI bandwidth on control traffic.
Two new stats (tx*_mpwqe_blks and tx*_mpwqe_pkts) are added. The feature
is on by default and controlled by the skb_tx_mpwqe private flag.
In a MPWQE, eseg is shared among all packets, so eseg-based offloads
(IPSEC, GENEVE, checksum) run on a separate eseg that is compared to the
eseg of the current MPWQE session to decide if the new packet can be
added to the same session.
MPWQE is not compatible with certain offloads and features, such as TLS
offload, TSO, nonlinear SKBs. If such incompatible features are in use,
the driver gracefully falls back to non-MPWQE.
This change has no performance impact in TCP single stream test and
XDP_TX single stream test.
UDP pktgen, 64-byte packets, single stream, MPWQE off:
Packet rate: 16.96 Mpps (±0.12 Mpps) -> 17.01 Mpps (±0.20 Mpps)
Instructions per packet: 421 -> 429
Cycles per packet: 156 -> 161
Instructions per cycle: 2.70 -> 2.67
UDP pktgen, 64-byte packets, single stream, MPWQE on:
Packet rate: 16.96 Mpps (±0.12 Mpps) -> 20.94 Mpps (±0.33 Mpps)
Instructions per packet: 421 -> 329
Cycles per packet: 156 -> 123
Instructions per cycle: 2.70 -> 2.67
Enabling MPWQE can reduce PCI bandwidth:
PCI Gen2, pktgen at fixed rate of 36864000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 80.3%
Inbound PCI utilization with MPWQE on: 59.0%
PCI Gen3, pktgen at fixed rate of 56064000 pps on 24 CPU cores:
Inbound PCI utilization with MPWQE off: 65.4%
Inbound PCI utilization with MPWQE on: 49.3%
Enabling MPWQE can also reduce CPU load, increasing the packet rate in
case of CPU bottleneck:
PCI Gen2, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 37.5 Mpps
Packet rate with MPWQE on: 49.0 Mpps
PCI Gen3, pktgen at full rate on 24 CPU cores:
Packet rate with MPWQE off: 57.0 Mpps
Packet rate with MPWQE on: 66.8 Mpps
Burst size in all pktgen tests is 32.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (x86_64)
NIC: Mellanox ConnectX-6 Dx
GCC 10.2.0
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit started to reuse function mlx5e_update_ndo_stats() for
the representors as well.
However, the function is hard-coded to work on mlx5e_nic_stats_grps only.
Due to this issue, the representors statistics were not updated in the
output of "ip -s".
Fix it to work with the correct group by extracting it from the caller's
profile.
Also, while at it and since this function became generic, move it to
en_stats.c and rename it accordingly.
Fixes: 8a236b15144b ("net/mlx5e: Convert rep stats to mlx5e_stats_grp-based infra")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Plumb through all the indirection and copy some code from
ethtool -S. The names of the group indicate that these are
the stats we are after (and Saeed confirms it).
v3:
- fix build in mlx5_rep
v2:
- drop the ethool helper and call stats directly
- don't pass 0 as initialized to in buffer
- use local buffer
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add global and per-channel ethtool SW stats for the device
offload.
Document the new counters in tls-offload.rst.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5 updates for both net-next and rdma-next:
1) HW bits and definitions for TLS and IPsec offlaods
2) Release all pages capability bits
3) New command interface helpers and some code cleanup as a result
4) Move qp.c out of mlx5 core driver into mlx5_ib rdma driver
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Do mass update of statistics to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
The current HW counters are supported only by Innova, split the ipsec
stats group into two groups, one for HW and one for SW. And expose
the HW counters to ethtool only if Innova HW is used for IPsec offload.
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.
This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Extend stats group array of uplink representor with all stats that are
available for PF in legacy mode, besides ipsec and TLS which are not
supported.
Don't output vport stats for uplink representor because they are already
handled by 802_3 group (with different names: {tx|rx}_{bytes|packets}_phy).
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Don't copy all of the stats groups used for mlx5e ethernet NIC profile,
have a separate stats groups for IPoIB with the set of the needed stats
only.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Convert stats groups array to array of "stats group" pointers to allow
sharing and individual selection of groups per profile as illustrated in
the next patches.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
|
|
Introduce new macros to declare stats callbacks and groups, for better
code reuse and for individual groups selection per profile which will be
introduced in next patches.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
|
|
Attach stats groups array to the profiles and make the stats utility
functions (get_num, update, fill, fill_strings) generic and use the
profile->stats_grps rather the hardcoded NIC stats groups.
This will allow future extension to have per profile stats groups.
In this patch mlx5e NIC and IPoIB will still share the same stats
groups.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
|
|
This merge syncs with mlx5-next latest HW bits and layout updates for next
features, in addition one patch that improves
mlx5_create_auto_grouped_flow_table() API across all mlx5 users.
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Refactor mlx5_create_auto_grouped_flow_table
net/mlx5e: Add discard counters per priority
net/mlx5e: Expose FEC feilds and related capability bit
net/mlx5: Add mlx5_ifc definitions for connection tracking support
net/mlx5: Add copy header action struct layout
net/mlx5: Expose resource dump register mapping
net/mlx5: Add structures and defines for MIRC register
net/mlx5: Read MCAM register groups 1 and 2
net/mlx5: Add structures layout for new MCAM access reg groups
net/mlx5: Expose vDPA emulation device capabilities
net/mlx5: Add Virtio Emulation related device capabilities
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add counters that count (per priority) the number of received
packets that dropped due to lack of buffers on a physical port. If
this counter is increasing, it implies that the adapter is
congested and cannot absorb the traffic coming from the network.
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In some configurations, gcc tries too hard to optimize this code:
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function 'mlx5e_grp_sw_update_stats':
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:302:1: error: the frame size of 1336 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
As was stated in the bug report, the reason is that gcc runs into a corner
case in the register allocator that is rather hard to fix in a good way.
As there is an easy way to work around it, just add a comment and the
barrier that stops gcc from trying to overoptimize the function.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92657
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Cited patch changed (channel index, tc) => (TXQ index) mapping to be a
static one, in order to keep indices consistent when changing number of
channels or TCs.
For 32 channels (OOB) and 8 TCs, real num of TXQs is 256.
When reducing the amount of channels to 8, the real num of TXQs will be
changed to 64.
This indices method is buggy:
- Channel #0, TC 3, the TXQ index is 96.
- Index 8 is not valid, as there is no such TXQ from driver perspective
(As it represents channel #8, TC 0, which is not valid with the above
configuration).
As part of driver's select queue, it calls netdev_pick_tx which returns an
index in the range of real number of TXQs. Depends on the return value,
with the examples above, driver could have returned index larger than the
real number of tx queues, or crash the kernel as it tries to read invalid
address of SQ which was not allocated.
Fix that by allocating sequential TXQ indices, and hold a new mapping
between (channel index, tc) => (real TXQ index). This mapping will be
updated as part of priv channels activation, and is used in
mlx5e_select_queue to find the selected queue index.
The existing indices mapping (channel_tc2txq) is no longer needed, as it
is used only for statistics structures and can be calculated on run time.
Delete its definintion and updates.
Fixes: 8bfaf07f7806 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|