aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/ipoib (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-08-28net/mlx5e: Support LAG TX port affinity distributionMaxim Mikityanskiy2-5/+5
When the VF LAG is in use, round-robin the TX affinity of channels among the different ports, if supported by the firmware. Create a set of TISes per port, while doing round-robin of the channels over the different sets. Let all SQs of a channel share the same set of TISes. If lag_tx_port_affinity HCA cap bit is supported, num_lag_ports > 1 and we aren't the LACP owner (PF in the regular use), assign the affinities, otherwise use tx_affinity == 0 in TIS context to let the FW assign the affinities itself. The TISes of the LACP owner are mapped only to the native physical port. For VFs, the starting port for round-robin is determined by its vhca_id, because a VF may have only one channel if attached to a single-core VM. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-15net/mlx5e: Fix compatibility issue with ethtool flash deviceEran Ben Elisha1-0/+9
Cited patch deleted ethtool flash device support, as ethtool core can fallback into devlink flash callback. However, this is supported only if there is a devlink port registered over the corresponding netdevice. As mlx5e do not have devlink port support over native netdevice, it broke the ability to flash device via ethtool. This patch re-add the ethtool callback to avoid user functionality breakage when trying to flash device via ethtool. Fixes: 9c8bca2637b8 ("mlx5: Move firmware flash implementation to devlink") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-25net/mlx5e: Fix wrong max num channels indicationTariq Toukan2-4/+4
No XSK support in the enhanced IPoIB driver and representors. Add a profile property to specify this, and enhance the logic that calculates the max number of channels to take it into account. Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-11Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller1-1/+8
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-07-11 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.15 ('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn') For -stable v5.1 ('net/mlx5e: Fix port tunnel GRE entropy control') ('net/mlx5e: Rx, Fix checksum calculation for new hardware') ('net/mlx5e: Fix return value from timeout recover function') ('net/mlx5e: Fix error flow in tx reporter diagnose') For -stable v5.2 ('net/mlx5: E-Switch, Fix default encap mode') Conflict note: This pull request will produce a small conflict when merged with net-next. In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c Take the hunk from net and replace: esw_offloads_steering_init(esw, vf_nvports, total_nvports); with: esw_offloads_steering_init(esw); ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-11net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rnAya Levin1-1/+8
Check return value from mlx5e_attach_netdev, add error path on failure. Fixes: 48935bbb7ae8 ("net/mlx5e: IPoIB, Add netdevice profile skeleton") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-05net/mlx5e: Re-work TIS creation functionsTariq Toukan3-2/+16
Let the EN TIS creation function (mlx5e_create_tis) be responsible for applying common mdev related fields. Other specific fields must be set by the caller and passed within the inbox. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-7/+7
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-07-03 The following pull-request contains BPF updates for your *net-next* tree. There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim: Rename externally used net_dim members") which has been pulled into your tree in the meantime, but resolution seems not that bad ... getting current bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed just so he's aware of the resolution below: ** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD static int mlx5e_open_cq(struct mlx5e_channel *c, struct dim_cq_moder moder, struct mlx5e_cq_param *param, struct mlx5e_cq *cq) ======= int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder, struct mlx5e_cq_param *param, struct mlx5e_cq *cq) >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Resolution is to take the second chunk and rename net_dim_cq_moder into dim_cq_moder. Also the signature for mlx5e_open_cq() in ... drivers/net/ethernet/mellanox/mlx5/core/en.h +977 ... and in mlx5e_open_xsk() ... drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64 ... needs the same rename from net_dim_cq_moder into dim_cq_moder. ** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c: <<<<<<< HEAD int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix)); struct dim_cq_moder icocq_moder = {0, 0}; struct net_device *netdev = priv->netdev; struct mlx5e_channel *c; unsigned int irq; ======= struct net_dim_cq_moder icocq_moder = {0, 0}; >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497 Take the second chunk and rename net_dim_cq_moder into dim_cq_moder as well. Let me know if you run into any issues. Anyway, the main changes are: 1) Long-awaited AF_XDP support for mlx5e driver, from Maxim. 2) Addition of two new per-cgroup BPF hooks for getsockopt and setsockopt along with a new sockopt program type which allows more fine-grained pass/reject settings for containers. Also add a sock_ops callback that can be selectively enabled on a per-socket basis and is executed for every RTT to help tracking TCP statistics, both features from Stanislav. 3) Follow-up fix from loops in precision tracking which was not propagating precision marks and as a result verifier assumed that some branches were not taken and therefore wrongly removed as dead code, from Alexei. 4) Fix BPF cgroup release synchronization race which could lead to a double-free if a leaf's cgroup_bpf object is released and a new BPF program is attached to the one of ancestor cgroups in parallel, from Roman. 5) Support for bulking XDP_TX on veth devices which improves performance in some cases by around 9%, from Toshiaki. 6) Allow for lookups into BPF devmap and improve feedback when calling into bpf_redirect_map() as lookup is now performed right away in the helper itself, from Toke. 7) Add support for fq's Earliest Departure Time to the Host Bandwidth Manager (HBM) sample BPF program, from Lawrence. 8) Various cleanups and minor fixes all over the place from many others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-28net/mlx5e: Don't refresh TIRs when updating representor SQsGavi Teitz2-2/+4
Refreshing TIRs is done in order to update the TIRs with the current state of SQs in the transport domain, so that the TIRs can filter out undesired self-loopback packets based on the source SQ of the packet. Representor TIRs will only receive packets that originate from their associated vport, due to dedicated steering, and therefore will never receive self-loopback packets, whose source vport will be the vport of the E-Switch manager, and therefore not the vport associated with the representor. As such, it is not necessary to refresh the representors' TIRs, since self-loopback packets can't reach them. Since representors only exist in switchdev mode, and there is no scenario in which a representor will exist in the transport domain alongside a non-representor, it is not necessary to refresh the transport domain's TIRs upon changing the state of a representor's queues. Therefore, do not refresh TIRs upon such a change. Achieve this by adding an update_rx callback to the mlx5e_profile, which refreshes TIRs for non-representors and does nothing for representors, and replace instances of mlx5e_refresh_tirs() upon changing the state of the queues with update_rx(). Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-27net/mlx5e: Add XSK zero-copy supportMaxim Mikityanskiy1-7/+7
This commit adds support for AF_XDP zero-copy RX and TX. We create a dedicated XSK RQ inside the channel, it means that two RQs are running simultaneously: one for non-XSK traffic and the other for XSK traffic. The regular and XSK RQs use a single ID namespace split into two halves: the lower half is regular RQs, and the upper half is XSK RQs. When any zero-copy AF_XDP socket is active, changing the number of channels is not allowed, because it would break to mapping between XSK RQ IDs and channels. XSK requires different page allocation and release routines. Such functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are generic enough to be used for both regular and XSK RQs, and they use the mlx5e_page_{alloc,release} wrappers around the real allocation functions. Function pointers are not used to avoid losing the performance with retpolines. Wherever it's certain that the regular (non-XSK) page release function should be used, it's called directly. Only the stats that could be meaningful for XSK are exposed to the userspace. Those that don't take part in the XSK flow are not considered. Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ), because the newer xdpsock sample doesn't provide any Fill Ring entries at the setup stage. We create a dedicated XSK SQ in the channel. This separation has its advantages: 1. When the UMEM is closed, the XSK SQ can also be closed and stop receiving completions. If an existing SQ was used for XSK, it would continue receiving completions for the packets of the closed socket. If a new UMEM was opened at that point, it would start getting completions that don't belong to it. 2. Calculating statistics separately. When the userspace kicks the TX, the driver triggers a hardware interrupt by posting a NOP to a dedicated XSK ICO (internal control operations) SQ, in order to trigger NAPI on the right CPU core. This XSK ICO SQ is protected by a spinlock, as the userspace application may kick the TX from any core. Store the pointers to the UMEMs in the net device private context, independently from the kernel. This way the driver can distinguish between the zero-copy and non-zero-copy UMEMs. The kernel function xdp_get_umem_from_qid does not care about this difference, but the driver is only interested in zero-copy UMEMs, particularly, on the cleanup it determines whether to close the XSK RQ and SQ or not by looking at the presence of the UMEM. Use state_lock to protect the access to this area of UMEM pointers. LRO isn't compatible with XDP, but there may be active UMEMs while XDP is off. If this is the case, don't allow LRO to ensure XDP can be reenabled at any time. The validation of XSK parameters typically happens when XSK queues open. However, when the interface is down or the XDP program isn't set, it's still possible to have active AF_XDP sockets and even to open new, but the XSK queues will be closed. To cover these cases, perform the validation also in these flows: 1. A new UMEM is registered, but the XSK queues aren't going to be created due to missing XDP program or interface being down. 2. MTU changes while there are UMEMs registered. Having this early check prevents mlx5e_open_channels from failing at a later stage, where recovery is impossible and the application has no chance to handle the error, because it got the successful return value for an MTU change or XSK open operation. The performance testing was performed on a machine with the following configuration: - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz - Mellanox ConnectX-5 Ex with 100 Gbit/s link The results with retpoline disabled, single stream: txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU) rxdrop: 12.2 Mpps l2fwd: 9.4 Mpps The results with retpoline enabled, single stream: txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU) rxdrop: 9.9 Mpps l2fwd: 6.8 Mpps Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-04mlx5: Move firmware flash implementation to devlinkJiri Pirko1-9/+0
Benefit from the devlink flash update implementation and ethtool fallback to it and move firmware flash implementation there. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-21Merge tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-0/+1
Pull SPDX update from Greg KH: "Here is a series of patches that add SPDX tags to different kernel files, based on two different things: - SPDX entries are added to a bunch of files that we missed a year ago that do not have any license information at all. These were either missed because the tool saw the MODULE_LICENSE() tag, or some EXPORT_SYMBOL tags, and got confused and thought the file had a real license, or the files have been added since the last big sweep, or they were Makefile/Kconfig files, which we didn't touch last time. - Add GPL-2.0-only or GPL-2.0-or-later tags to files where our scan tools can determine the license text in the file itself. Where this happens, the license text is removed, in order to cut down on the 700+ different ways we have in the kernel today, in a quest to get rid of all of these. These patches have been out for review on the linux-spdx@vger mailing list, and while they were created by automatic tools, they were hand-verified by a bunch of different people, all whom names are on the patches are reviewers. The reason for these "large" patches is if we were to continue to progress at the current rate of change in the kernel, adding license tags to individual files in different subsystems, we would be finished in about 10 years at the earliest. There will be more series of these types of patches coming over the next few weeks as the tools and reviewers crunch through the more "odd" variants of how to say "GPLv2" that developers have come up with over the years, combined with other fun oddities (GPL + a BSD disclaimer?) that are being unearthed, with the goal for the whole kernel to be cleaned up. These diffstats are not small, 3840 files are touched, over 10k lines removed in just 24 patches" * tag 'spdx-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (24 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 25 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 23 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 22 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 21 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 20 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 19 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 15 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 14 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 12 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 11 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 9 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 3 ...
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner1-0/+1
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-17net/mlx5e: Fix wrong xmit_more applicationTariq Toukan2-2/+3
Cited patch refactored the xmit_more indication while not preserving its functionality. Fix it. Fixes: 3c31ff22b25f ("drivers: mellanox: use netdev_xmit_more() helper") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: Turn on HW tunnel offload in all TIRsTariq Toukan1-0/+1
Hardware requires that all TIRs that steer traffic to the same RQ should share identical tunneled_offload_en value. For that, the tunneled_offload_en bit should be set/unset (according to the HW capability) for all TIRs', not only the ones dedicated for tunneled (inner) traffic. Fixes: 1b223dd39162 ("net/mlx5e: Fix checksum handling for non-stripped vlan packets") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-04-05net/mlx5e: Unify logic of MTU boundariesTariq Toukan1-3/+2
Expose a new helper that wraps the logic for setting the netdevice's MTU boundaries. Use it for the different components (Eth, rep, IPoIB). Set the netdevice min MTU to ETH_MIN_MTU, and the max according to both the FW capability and the kernel definition. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-19net/mlx5e: Wrap the open and apply of channels in one fail-safe functionTariq Toukan1-2/+2
Take into a function the common code structure of opening a side set of channels followed by a call to apply them. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-05net/mlx5e: Move RSS params to a dedicated structAya Levin1-1/+1
Remove RSS params from params struct under channels, and introduce a new struct with RSS configuration params under priv struct. There is no functional change here. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-19net/mlx5e: IPoIB, Reset QP after channels are closedDenis Drozdov1-1/+1
The mlx5e channels should be closed before mlx5i_uninit_underlay_qp puts the QP into RST (reset) state during mlx5i_close. Currently QP state incorrectly set to RST before channels got deactivated and closed, since mlx5_post_send request expects QP in RTS (Ready To Send) state. The fix is to keep QP in RTS state until mlx5e channels get closed and to reset QP afterwards. Also this fix is simply correct in order to keep the open/close flow symmetric, i.e mlx5i_init_underlay_qp() is called first thing at open, the correct thing to do is to call mlx5i_uninit_underlay_qp() last thing at close, which is exactly what this patch is doing. Fixes: dae37456c8ac ("net/mlx5: Support for attaching multiple underlay QPs to root flow table") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+2
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/mlx5: WQ, fixes for fragmented WQ buffers APITariq Toukan1-3/+2
mlx5e netdevice used to calculate fragment edges by a call to mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct indication for queues smaller than a PAGE_SIZE, (broken by default on PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new calls/API. Since (TX/RX) Work Queues buffers are fragmented, here we introduce changes to the API in core driver, so that it gets a stride index and returns the index of last stride on same fragment, and an additional wrapping function that returns the number of physically contiguous strides that can be written contiguously to the work queue. This obsoletes the following API functions, and their buggy usage in EN driver: * mlx5_wq_cyc_get_frag_size() * mlx5_wq_cyc_ctr2fragix() The new API improves modularity and hides the details of such calculation for mlx5e netdevice and mlx5_ib rdma drivers. New calculation is also more efficient, and improves performance as follows: Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size. Before: 16,477,619 pps After: 17,085,793 pps 3.7% improvement Fixes: 3a2f70331226 ("net/mlx5: Use order-0 allocations for all WQ types") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10net/mlx5e: Do not ignore netdevice TX/RX queues numberFeras Daoud2-5/+5
The current design of mlx5e driver ignores the netdevice TX/RX queues number for netdevices that RDMA IPoIB ULP creates. Instead, the queue number is initialized to the maximum number that mlx5 thinks best for performance. As a result, ULP drivers that choose to create a netdevice with queue number that is less than the maximum channels mlx5 creates, will get a memory corruption. This fix changes the mlx5e netdev logic to respect ULP netdevices TX/RX queue number and use it when creating resources instead of the maximum channel number. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10net/mlx5e: Initialize all netdev common structures in one placeSaeed Mahameed1-9/+1
Move all mlx5e generic structures initializations to mlx5e_netdev_init. The common structure new initializer function will be used to initialize mlx5 context for netlink created netdevs such as IPoIB mlx5 accelerated child netdevs. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com>
2018-10-10net/mlx5e: Gather common netdev init/cleanup functionality in one placeFeras Daoud3-24/+30
Introduce a helper init/cleanup function that initializes mlx5e generic netdev private structure, and use them from all profiles init/cleanup callbacks. This patch will also be helpful to initialize/cleanup netdevs that are not created by mlx5 driver, e.g: accelerated ipoib child netdevs. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10RDMA/netdev: Hoist alloc_netdev_mqs out of the driverDenis Drozdov1-41/+46
netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly. Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-01net/mlx5e: Provide explicit directive if to create inner indirect tirsOr Gerlitz1-3/+3
Change the driver functions that deal with creating indirect tirs to get a flag telling if inner ttc is desired. A pre-step for enabling rss on the vport representors, where inner ttc is not needed. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Move Q counters allocation and drop RQ to init_rxRoi Dayan1-1/+16
Not all profiles query the HW Q counters in update_stats() callback. HW Q couners are limited per device and in case of representors all their Q counters are allocated on the parent PF device. Avoid reundant allocation of HW Q counters by moving the allocation to init_rx profile callback. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Make function mlx5i_grp_sw_update_stats() staticWei Yongjun1-1/+1
Fixes the following sparse warning: drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c:119:6: warning: symbol 'mlx5i_grp_sw_update_stats' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Add ndo stats support for IPoIB child devicesFeras Daoud1-0/+1
Expose RX and TX counters by implementing ndo_get_stats64 operation for child devices. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Add ndo stats support for IPoIB netdevicesFeras Daoud2-0/+43
Expose RX and TX counters by implementing ndo_get_stats64 operation for both parent devices. After this change, all the relevant statistics can be retrieved using ifconfig. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Initialize max_opened_tc in mlx5i_init flowFeras Daoud1-0/+1
Enhanced ipoib does not initialize max_opened_tc causing wrong ethtool statistics. As mlx5e_grp_sw_update_stats relies on this variable, without this change, the TX statistics will not be updated. Fixes: 05909babce53 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe1-0/+4
Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02RDMA/netdev: Use priv_destructor for netdev cleanupJason Gunthorpe1-18/+19
Now that the unregister_netdev flow for IPoIB no longer relies on external code we can now introduce the use of priv_destructor and needs_free_netdev. The rdma_netdev flow is switched to use the netdev common priv_destructor instead of the special free_rdma_netdev and the IPOIB ULP adjusted: - priv_destructor needs to switch to point to the ULP's destructor which will then call the rdma_ndev's in the right order - We need to be careful around the error unwind of register_netdev as it sometimes calls priv_destructor on failure - ULPs need to use ndo_init/uninit to ensure proper ordering of failures around register_netdev Switching to priv_destructor is a necessary pre-requisite to using the rtnl new_link mechanism. The VNIC user for rdma_netdev should also be revised, but that is left for another patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-31net/mlx5e: IPoIB, Set the netdevice sw mtu in ipoib enhanced flowFeras Daoud1-0/+4
After introduction of the cited commit, mlx5e_build_nic_params receives the netdevice mtu in order to set the sw_mtu of mlx5e_params. For enhanced IPoIB, the netdevice mtu is not set in this stage, therefore, the initial sw_mtu equals zero. As a result, the hw_mtu of the receive queue will be calculated incorrectly causing traffic issues. To fix this issue, query for port mtu before building the nic params. Fixes: 472a1e44b349 ("net/mlx5e: Save MTU in channels params") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5i: Use compilation flag in IPOIB headerTariq Toukan1-0/+3
If CONFIG_MLX5_CORE_IPOIB is not set, compile-out the IPOIB related headers. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25net/mlx5e: TX, Use actual WQE size for SQ edge fillTariq Toukan1-0/+23
We fill SQ edge with NOPs to avoid WQEs wrap. Here, instead of doing that in advance for the maximum possible WQE size, we do it on-demand using the actual WQE size. We re-order some parts in mlx5e_sq_xmit to finish the calculation of WQE size (ds_cnt) before doing any writes to the WQE buffer. When SQ work queue is fragmented (introduced in an downstream patch), dealing with WQE wraps becomes more frequent. This change would drastically reduce the overhead in this case. Performance tests: ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet rate of 64B packets, single transmit ring, size 8K. Before: 14.9 Mpps After: 15.8 Mpps Improvement of 6%. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Derive Striding RQ size from MTUTariq Toukan2-2/+2
In Striding RQ, each WQE serves multiple packets (hence called Multi-Packet WQE, MPWQE). The size of a MPWQE is constant (currently 256KB). Upon a ringparam set operation, we calculate the number of MPWQEs per RQ. For this, first it is needed to determine the number of packets that can reside within a single MPWQE. In this patch we use the actual MTU size instead of ETH_DATA_LEN for this calculation. This implies that a change in MTU might require a change in Striding RQ ring size. In addition, this obsoletes some WQEs-to-packets translation functions and helps delete ~60 LOC. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: Save MTU in channels paramsTariq Toukan1-10/+13
Knowing the MTU is required for RQ creation flow. By our design, channels creation flow is totally isolated from priv/netdev, and can be completed with access to channels params and mdev. Adding the MTU to the channels params helps preserving that. In addition, we save it in RQ to make its access faster in datapath checks. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-30net/mlx5e: IPoIB, Fix spelling mistakeTalat Batheesh1-1/+1
Fix spelling mistake in debug message text. "dettaching" -> "detaching" Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27net/mlx5e: Add ethtool priv-flag for Striding RQTariq Toukan1-1/+2
Add a control private flag in ethtool to enable/disable Striding RQ feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27net/mlx5e: Do not reset Receive Queue params on every type changeTariq Toukan1-1/+2
Do not implicit a call to mlx5e_init_rq_type_params() upon every change in RQ type. It should be called only on channels creation. Fixes: 2fc4bfb7250d ("net/mlx5e: Dynamic RQ type infrastructure") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds4-10/+28
Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
2018-01-30Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitJason Gunthorpe1-2/+3
To resolve conflicts in: drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/mlx5/qp.c From patches merged into the -rc cycle. The conflict resolution matches what linux-next has been carrying. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoringGal Pressman1-1/+1
On TTC table creation, the indirection TIRs should be used instead of the inner indirection TIRs. Fixes: 1ae1df3a1193 ("net/mlx5e: Refactor RSS related objects and code") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Shalom Lagziel <shaloml@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19net/mlx5e: Refactor RSS related objects and codeOr Gerlitz1-6/+16
In order to use RSS for hairpin, we refactor the code that deals with setup of the TTC steering tables. This is done using an interim ttc params object that has the flow table attributes, TIR numbers, etc. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-18net/mlx5e: Fix trailing semicolonLuis de Bethencourt1-1/+1
The trailing semicolon is an empty statement that does no operation. Removing it since it doesn't do anything. Signed-off-by: Luis de Bethencourt <luisbg@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+2
Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-12net/mlx5e: Remove timestamp set from netdevice open flowFeras Daoud1-1/+2
To avoid configuration override, timestamp set call will be moved from the netdevice open flow to the init flow. By this, a close-open procedure will not override the timestamp configuration. In addition, the change will rename mlx5e_timestamp_set function to be mlx5e_timestamp_init. Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09net/mlx5e: IPoIB, Fix spelling mistake "functionts" -> "functions"Gal Pressman1-1/+1
Fix trivial spelling mistake: "functionts" -> "functions". Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09net/mlx5e: IPoIB, Add ethtool support to get child time stamping parametersFeras Daoud1-0/+1
Add support to get time stamping capabilities using ethtool for child interface. Usage example: ethtool -T CHILD-DEVNAME This change reuses the functionality of parent devices and does not introduce any new logic. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09net/mlx5e: IPoIB, Add PTP ioctl support for child interfaceFeras Daoud3-2/+9
Add support to control precision time protocol on child interfaces using ioctl. This commit changes the following: - Change parent ioctl function to be non static - Reuse the parent ioctl function in child devices Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>