aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/net/ethernet/mellanox/mlx5/core/lag.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-03-05net/mlx5: Clear LAG notifier pointer after unregisterEli Cohen1-1/+3
After returning from unregister_netdevice_notifier_dev_net(), set the notifier_call field to NULL so successive call to mlx5_lag_add() will function as expected. Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-27mlx5: register lag notifier for init network namespace onlyJiri Pirko1-8/+3
The current code causes problems when the unregistering netdevice could be different then the registering one. Since the check in mlx5_lag_netdev_event() does not allow any other network namespace anyway, fix this by registerting the lag notifier per init network namespace only. Fixes: d48834f9d4b4 ("mlx5: Use dev_net netdevice notifier registrations") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Aya Levin <ayal@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-27mlx5: Use dev_net netdevice notifier registrationsJiri Pirko1-3/+5
Register the dev_net notifier and allow the per-net notifier to follow the device into different namespace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01net/mlx5: LAG, Use port enumeratorsErez Alfasi1-30/+35
Instead of using explicit array indexes, simply use ports enumerators to make the code more readable. Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature") Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang1-2/+2
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-11net/mlx5: Remove redundant lag function to get pf numRoi Dayan1-21/+0
The function is not being used. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5e: Activate HW multipath and handle port affinity based on FIB eventsRoi Dayan1-0/+8
To support multipath offload we are going to track SW multipath route and related nexthops. To do that we register to FIB notifier and handle the route and next-hops events and reflect that as port affinity to HW. When there is a new multipath route entry that all next-hops are the ports of an HCA we will activate LAG in HW. Egress wise, we use HW LAG as the means to emulate multipath on current HW which doesn't support port selection based on xmit hash. In the presence of multiple VFs which use multiple SQs (send queues) this yields fairly good distribution. HA wise, HW LAG buys us the ability for a given RQ (receive queue) to receive traffic from both ports and for SQs to migrate xmitting over the active port if their base port fails. When the route entry is being updated to single path we will update the HW port affinity to use that port only. If a next-hop becomes dead we update the HW port affinity to the living port. When all next-hops are alive again we reset the affinity to default. Due to FW/HW limitations, when a route is deleted we are not disabling the HW LAG since doing so will not allow us to enable it again while VFs are bounded. Typically this is just a temporary state when a routing daemon removes dead routes and later adds them back as needed. This patch only handles events for AF_INET. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Use own workqueue for lag netdev events processingRoi Dayan1-1/+8
Instead of using the system workqueue, allocate our own workqueue. This workqueue will be used to handle more work in the next patch. This patch doesn't change functionality. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Expose lag operations in header fileRoi Dayan1-48/+8
The change is a refactoring step towards a multipath use case. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-03-01net/mlx5: Use unsigned int bit instead of bool as a struct memberRoi Dayan1-1/+1
This fix checkpatch check CHECK: Avoid using bool structure members because of possible alignment issues Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-15net/mlx5: Correctly set LAG mode for ECPFBodong Wang1-0/+5
When bonding is added, driver assumes that it's RoCE LAG if no VF is enabled. This is not enough for ECPF as the VF is enabled in host PF side. LAG should only choose RoCE mode when both slave devices meet conditions below: 1. E-Switch offloads mode is NONE. 2. No VF is enabled. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25net/mlx5e: Move to use common phys port names for vport representorsOr Gerlitz1-0/+21
With VF LAG commit 491c37e49b48 "net/mlx5e: In case of LAG, one switch parent id is used for all representors", both uplinks and all the VFs (on both of them) get the same switchdev id. This cause the provisioning system method to identify the rep of a given VF from the parent PF PCI device using switchev id and physical port name to break, since VFm of PF0 will have the (id, name) as VFm of PF1. To fix that, we align to use the framework agreed upstream and set by nfp commit 168c478e107e "nfp: wire get_phys_port_name on representors": $ cat /sys/class/net/eth4_*/phys_port_name p0 pf0vf0 pf0vf1 Now, the names will be different, e.g. pf0vf0 vs. pf1vf0. Fixes: 491c37e49b48 ("net/mlx5e: In case of LAG, one switch parent id is used for all representors") Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Waleed Musa <waleedm@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20net/mlx5: Fix LAG requirement when CONFIG_MLX5_ESWITCH is offAviv Heller1-5/+8
If CONFIG_MLX5_ESWITCH is not defined, test for SR-IOV being disabled, instead of calling e-switch LAG prereq routine. Since LAG with SRIOV is allowed only when switchdev mode is on. Fixes: eff849b2c669 ("net/mlx5: Allow/disallow LAG according to pre-req only") Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Handle LAG FW commands failure gracefullyAviv Heller1-21/+70
When create_lag or destroy_lag FW commands fail, display an appropriate error message, and try to recover, if possible. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Make RoCE and SR-IOV LAG modes explicitAviv Heller1-16/+63
With the introduction of SR-IOV LAG, checking whether LAG is active is no longer good enough, since RoCE and SR-IOV LAG each entails different behavior by both the core and infiniband drivers. This patch introduces facilities to discern LAG type, in addition to mlx5_lag_is_active(). These are implemented in such a way as to allow more complex mode combinations in the future. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()Aviv Heller1-9/+9
The new name better represents the function's aim, and sets a precedent for a '__' prefix for internal, non-locked versions of LAG APIs. Signed-off-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Allow/disallow LAG according to pre-req onlyRabie Loulou1-46/+16
Remove the lag forbid/allow functions, change the lag prereq check to run in the do-bond logic, so every change in the prereq state will cause LAG to be disabled/enabled accordingly after the next do-bond run. Add lag update function, so every component which changes the prereq state and want the LAG to re-calc the conditions can call the update function. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Adjustments for the activate LAG logic to run under sriovRabie Loulou1-12/+21
When HW lag is set/unset, roce must not be enabled on the port, as such we wrap such changes with roce enable/disable either directly or through re-creation of IB device. Currently, lag and sriov are mutually exclusive, so by definition this code doesn't run under sriov. Towards changing this exclusion, we need to make sure that roce will not be enabled on the eswitch manager port under sriov since this is requirement of the switchdev mode. We are going strict here and avoiding this all together under sriov. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Fold the modify lag code into functionShahar Klein1-16/+28
Handle the code of modifying the lag affinity within a separate function. Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Add lag affinity info to logRoi Dayan1-0/+3
Report the initial LAG port affinity upon LAG creation. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14net/mlx5: Split the activate lag function into two routinesRoi Dayan1-4/+10
Split the activate lag function in order to be symmetric with the deactivate lag function. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-21IB/mlx5: Fix congestion counters in LAG modeMajd Dibbiny1-0/+56
Congestion counters are counted and queried per physical function. When working in LAG mode, CNP packets can be sent or received on both of the functions, thus congestion counters should be aggregated from the two physical functions. Fixes: e1f24a79f424 ("IB/mlx5: Support congestion related counters") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-07-27net/mlx5: Consider tx_enabled in all modes on remapAviv Heller1-15/+10
The tx_enabled lag event field is used to determine whether a slave is active. Current logic uses this value only if the mode is active-backup. However, LACP mode, although considered a load balancing mode, can mark a slave as inactive in certain situations (e.g., LACP timeout). This fix takes the tx_enabled value into account when remapping, with no respect to the LAG mode (this should not affect the behavior in XOR mode, since in this mode both slaves are marked as active). Fixes: 7907f23adc18 (net/mlx5: Implement RoCE LAG feature) Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16net/mlx5: Undo LAG upon request to create virtual functionsMoni Shoua1-8/+61
LAG cannot work if virtual functions are present. Therefore, if LAG is configured, the attempt to create virtual functions will fail. This gives precedence to LAG over SRIOV which is not the desired behavior as users might want to use the bonding/teaming driver also want to work with SRIOV. In that case we don't want to force an order of actions, first create virtual functions and only than configure a bonding/teaming net device. To fix, if LAG is configured during a request to create virtual functions, remove it and continue. We ignore ENODEV when trying to forbid lag. This makes sense because "No such device" means that lag is forbidden anyway. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-06-16net/mlx5: Avoid using multiple blank linesOr Gerlitz1-2/+0
Fixed bunch of this checkpatch complaint: CHECK: Please don't use multiple blank lines Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28net/mlx5: Avoid dereferencing uninitialized pointerTalat Batheesh1-2/+3
In NETDEV_CHANGEUPPER event the upper_info field is valid only when linking is true. Otherwise it should be ignored. Fixes: 7907f23adc18 (net/mlx5: Implement RoCE LAG feature) Signed-off-by: Talat Batheesh <talatb@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10net/mlx5: Organize device list API in one placeMohamad Haj Yahia1-19/+5
Hide the exposed (external) mlx5_dev_list and mlx5_intf_mutex and expose an organized modular API to manage and manipulate mlx5 devices list. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-18net/mlx5: Configure IB devices according to LAG stateAviv Heller1-0/+17
When mlx5_ib is loaded, we would like each card's IB devices to be added according to its LAG state (one IB device, instead of two, is to be added if LAG is active). Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18net/mlx5: Vport LAG creation supportAviv Heller1-0/+22
Add interfaces for issuing CREATE_VPORT_LAG and DESTROY_VPORT_LAG commands. Used for receiving PF1's eth traffic on PF0's root ft. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18net/mlx5: LAG and SRIOV cannot be used togetherAviv Heller1-0/+6
Until support will be added for RoCE LAG SRIOV. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18net/mlx5: Get RoCE netdevAviv Heller1-0/+27
Used by IB driver for determining the IB bond device's netdev, when LAG is active. Returns PF0's netdev if mode is not active-backup, or the PF netdev of the active slave when mode is active-backup. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-08-18net/mlx5: Implement RoCE LAG featureAviv Heller1-0/+530
Available on dual port cards only, this feature keeps track, using netdev LAG events, of the bonding and link status of each port's PF netdev. When both of the card's PF netdevs are enslaved to the same bond/team master, and only them, LAG state is active. During LAG, only one IB device is present for both ports. In addition to the above, this commit includes FW commands used for managing the LAG, new facilities for adding and removing a single device by interface, and port remap functionality according to bond events. Please note that this feature is currently used only for mimicking Ethernet bonding for RoCE - netdevs functionality is not altered, and their bonding continues to be managed solely by bond/team driver. Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>