aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-09-23mlxsw: Make MLXSW_SP1_FWREV_MINOR a hard requirementPetr Machata1-1/+4
Up until now, mlxsw tolerated firmware versions that weren't exactly matching the required version, if the branch number matched. That allowed the users to test various firmware versions as long as they were on the right branch. On the other hand, it made it impossible for mlxsw to put a hard lower bound on a version that fixes all problems known to date. If a user had a somewhat older FW version installed, mlxsw would start up just fine, possibly performing non-optimally as it would use features that trigger problematic behavior. Therefore tweak the check to accept any FW version that is: - on the same branch as the preferred version, and - the same as or newer than the preferred version. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Tweak SBMM configurationPetr Machata1-15/+15
The SBMM register configures shared buffer allocation and settings for MC packets according to switch priority. The recommended values are no reserved buffer and alpha of 1/4, which corresponds to buf_max of 6. Update mlxsw_sp_sb_mms accordingly. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Configure MC poolPetr Machata1-8/+10
Pool 15 (indexed as 8) is dedicated to MC traffic. Its configuration has been kept at default, because the table-based configuration wasn't expressive enough to allow the explicit configuration. Now that the configuration of pool 15 can be described, do so. The MC pool should have infinite size, infinite per-TC quota, and per-port limit of 90K. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Allow configuration of static poolsPetr Machata1-17/+33
Some pools configured through the sb_pm entries may have by default static size. The MC pool is now not explicitly configured, however it gets configured as static implicitly by 0-initializing sb->prs, and a follow-up patch adds an explicit configuration to the same effect. To support this, pass max_buff taken from sb_pm and sb_cm entries through cell conversion before handing it to mlxsw_sp_sb_pm_write(), if the pool that the sb_pm entry configures is statically-sized. To keep current behavior, update mlxsw_sp_sb_cms_egress[] to denote buffer sizes in bytes (assuming Spectrum 1 cell sizes, which the original code assumed as well) instead of cells. Note that a follow-up patch changes this to infinite size. Also tweak a comment at SBMM configuration to remain true now that statically-sized pools exist. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Pass SBPM min_size in cellsPetr Machata1-1/+3
The SBPM register configures the shared buffer allocation and configuration per port and pool. The min_buff value is the buffer size dedicated to this single function, and is configured in cells. Currently, all sb_pm entries have 0 for min_buff, and therefore the actual unit is immaterial. However, in a follow-up patch we want to add entries with non-zero minimum. Therefore pass the min_buff from the sb_pm table through the cell conversion before handing it over to mlxsw_sp_sb_pm_write(). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Allow an infinite maximum for per-TC pool limitPetr Machata2-8/+26
The SBCM register configures the shared buffer configuration according to port and TC. So far all pools have had a dynamic size, where the infinite size is easy to express by using max_buff of 0xff. However the MC pool should be configured with static size, and the infinite size thus needs to be set using the field SBCM.infi_max. Therefore add the field infi_max to the SBCM register and to mlxsw_reg_sbcm_pack(). Extend mlxsw_sp_sb_cm_write() to handle infinite sizes as well. Report infinite pool limits as if the limit actually were the total shared buffer size. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Allow pools of infinite sizePetr Machata2-7/+31
The MC pool should have an infinite size (i.e. no quota). To that end, add infi_size to the SBPR register and extend mlxsw_reg_sbpr_pack(). Also add MLXSW_SP_SB_INFI to denote buffers that should have an infinite size. Change mlxsw_sp_sb_pr_write() to take as parameter byte size, instead of cell size, and add the special handling of infinite buffers. Report pools with infinite size as if they actually take the full shared buffer size. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Keep shared buffer size in mlxsw_sp_sbPetr Machata1-3/+5
Entities of infinite size will be reported as if they had the maximum size allowed by the chip. To that end, keep track of maximum shared buffer size in mlxsw_sp->sb. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Split TC_COUNT into ingress and egressPetr Machata1-13/+30
Current code assumes that ingress and egress has the same number of traffic classes. Since the introduction of MC-aware mode that assumption hasn't held anymore, and there have been 16 TCs on the egress as opposed to 8 on ingress. Break the assumption of symmetry by splitting the artifacts related to shared-buffer TC counting to ingress and egress parts. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum_buffers: Use devlink pool indices throughoutPetr Machata1-205/+170
Currently, mlxsw assumes that each ingress pool has its egress counterpart, and that pool index for purposes of caching matches the index with which the hardware should be configured. As we want to expose the MC pool, both of these assumptions break. Instead, maintain the pool index as long as possible. Unify ingress and egress caches and use the pool index as cache index as well. Only translate to FW pool numbering when actually packing the registers. This simplifies things considerably, as the pool index is the only quantity necessary to uniquely identify a pool, and the pool/direction split is not necessary until firmware is talked to. To support the mapping between pool indices and pool numbers and directions, which is not neatly mathematical anymore, introduce a pool descriptor table, indexed by pool index, to facilitate the translation. Include the MC pool in the descriptor table as well, so that it can be referenced from mlxsw_sp_sb_cms_egress. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20mlxsw: spectrum: Init shaper for TCs 8..15Petr Machata1-0/+7
With introduction of MC-aware mode to mlxsw, it became necessary to configure TCs above 7 as well. There is now code in mlxsw to disable ETS for these higher classes, but disablement of max shaper was neglected. By default, max shaper is currently disabled to begin with, so the problem is just cosmetic. However, for symmetry, do like we do for ETS configuration, and call mlxsw_sp_port_ets_maxrate_set() for both TC i and i + 8. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller9-64/+83
2018-09-12net: ethernet: Use DIV_ROUND_UP instead of reimplementing its functionzhong jiang3-3/+3
DIV_ROUND_UP has implemented the code-opened function. Therefore, just replace the implementation with DIV_ROUND_UP. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packetsAlaa Hleihel1-0/+12
CHECKSUM_COMPLETE is not applicable to SCTP protocol. Setting it for SCTP packets leads to CRC32c validation failure. Fixes: bbceefce9adf ("net/mlx5e: Support RX CHECKSUM_COMPLETE") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Set ECN for received packets using CQE indicationNatali Shechtman3-5/+35
In multi-host (MH) NIC scheme, a single HW port serves multiple hosts or sockets on the same host. The HW uses a mechanism in the PCIe buffer which monitors the amount of consumed PCIe buffers per host. On a certain configuration, under congestion, the HW emulates a switch doing ECN marking on packets using ECN indication on the completion descriptor (CQE). The driver needs to set the ECN bits on the packet SKB, such that the network stack can react on that, this commit does that. Signed-off-by: Natali Shechtman <natali@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Replace PTP clock lock from RW lock to seq lockShay Agroskin2-20/+22
Changed "priv.clock.lock" lock from 'rw_lock' to 'seq_lock' in order to improve packet rate performance. Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. Sent 64b packets between two peers connected by ConnectX-5, and measured packet rate for the receiver in three modes: no time-stamping (base rate) time-stamping using rw_lock (old lock) for critical region time-stamping using seq_lock (new lock) for critical region Only the receiver time stamped its packets. The measured packet rate improvements are: Single flow (multiple TX rings to single RX ring): without timestamping: 4.26 (M packets)/sec with rw-lock (old lock): 4.1 (M packets)/sec with seq-lock (new lock): 4.16 (M packets)/sec 1.46% improvement Multiple flows (multiple TX rings to six RX rings): without timestamping: 22 (M packets)/sec with rw-lock (old lock): 11.7 (M packets)/sec with seq-lock (new lock): 21.3 (M packets)/sec 82.05% improvement The packet rate improvement is due to the lack of atomic operations for the 'readers' by the seq-lock. Since there are much more 'readers' than 'writers' contention on this lock, almost all atomic operations are saved. this results in a dramatic decrease in overall cache misses. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Move Q counters allocation and drop RQ to init_rxRoi Dayan4-25/+55
Not all profiles query the HW Q counters in update_stats() callback. HW Q couners are limited per device and in case of representors all their Q counters are allocated on the parent PF device. Avoid reundant allocation of HW Q counters by moving the allocation to init_rx profile callback. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Move mlx5e_priv_flags into en_ethtool.cKamal Heib2-7/+7
Move the definition of mlx5e_priv_flags into en_ethtool.c because it's only used there. Fixes: 4e59e2888139 ("net/mlx5e: Introduce net device priv flags infrastructure") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Add flow counters idrVlad Buslov1-4/+33
Previous patch in series changed flow counter storage structure from rb_tree to linked list in order to improve flow counter traversal performance. The drawback of such solution is that flow counter lookup by id becomes linear in complexity. Store pointers to flow counters in idr in order to improve lookup performance to logarithmic again. Idr is non-intrusive data structure and doesn't require extending flow counter struct with new elements. This means that idr can be used for lookup, while linked list from previous patch is used for traversal, and struct mlx5_fc size is <= 2 cache lines. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Store flow counters in a listVlad Buslov2-49/+41
In order to improve performance of flow counter stats query loop that traverses all configured flow counters, replace rb_tree with double-linked list. This change improves performance of traversing flow counters by removing the tree traversal. (profiling data showed that call to rb_next was most top CPU consumer) However, lookup of flow flow counter in list becomes linear, instead of logarithmic. This problem is fixed by next patch in series, which adds idr for fast lookup. Idr is to be used because it is not an intrusive data structure and doesn't require adding any new members to struct mlx5_fc, which allows its control data part to stay <= 1 cache line in size. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Add new list to store deleted flow countersVlad Buslov2-23/+13
In order to prevent flow counters stats work function from traversing whole flow counters tree while searching for deleted flow counters, new list to store deleted flow counters is added to struct mlx5_fc_stats. Lockless NULL-terminated single linked list data type is used due to following reasons: - This use case only needs to add single element to list and remove/iterate whole list. Lockless list doesn't require any additional synchronization for these operations. - First cache line of flow counter data structure only has space to store single additional pointer, which precludes usage of double linked list. Remove flow counter 'deleted' flag that is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Change flow counters addlist type to single linked listVlad Buslov2-26/+22
In order to prevent flow counters stats work function from traversing whole flow counters tree while searching for deleted flow counters, new list to store deleted flow counters will be added to struct mlx5_fc_stats. However, the flow counter structure itself has no space left to store any more data in first cache line. To free space that is needed to store additional list node, convert current addlist double linked list (two pointers per node) to atomic single linked list (one pointer per node). Lockless NULL-terminated single linked list data type doesn't require any additional external synchronization for operations used by flow counters module (add single new element, remove all elements from list and traverse them). Remove addlist_lock that is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Fix possible deadlock from lockdep when adding fte to fgRoi Dayan1-37/+37
This is a false positive report due to incorrect nested lock annotations as we lock multiple fgs with the same subclass. Instead of locking all fgs only lock the one being used as was done before. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Ethtool steering, fix udp source port valueSaeed Mahameed1-1/+1
Copy and paste bug was introduced in the offending patch. We need to write udp source port value into the headers value and not headers criteria "mask". Fixes: 142644f8a1f8 ("net/mlx5e: Ethtool steering flow parsing refactoring") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Check for error in mlx5_attach_interfaceHuy Nguyen1-5/+10
Currently, mlx5_attach_interface does not check for error after calling intf->attach or intf->add. When these two calls fails, the client is not initialized and will cause issues such as kernel panic on invalid address in the teardown path (mlx5_detach_interface) Fixes: 737a234bb638 ("net/mlx5: Introduce attach/detach to interface API") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Consider PCI domain in search for next devDaniel Jurgens1-3/+4
The PCI BDF is not unique. PCI domain must also be considered when searching for the next physical device during lag setup. Example below: mlx5_core 0000:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) mlx5_core 0000:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) mlx5_core 0001:01:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) mlx5_core 0001:01:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(128) RxCqeCmprss(0) Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Fix not releasing read lock when adding flow rulesRoi Dayan1-0/+2
If building match list fg fails and we never jumped to search_again_locked label then the function returned without unlocking the read lock. Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: E-Switch, Fix memory leak when creating switchdev mode FDB tablesRaed Salem1-0/+1
The memory allocated for the slow path table flow group input structure was not freed upon successful return, fix that. Fixes: 1967ce6ea5c8 ("net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Use u16 for Work Queue buffer strides offsetTariq Toukan1-1/+1
Minimal stride size is 16. Hence, the number of strides in a fragment (of PAGE_SIZE) is <= PAGE_SIZE / 16 <= 4K. u16 is sufficient to represent this. Fixes: d7037ad73daa ("net/mlx5: Fix QP fragmented buffer allocation") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Use u16 for Work Queue buffer fragment sizeTariq Toukan2-3/+3
Minimal stride size is 16. Hence, the number of strides in a fragment (of PAGE_SIZE) is <= PAGE_SIZE / 16 <= 4K. u16 is sufficient to represent this. Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Fix debugfs cleanup in the device init/remove flowJack Morgenstein1-2/+4
When initializing the device (procedure init_one), the driver calls mlx5_pci_init to perform pci initialization. As part of this initialization, mlx5_pci_init creates a debugfs directory. If this creation fails, init_one aborts, returning failure to the caller (which is the probe method caller). The main reason for such a failure to occur is if the debugfs directory already exists. This can happen if the last time mlx5_pci_close was called, debugfs_remove (silently) failed due to the debugfs directory not being empty. Guarantee that such a debugfs_remove failure will not occur by instead calling debugfs_remove_recursive in procedure mlx5_pci_close. Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5: Fix use-after-free in self-healing flowJack Morgenstein2-4/+12
When the mlx5 health mechanism detects a problem while the driver is in the middle of init_one or remove_one, the driver needs to prevent the health mechanism from scheduling future work; if future work is scheduled, there is a problem with use-after-free: the system WQ tries to run the work item (which has been freed) at the scheduled future time. Prevent this by disabling work item scheduling in the health mechanism when the driver is in the middle of init_one() or remove_one(). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-05net/mlx5e: Make function mlx5i_grp_sw_update_stats() staticWei Yongjun1-1/+1
Fixes the following sparse warning: drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c:119:6: warning: symbol 'mlx5i_grp_sw_update_stats' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05mlxsw: spectrum_buffers: Set up a dedicated pool for BUM trafficPetr Machata1-8/+8
MC-aware mode was recently enabled by mlxsw on Spectrum switches in commit 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports"). Unfortunately, testing has shown that the fix is incomplete and in the presented form actually makes the problem even worse, because any amount of MC traffic causes UC disruption. The reason for this is that currently, mlxsw configures the MC-specific TCs (8..15) to map to pool 0. It also configures a maximum buffer size of 0, but for MC traffic that maximum is disregarded and not part of the quota. Therefore MC traffic is always admitted to the egress buffer. Fix the configuration by directing the MC TCs into pool 15, which is dedicated to MC traffic and recognized as such by the silicon. Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+3
2018-09-04net/mlx5: Fix SQ offset in QPs with small RQTariq Toukan1-2/+3
Correct the formula for calculating the RQ page remainder, which should be in byte granularity. The result will be non-zero only for RQs smaller than PAGE_SIZE, as an RQ size is a power of 2. Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the SQ offset in strides granularity. Fixes: d7037ad73daa ("net/mlx5: Fix QP fragmented buffer allocation") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Use priv stats in completion rx flowFeras Daoud1-1/+2
Since the RQs are shared between all pkey interfaces, the stats should be taken from where the per-ring stats are stored instead of the parent RQ. Fixes: 4c6c615e3f30 ("net/mlx5e: IPoIB, Add PKEY child interface nic profile") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Add ndo stats support for IPoIB child devicesFeras Daoud1-0/+1
Expose RX and TX counters by implementing ndo_get_stats64 operation for child devices. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Add ndo stats support for IPoIB netdevicesFeras Daoud2-0/+43
Expose RX and TX counters by implementing ndo_get_stats64 operation for both parent devices. After this change, all the relevant statistics can be retrieved using ifconfig. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-02net/mlx5e: IPoIB, Initialize max_opened_tc in mlx5i_init flowFeras Daoud1-0/+1
Enhanced ipoib does not initialize max_opened_tc causing wrong ethtool statistics. As mlx5e_grp_sw_update_stats relies on this variable, without this change, the TX statistics will not be updated. Fixes: 05909babce53 ("net/mlx5e: Avoid reset netdev stats on configuration changes") Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-25mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridgeIdo Schimmel3-0/+33
When a bridge device is removed, the VLANs are flushed from each configured port. This causes the ports to decrement the reference count on the associated FIDs (filtering identifier). If the reference count of a FID is 1 and it has a RIF (router interface), then this RIF is destroyed. However, if no port is member in the VLAN for which a RIF exists, then the RIF will continue to exist after the removal of the bridge. To reproduce: # ip link add name br0 type bridge vlan_filtering 1 # ip link set dev swp1 master br0 # ip link add link br0 name br0.10 type vlan id 10 # ip address add 192.0.2.0/24 dev br0.10 # ip link del dev br0 The RIF associated with br0.10 continues to exist. Fix this by iterating over all the bridge device uppers when it is destroyed and take care of destroying their RIFs. Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-21net_sched: remove list_head from tc_actionCong Wang3-16/+12
After commit 90b73b77d08e, list_head is no longer needed. Now we just need to convert the list iteration to array iteration for drivers. Fixes: 90b73b77d08e ("net: sched: change action API to use array of pointers to actions") Cc: Jiri Pirko <jiri@mellanox.com> Cc: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-2/+2
Pull networking fixes from David Miller: 1) Fix races in IPVS, from Tan Hu. 2) Missing unbind in matchall classifier, from Hangbin Liu. 3) Missing act_ife action release, from Vlad Buslov. 4) Cure lockdep splats in ila, from Cong Wang. 5) veth queue leak on link delete, from Toshiaki Makita. 6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From Kees Cook. 7) RCU usage fixup in XDP, from Tariq Toukan. 8) Two TCP ULP fixes from Daniel Borkmann. 9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner Kallweit. 10) Always take tcf_lock with BH disabled, otherwise we can deadlock with rate estimator code paths. From Vlad Buslov. 11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly. From Jian-Hong Pan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits) ip6_vti: fix creating fallback tunnel device for vti6 ip_vti: fix a null pointer deferrence when create vti fallback tunnel r8169: don't use MSI-X on RTL8106e net: lan743x_ptp: convert to ktime_get_clocktai_ts64 net: sched: always disable bh when taking tcf_lock ip6_vti: simplify stats handling in vti6_xmit bpf: fix redirect to map under tail calls r8169: add missing Kconfig dependency tools/bpf: fix bpf selftest test_cgroup_storage failure bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist bpf, sockmap: fix map elem deletion race with smap_stop_sock bpf, sockmap: fix leakage of smap_psock_map_entry tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach tcp, ulp: add alias for all ulp modules bpf: fix a rcu usage warning in bpf_prog_array_copy_core() samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM net/xdp: Fix suspicious RCU usage warning net/mlx5e: Delete unneeded function argument Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb isdn: Disable IIOCDBGVAR ...
2018-08-16Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe127-4455/+10953
rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16net/mlx5e: Delete unneeded function argumentYuval Shaia1-2/+2
priv argument is not used by the function, delete it. Fixes: a89842811ea98 ("net/mlx5e: Merge per priority stats groups") Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe24-139/+179
Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13net/mlx5: Improve argument name for add flow APIEli Cohen1-4/+4
The last argument to mlx5_add_flow_rules passes the number of destinations in the struct pointed to by the dest arg. Change the name to better reflect this fact. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5: Reorganize the makefileSaeed Mahameed1-17/+40
Reorganize the Makefile and group files together according to their functionality and importance. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5e: clock.c depends on CONFIG_PTP_1588_CLOCKMoshe Shemesh6-6/+34
lib/clock.c includes clock related functions which require ptp support. Thus compile out lib/clock.c and add the needed function stubs in case kconfig CONFIG_PTP_1588_CLOCK is off. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5e: vxlan.c depends on CONFIG_VXLANSaeed Mahameed3-9/+11
When vxlan is not enabled by kernel, no need to enable it in mlx5. Compile out lib/vxlan.c if CONFIG_VXLAN is not selected. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>