aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-18net/mlx4_en: fix spelling mistake: "quiting" -> "quitting"Colin Ian King1-1/+1
There is a spelling mistake in a en_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variablePetr Machata1-5/+7
The function-local variable "delay" enters the loop interpreted as delay in bits. However, inside the loop it gets overwritten by the result of mlxsw_sp_pg_buf_delay_get(), and thus leaves the loop as quantity in cells. Thus on second and further loop iterations, the headroom for a given priority is configured with a wrong size. Fix by introducing a loop-local variable, delay_cells. Rename thres to thres_cells for consistency. Fixes: f417f04da589 ("mlxsw: spectrum: Refactor port buffer configuration") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net/mlx5e: XDP, fix redirect resources availability checkSaeed Mahameed4-4/+22
Currently mlx5 driver creates xdp redirect hw queues unconditionally on netdevice open, This is great until someone starts redirecting XDP traffic via ndo_xdp_xmit on mlx5 device and changes the device configuration at the same time, this might cause crashes, since the other device's napi is not aware of the mlx5 state change (resources un-availability). To fix this we must synchronize with other devices napi's on the system. Added a new flag under mlx5e_priv to determine XDP TX resources are available, set/clear it up when necessary and use synchronize_rcu() when the flag is turned off, so other napi's are in-sync with it, before we actually cleanup the hw resources. The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and it is sufficient to determine if it safe to transmit or not. The other two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become unnecessary. Thus, they are removed from data path. Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13net/mlx5: Fix a compilation warning in events.cTariq Toukan1-8/+9
Eliminate the following compilation warning: drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str' may be used uninitialized in this function [-Wuninitialized]: => 238:3 Fixes: c2fb3db22d35 ("net/mlx5: Rework handling of port module events") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Mikhael Goikhman <migo@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13net/mlx5: No command allowed when command interface is not readyHuy Nguyen3-1/+20
When EEH is injected and PCI bus stalls, mlx5's pci error detect function is called to deactivate the command interface and tear down the device. The issue is that there can be a thread that already passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command and stuck in the wait_func. Solution: Add function mlx5_cmd_flush to disable command interface and clear all the pending commands. When device state is set to MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all pending threads waiting for firmware commands completion are terminated. Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13net/mlx5e: Fix NULL pointer derefernce in set channels error flowMaria Pasechnik1-3/+4
New channels are applied to the priv channels only after they are successfully opened. Then, the indirection table should be built according to the new number of channels. Currently, such build is preformed independently of whether the channels opening is successful, and is not reverted on failure. The bug is caused due to removal of rss params from channels struct and moving it to priv struct. That change cause to independency between channels and rss params. This causes a crash on a later point, when accessing rqn of a non existing channel. This patch fixes it by moving the indirection table build right before switching the priv channels to new channels struct, after the new set of channels was successfully opened. Fixes: bbeb53b8b2c9 ("net/mlx5e: Move RSS params to a dedicated struct") Signed-off-by: Maria Pasechnik <mariap@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-12net/mlx4_en: Force CHECKSUM_NONE for short ethernet framesSaeed Mahameed1-2/+20
When an ethernet frame is padded to meet the minimum ethernet frame size, the padding octets are not covered by the hardware checksum. Fortunately the padding octets are usually zero's, which don't affect checksum. However, it is not guaranteed. For example, switches might choose to make other use of these octets. This repeatedly causes kernel hardware checksum fault. Prior to the cited commit below, skb checksum was forced to be CHECKSUM_NONE when padding is detected. After it, we need to keep skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP headers, it does not worth the effort as the packets are so small that CHECKSUM_COMPLETE has no significant advantage. Future work: when reporting checksum complete is not an option for IP non-TCP/UDP packets, we can actually fallback to report checksum unnecessary, by looking at cqe IPOK bit. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net/mlx5e: Don't overwrite pedit action when multiple pedit usedTonghao Zhang1-10/+15
In some case, we may use multiple pedit actions to modify packets. The command shown as below: the last pedit action is effective. $ tc filter add dev netdev_rep parent ffff: protocol ip prio 1 \ flower skip_sw ip_proto icmp dst_ip 3.3.3.3 \ action pedit ex munge ip dst set 192.168.1.100 pipe \ action pedit ex munge eth src set 00:00:00:00:00:01 pipe \ action pedit ex munge eth dst set 00:00:00:00:00:02 pipe \ action csum ip pipe \ action tunnel_key set src_ip 1.1.1.100 dst_ip 1.1.1.200 dst_port 4789 id 100 \ action mirred egress redirect dev vxlan0 To fix it, we add max_mod_hdr_actions to mlx5e_tc_flow_parse_attr struction, max_mod_hdr_actions will store the max pedit action number we support and num_mod_hdr_actions indicates how many pedit action we used, and store all pedit action to mod_hdr_actions. Fixes: d79b6df6b10a ("net/mlx5e: Add parsing of TC pedit actions to HW format") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06net/mlx5e: Update hw flows when encap source mac changedTonghao Zhang3-0/+7
When we offload tc filters to hardware, hardware flows can be updated when mac of encap destination ip is changed. But we ignore one case, that the mac of local encap ip can be changed too, so we should also update them. To fix it, add route_dev in mlx5e_encap_entry struct to save the local encap netdevice, and when mac changed, kernel will flush all the neighbour on the netdevice and send NETEVENT_NEIGH_UPDATE event. The mlx5 driver will delete the flows and add them when neighbour available again. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Cc: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-05net/mlx5e: Use the inner headers to determine tc/pedit offload limitation on decap flowsGuy Shattah1-2/+7
In packets that need to be decaped the internal headers have to be checked, not the external ones. Fixes: bdd66ac0aeed ("net/mlx5e: Disallow TC offloading of unsupported match/action combinations") Signed-off-by: Guy Shattah <sguy@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05net/mlx5e: Properly set steering match levels for offloaded TC decap rulesOr Gerlitz5-18/+24
The match level computed by the driver gets to be wrong for decap rules with wildcarded inner packet match such as: tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789 action tunnel_key unset action mirred egress redirect dev eth1 The FW errs for a missing matching meta-data indicator for the outer headers (where we do have a match), and a wrong matching meta-data indicator for the inner headers (where we don't have a match). Fix that by taking into account the matching on the tunnel info and relating the match level of the encapsulated packet to the firmware inner headers indicator in case of decap. As for vxlan we mandate a match on the tunnel udp dst port, and in general we practically madndate a match on the source or dest ip for any IP tunnel, the fix was done in a minimal manner around the tunnel match parsing code. Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05net/mlx5e: FPGA, fix Innova IPsec TX offload data path performanceRaed Salem1-0/+6
At Innova IPsec TX offload data path a special software parser metadata is used to pass some packet attributes to the hardware, this metadata is passed using the Ethernet control segment of a WQE (a HW descriptor) header. The cited commit might nullify this header, hence the metadata is lost, this caused a significant performance drop during hw offloading operation. Fix by restoring the metadata at the Ethernet control segment in case it was nullified. Fixes: 37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-27Merge tag 'mlx5-fixes-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller6-19/+58
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-01-25 This series introduces some fixes to mlx5 driver. For more information please see tag log below. Please pull and let me know if there is any problem. For -stable v4.13 ('net/mlx5e: Allow MAC invalidation while spoofchk is ON') For -stable v4.18 ('Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25net/mlx5e: Unblock setting vid 0 for VFs through the uplink repOr Gerlitz1-0/+13
It turns out that libvirt uses 0-vid as a default if no vlan was set for the guest (which is the case for switchdev mode) and errs if we disallow that: error: Failed to start domain vm75 error: Cannot set interface MAC/vlanid to 6a:66:2d:48:92:c2/0 \ for ifname enp59s0f0 vf 0: Operation not supported So allow this in order not to break existing systems. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Gavi Teitz <gavi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25net/mlx5e: Move to use common phys port names for vport representorsOr Gerlitz3-2/+33
With VF LAG commit 491c37e49b48 "net/mlx5e: In case of LAG, one switch parent id is used for all representors", both uplinks and all the VFs (on both of them) get the same switchdev id. This cause the provisioning system method to identify the rep of a given VF from the parent PF PCI device using switchev id and physical port name to break, since VFm of PF0 will have the (id, name) as VFm of PF1. To fix that, we align to use the framework agreed upstream and set by nfp commit 168c478e107e "nfp: wire get_phys_port_name on representors": $ cat /sys/class/net/eth4_*/phys_port_name p0 pf0vf0 pf0vf1 Now, the names will be different, e.g. pf0vf0 vs. pf1vf0. Fixes: 491c37e49b48 ("net/mlx5e: In case of LAG, one switch parent id is used for all representors") Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Waleed Musa <waleedm@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25net/mlx5e: Allow MAC invalidation while spoofchk is ONAya Levin1-12/+6
Prior to this patch the driver prohibited spoof checking on invalid MAC. Now the user can set this configuration if it wishes to. This is required since libvirt might invalidate the VF Mac by setting it to zero, while spoofcheck is ON. Fixes: 1ab2068a4c66 ("net/mlx5: Implement vports admin state backup/restore") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25net/mlx5: Take lock with IRQs disabled to avoid deadlockMoni Shoua1-2/+3
The lock in qp_table might be taken from process context or from interrupt context. This may lead to a deadlock unless it is taken with IRQs disabled. Discovered by lockdep ================================ WARNING: inconsistent lock state 4.20.0-rc6 -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} python/12572 [HC1[1]:SC0[0]:HE0:SE1] takes: 00000000052a4df4 (&(&table->lock)->rlock#2){?.+.}, /0x50 [mlx5_core] {HARDIRQ-ON-W} state was registered at: _raw_spin_lock+0x33/0x70 mlx5_get_rsc+0x1a/0x50 [mlx5_core] mlx5_ib_eqe_pf_action+0x493/0x1be0 [mlx5_ib] process_one_work+0x90c/0x1820 worker_thread+0x87/0xbb0 kthread+0x320/0x3e0 ret_from_fork+0x24/0x30 irq event stamp: 103928 hardirqs last enabled at (103927): [] nk+0x1a/0x1c hardirqs last disabled at (103928): [] unk+0x1a/0x1c softirqs last enabled at (103924): [] tcp_sendmsg+0x31/0x40 softirqs last disabled at (103922): [] 80 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&table->lock)->rlock#2); lock(&(&table->lock)->rlock#2); *** DEADLOCK *** Fixes: 032080ab43ac ("IB/mlx5: Lock QP during page fault handling") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25net/mlx5e: Fix wrong private flag usage causing checksum disableShay Agroskin1-1/+1
MLX5E_PFLAG_* definitions were changed from bitmask to enumerated values. However, in mlx5e_open_rq(), the proper API (MLX5E_GET_PFLAG macro) was not used to read the flag value of MLX5E_PFLAG_RX_NO_CSUM_COMPLETE. Fixed it. Fixes: 8ff57c18e9f6 ("net/mlx5e: Improve ethtool private-flags code structure") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"Bodong Wang1-2/+2
This reverts commit 5f5991f36dce1e69dd8bd7495763eec2e28f08e7. With the original commit, eswitch instance will not be initialized for a function which is vport group manager but not eswitch manager such as host PF on SmartNIC (BlueField) card. This will result in a kernel crash when such a vport group manager is trying to access vports in its group. E.g, PF vport manager (not eswitch manager) tries to configure the MAC of its VF vport, a kernel trace will happen similar as bellow: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 ... RIP: 0010:mlx5_eswitch_get_vport_config+0xc/0x180 [mlx5_core] ... Fixes: 5f5991f36dce ("net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reported-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-24net/mlx4_core: Fix error handling when initializing CQ bufs in the driverJack Morgenstein1-2/+4
Procedure mlx4_init_user_cqes() handles returns by copy_to_user incorrectly. copy_to_user() returns the number of bytes not copied. Thus, a non-zero return should be treated as a -EFAULT error (as is done elsewhere in the kernel). However, mlx4_init_user_cqes() error handling simply returns the number of bytes not copied (instead of -EFAULT). Note, though, that this is a harmless bug: procedure mlx4_alloc_cq() (which is the only caller of mlx4_init_user_cqes()) treats any non-zero return as an error, but that returned error value is processed internally, and not passed further up the call stack. In addition, fixes the following sparse warning: warning: incorrect type in argument 1 (different address spaces) expected void [noderef] <asn:1>*to got void *buf Fixes: e45678973dcb ("{net, IB}/mlx4: Initialize CQ buffers in the driver when possible") Reported by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24net/mlx4_core: Add masking for a few queries on HCA capsAya Levin1-29/+46
Driver reads the query HCA capabilities without the corresponding masks. Without the correct masks, the base addresses of the queues are unaligned. In addition some reserved bits were wrongly read. Using the correct masks, ensures alignment of the base addresses and allows future firmware versions safe use of the reserved bits. Fixes: ab9c17a009ee ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet") Fixes: 0ff1fb654bec ("{NET, IB}/mlx4: Add device managed flow steering firmware API") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18Merge tag 'mlx5-fixes-2019-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxDavid S. Miller3-14/+34
Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-01-18 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.18 ('net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames') The patch doesn't apply cleanly to 4.18.y, but it is very simple to resolve, what should be the procedure here ? ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18net/mlx5e: Fix cb_ident duplicate in indirect block registerEli Britstein1-13/+16
Previously the identifier used for indirect block callback registry and for block rule cb registry (when done via indirect blocks) was the pointer to the tunnel netdev we were interested in receiving updates on. This worked fine if a single PF existed that registered one callback for the tunnel netdev of interest. However, if multiple PFs are in place then the 2nd PF tries to register with the same tunnel netdev identifier. This leads to EEXIST errors and/or incorrect cb deletions. Prevent this conflict by using the rpriv pointer as the identifier for netdev indirect block cb registry, allowing each PF to register a unique callback per tunnel netdev. For block cb registry, the same PF may register multiple cbs to the same block if using TC shared blocks. Instead of the rpriv, use the pointer to the allocated indr_priv data as the identifier here. This means that there can be a unique block callback for each PF/tunnel netdev combo. Fixes: f5bc2c5de101 ("net/mlx5e: Support TC indirect block notifications for eswitch uplink reprs") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18net/mlx5e: Fix wrong (zero) TX drop counter indication for representorTariq Toukan1-0/+1
For representors, the TX dropped counter is not folded from the per-ring counters. Fix it. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18net/mlx5e: Fix wrong error code return on FEC query failureShay Agroskin1-1/+4
Advertised and configured FEC query failure resulted in printing wrong error code. Fixes: 6cfa94605091 ("net/mlx5e: Ethtool driver callback for query/set FEC policy") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet framesCong Wang1-0/+13
When an ethernet frame is padded to meet the minimum ethernet frame size, the padding octets are not covered by the hardware checksum. Fortunately the padding octets are usually zero's, which don't affect checksum. However, we have a switch which pads non-zero octets, this causes kernel hardware checksum fault repeatedly. Prior to: commit '88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE ...")' skb checksum was forced to be CHECKSUM_NONE when padding is detected. After it, we need to keep skb->csum updated, like what we do for RXFCS. However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP headers, it is not worthy the effort as the packets are so small that CHECKSUM_COMPLETE can't save anything. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), Cc: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Nikola Ciprich <nikola.ciprich@linuxbox.cz> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18mlxsw: spectrum_switchdev: Do not treat static FDB entries as stickyIdo Schimmel1-6/+6
The driver currently treats static FDB entries as both static and sticky. This is incorrect and prevents such entries from being roamed to a different port via learning. Fix this by configuring static entries with ageing disabled and roaming enabled. In net-next we can add proper support for the newly introduced 'sticky' flag. Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18mlxsw: spectrum_fid: Update dummy FID indexNir Dotan1-2/+2
When using a tc flower action of egress mirred redirect, the driver adds an implicit FID setting action. This implicit action sets a dummy FID to the packet and is used as part of a design for trapping unmatched flows in OVS. While this implicit FID setting action is supposed to be a NOP when a redirect action is added, in Spectrum-2 the FID record is consulted as the dummy FID index is an 802.1D FID index and the packet is dropped instead of being redirected. Set the dummy FID index value to be within 802.1Q range. This satisfies both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it as an 802.1Q FID and will then follow the redirect action. Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC") Signed-off-by: Nir Dotan <nird@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18mlxsw: pci: Return error on PCI reset timeoutNir Dotan1-2/+2
Return an appropriate error in the case when the driver timeouts on waiting for firmware to go out of PCI reset. Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check") Signed-off-by: Nir Dotan <nird@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18mlxsw: pci: Increase PCI SW reset timeoutNir Dotan1-1/+1
Spectrum-2 PHY layer introduces a calibration period which is a part of the Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for the firmware to come out of boot. This does not increase system boot time in cases where the firmware PHY calibration process is done quickly. Fixes: c3ab435466d5 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC") Signed-off-by: Nir Dotan <nird@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18mlxsw: pci: Ring CQ's doorbell before RDQ'sIdo Schimmel2-5/+8
When a packet should be trapped to the CPU the device consumes a WQE (work queue element) from an RDQ (receive descriptor queue) and copies the packet to the address specified in the WQE. The device then tries to post a CQE (completion queue element) that contains various metadata (e.g., ingress port) about the packet to a CQ (completion queue). In case the device managed to consume a WQE, but did not manage to post the corresponding CQE, it will get stuck. This unlikely situation can be triggered due to the scheme the driver is currently using to process CQEs. The driver will consume up to 512 CQEs at a time and after processing each corresponding WQE it will ring the RDQ's doorbell, letting the device know that a new WQE was posted for it to consume. Only after processing all the CQEs (up to 512), the driver will ring the CQ's doorbell, letting the device know that new ones can be posted. Fix this by having the driver ring the CQ's doorbell for every processed CQE, but before ringing the RDQ's doorbell. This guarantees that whenever we post a new WQE, there is a corresponding CQE available. Copy the currently processed CQE to prevent the device from overwriting it with a new CQE after ringing the doorbell. Note that the driver still arms the CQ only after processing all the pending CQEs, so that interrupts for this CQ will only be delivered after the driver finished its processing. Before commit 8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1 and version 2") the issue was virtually impossible to trigger since the number of CQEs was twice the number of WQEs and the number of CQEs processed at a time was equal to the number of available WQEs. Fixes: 8404f6f2e8ed ("mlxsw: pci: Allow to use CQEs of version 1 and version 2") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Semion Lisyansky <semionl@mellanox.com> Tested-by: Semion Lisyansky <semionl@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds8-64/+106
Pull networking fixes from David Miller: 1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur Gautier. 2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei. 3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano Brivio. 4) Use after free in hns driver, from Yonglong Liu. 5) icmp6_send() needs to handle the case of NULL skb, from Eric Dumazet. 6) Missing rcu read lock in __inet6_bind() when operating on mapped addresses, from David Ahern. 7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva. 8) Fix PHY vs r8169 module loading ordering issues, from Heiner Kallweit. 9) Fix bridge vlan memory leak, from Ido Schimmel. 10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe. 11) Infoleak in ipv6_local_error(), flow label isn't completely initialized. From Eric Dumazet. 12) Handle mv88e6390 errata, from Andrew Lunn. 13) Making vhost/vsock CID hashing consistent, from Zha Bin. 14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo. 15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) bnxt_en: Fix context memory allocation. bnxt_en: Fix ring checking logic on 57500 chips. mISDN: hfcsusb: Use struct_size() in kzalloc() net: clear skb->tstamp in bridge forwarding path net: bpfilter: disallow to remove bpfilter module while being used net: bpfilter: restart bpfilter_umh when error occurred net: bpfilter: use cleanup callback to release umh_info umh: add exit routine for UMH process isdn: i4l: isdn_tty: Fix some concurrency double-free bugs vhost/vsock: fix vhost vsock cid hashing inconsistent net: stmmac: Prevent RX starvation in stmmac_napi_poll() net: stmmac: Fix the logic of checking if RX Watchdog must be enabled net: stmmac: Check if CBS is supported before configuring net: stmmac: dwxgmac2: Only clear interrupts that are active net: stmmac: Fix PCI module removal leak tools/bpf: fix bpftool map dump with bitfields tools/bpf: test btf bitfield with >=256 struct member offset bpf: fix bpffs bitfield pretty print net: ethernet: mediatek: fix warning in phy_start_aneg tcp: change txhash on SYN-data timeout ...
2019-01-08mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletionIdo Schimmel1-1/+1
When a VLAN is deleted from a bridge port we should not change the PVID unless the deleted VLAN is the PVID. Fixes: fe9ccc785de5 ("mlxsw: spectrum_switchdev: Don't batch VLAN operations") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08mlxsw: spectrum_nve: Replace error code with EINVALIdo Schimmel1-2/+2
Adding a VLAN on a port can trigger the offload of a VXLAN tunnel which is already a member in the VLAN. In case the configuration of the VXLAN is not supported, the driver would return -EOPNOTSUPP. This is problematic since bridge code does not interpret this as error, but rather that it should try to setup the VLAN using the 8021q driver instead of switchdev. Fixes: d70e42b22dd4 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08mlxsw: spectrum_switchdev: Avoid returning errors in commit phaseIdo Schimmel1-12/+9
Drivers are not supposed to return errors in switchdev commit phase if they returned OK in prepare phase. Otherwise, a WARNING is emitted. However, when the offloading of a VXLAN tunnel is triggered by the addition of a VLAN on a local port, it is not possible to guarantee that the commit phase will succeed without doing a lot of work. In these cases, the artificial division between prepare and commit phase does not make sense, so simply do the work in the prepare phase. Fixes: d70e42b22dd4 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08mlxsw: spectrum: Add VXLAN dependency for spectrumIdo Schimmel1-0/+1
When VXLAN is a loadable module, MLXSW_SPECTRUM must not be built-in: drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:2547: undefined reference to `vxlan_fdb_find_uc' Add Kconfig dependency to enforce usable configurations. Fixes: 1231e04f5bba ("mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08mlxsw: spectrum: Disable lag port TX before removing itJiri Pirko1-2/+5
Make sure that lag port TX is disabled before mlxsw_sp_port_lag_leave() is called and prevent from possible EMAD error. Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08mlxsw: spectrum_acl: Remove ASSERT_RTNL()s in module removal flowNir Dotan1-2/+0
Removal of the mlxsw driver on Spectrum-2 platforms hits an ASSERT_RTNL() in Spectrum-2 ACL Bloom filter and in ERP removal paths. This happens because the multicast router implementation in Spectrum-2 relies on ACLs. Taking the RTNL lock upon driver removal is useless since the driver first removes its ports and unregisters from notifiers so concurrent writes cannot happen at that time. The assertions were originally put as a reminder for future work involving ERP background optimization, but having these assertions only during addition serves this purpose as well. Therefore remove the ASSERT_RTNL() in both places related to ERP and Bloom filter removal. Fixes: cf7221a4f5a5 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08mlxsw: spectrum_acl: Add cleanup after C-TCAM update error conditionNir Dotan1-1/+9
When writing to C-TCAM, mlxsw driver uses cregion->ops->entry_insert(). In case of C-TCAM HW insertion error, the opposite action should take place. Add error handling case in which the C-TCAM region entry is removed, by calling cregion->ops->entry_remove(). Fixes: a0a777b9409f ("mlxsw: spectrum_acl: Start using A-TCAM") Signed-off-by: Nir Dotan <nird@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain3-11/+11
We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-07net/mlx4: replace pci_{,un}map_sg with dma_{,un}map_sgStephen Warren1-8/+7
pci_{,un}map_sg are deprecated and replaced by dma_{,un}map_sg. This is especially relevant since the rest of the driver uses the DMA API. Fix the driver to use the replacement APIs. Signed-off-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07net/mlx4: Get rid of page operation after dma_alloc_coherentStephen Warren2-39/+75
This patch solves a crash at the time of mlx4 driver unload or system shutdown. The crash occurs because dma_alloc_coherent() returns one value in mlx4_alloc_icm_coherent(), but a different value is passed to dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because when allocated, that pointer is passed to sg_set_buf() to record it, then when freed it is re-calculated by calling lowmem_page_address(sg_page()) which returns a different value. Solve this by recording the value that dma_alloc_coherent() returns, and passing this to dma_free_coherent(). This patch is roughly equivalent to commit 378efe798ecf ("RDMA/hns: Get rid of page operation after dma_alloc_coherent"). Based-on-code-from: Christoph Hellwig <hch@lst.de> Signed-off-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-5/+3
Pull rdma updates from Jason Gunthorpe: "This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits) RDMA/srpt: Use kmem_cache_free() instead of kfree() RDMA/mlx5: Signedness bug in UVERBS_HANDLER() IB/uverbs: Signedness bug in UVERBS_HANDLER() IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported IB/umad: Start using dev_groups of class IB/umad: Use class_groups and let core create class file IB/umad: Refactor code to use cdev_device_add() IB/umad: Avoid destroying device while it is accessed IB/umad: Simplify and avoid dynamic allocation of class IB/mlx5: Fix wrong error unwind IB/mlx4: Remove set but not used variable 'pd' RDMA/iwcm: Don't copy past the end of dev_name() string IB/mlx5: Fix long EEH recover time with NVMe offloads IB/mlx5: Simplify netdev unbinding IB/core: Move query port to ioctl RDMA/nldev: Expose port_cap_flags2 IB/core: uverbs copy to struct or zero helper IB/rxe: Reuse code which sets port state IB/rxe: Make counters thread safe IB/mlx5: Use the correct commands for UMEM and UCTX allocation ...
2018-12-24net/mlx4_core: drop useless LIST_HEADJulia Lawall1-5/+0
Drop LIST_HEAD where the variable it declares has never been used. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: c82e9aa0a8bc ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24mlxsw: spectrum: drop useless LIST_HEADJulia Lawall1-1/+0
Drop LIST_HEAD where the variable it declares is never used. The uses were removed in 244cd96adb5f ("net_sched: remove list_head from tc_action"), but not the declaration. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24net/mlx5e: drop useless LIST_HEADJulia Lawall1-3/+0
Drop LIST_HEAD where the variable it declares is never used. These became useless in 244cd96adb5f ("net_sched: remove list_head from tc_action") The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24net/mlx5e: fix semicolon.cocci warningskbuild test robot1-1/+1
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1339:57-58: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: 4c8fb2986d44 ("net/mlx5e: Increase VF representors' SQ size to 128") CC: Gavi Teitz <gavi@mellanox.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20net/mlx5e: XDP, Add user control for XDP TX MPWQE featureTariq Toukan3-1/+34
Add ethtool private flag 'xdp_tx_mpwqe' to control the feature from userspace. Feature is set ON by default, if supported. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQETariq Toukan4-27/+173
Add support for the HW feature of multi-packet WQE in XDP xmit flow. The conventional TX descriptor (WQE, Work Queue Element) serves a single packet. Our HW has support for multi-packet WQE (MPWQE) in which a single descriptor serves multiple TX packets. This reduces both the PCI overhead and the CPU cycles wasted on writing them. In this patch we add support for the HW feature, which is supported starting from ConnectX-5. Performance: Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz XDP_TX: We see a huge gain on single port ConnectX-5, and reach the 100 Mpps milestone. * Single-port HCA: Before: 70 Mpps After: 100 Mpps (+42.8%) * Dual-port HCA: Before: 51.7 Mpps After: 57.3 Mpps (+10.8%) * In both cases we tested traffic on one port and for now On Dual-port HCAs we see only small gain, we are working to overcome this bottleneck, but for the moment only with experimental firmware on dual port HCAs we can reach the wanted numbers as seen on Single-port HCAs. XDP_REDIRECT: Redirect from (A) ConnectX-5 to (B) ConnectX-5. Due to a setup limitation, (A) and (B) are on different NUMA nodes, so absolute performance numbers are not optimal. Note: Below is the transmit rate of (B), not the redirect rate of (A) which is in some cases higher. * (B) is single-port: Before: 77 Mpps After: 90 Mpps (+16.8%) * (B) is dual-port: Before: 61 Mpps After: 72 Mpps (+18%) Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20net/mlx5e: XDP, Add array for WQE info descriptorsTariq Toukan3-21/+54
Each xdp_wqe_info instance describes the number of data-segments and WQEBBs of the WQE. This is useful for a downstream patch that adds support for Multi-Packet TX WQE feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>