aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-01-02{net,IB}/mlx5: Refactor page fault handlingArtemy Kovalyov5-143/+311
* Update page fault event according to last specification. * Separate code path for page fault EQ, completion EQ and async EQ. * Move page fault handling work queue from mlx5_ib static variable into mlx5_core page fault EQ. * Allocate memory to store ODP event dynamically as the events arrive, since in atomic context - use mempool. * Make mlx5_ib page fault handler run in process context. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02net/mlx5: Update PAGE_FAULT_RESUME layoutArtemy Kovalyov1-8/+2
Update PAGE_FAULT_RESUME command layout. Three bit fields describing page fault: rdma, rdma_write, req_res gave 8 possible combinations, while only a few were legal. Now they are interpreted as three-bit type field, where former legal combinations turns into corresponding types and unused were added as new types. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Add MR cache for large UMR regionsArtemy Kovalyov1-0/+20
In this change we turn mlx5_ib_update_mtt() into generic mlx5_ib_update_xlt() to perfrom HCA translation table modifiactions supporting both atomic and process contexts and not limited by number of modified entries. Using this function we increase preallocated MRs up to 16GB. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02IB/mlx5: Refactor UMR post send formatArtemy Kovalyov1-1/+1
* Update struct mlx5_wqe_umr_ctrl_seg. * Currenlty UMR send_flags aim only certain use cases: enabled/disable cached MR, modifying XLT for ODP. By making flags independent make UMR more flexible allowing arbitrary manipulations. * Since different UMR formats have different entry sizes UMR request should receive exact size of translation table update instead of number of entries. Rename field npages to xlt_size in struct mlx5_umr_wr and update relevant code accordingly. * Add support of length64 bit. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+2
Pull networking fixes from David Miller: 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar. The most important is to not assume the packet is RX just because the destination address matches that of the device. Such an assumption causes problems when an interface is put into loopback mode. 2) If we retry when creating a new tc entry (because we dropped the RTNL mutex in order to load a module, for example) we end up with -EAGAIN and then loop trying to replay the request. But we didn't reset some state when looping back to the top like this, and if another thread meanwhile inserted the same tc entry we were trying to, we re-link it creating an enless loop in the tc chain. Fix from Daniel Borkmann. 3) There are two different WRITE bits in the MDIO address register for the stmmac chip, depending upon the chip variant. Due to a bug we could set them both, fix from Hock Leong Kweh. 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: stmmac: fix incorrect bit set in gmac4 mdio addr register r8169: add support for RTL8168 series add-on card. net: xdp: remove unused bfp_warn_invalid_xdp_buffer() openvswitch: upcall: Fix vlan handling. ipv4: Namespaceify tcp_tw_reuse knob net: korina: Fix NAPI versus resources freeing net, sched: fix soft lockup in tc_classify net/mlx4_en: Fix user prio field in XDP forward tipc: don't send FIN message from connectionless socket ipvlan: fix multicast processing ipvlan: fix various issues in ipvlan_process_multicast()
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner5-7/+7
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-23net/mlx4_en: Fix user prio field in XDP forwardTariq Toukan1-1/+2
The user prio field is wrong (and overflows) in the XDP forward flow. This is a result of a bad value for num_tx_rings_p_up, which should account all XDP TX rings, as they operate for the same user prio. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23mlxsw: spectrum_router: Correctly remove nexthop groupsIdo Schimmel1-0/+3
At the end of the nexthop initialization process we determine whether the nexthop should be offloaded or not based on the NUD state of the neighbour representing it. After all the nexthops were initialized we refresh the nexthop group and potentially offload it to the device, in case some of the nexthops were resolved. Make the destruction of a nexthop group symmetric with its creation by marking all nexthops as invalid and then refresh the nexthop group to make sure it was removed from the device's tables. Fixes: b2157149b0b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23mlxsw: spectrum_router: Don't reflect dead neighsIdo Schimmel1-4/+6
When a neighbour is considered to be dead, we should remove it from the device's table regardless of its NUD state. Without this patch, after setting a port to be administratively down we get the following errors when we periodically try to update the kernel about neighbours activity: [ 461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find matching neighbour for IP=192.168.100.2 Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20net/mlx5: use rb_entry()Geliang Tang1-1/+1
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes and cleanups from David Miller: 1) Revert bogus nla_ok() change, from Alexey Dobriyan. 2) Various bpf validator fixes from Daniel Borkmann. 3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04 drivers, from Dongpo Li. 4) Several ethtool ksettings conversions from Philippe Reynes. 5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert. 6) XDP support for virtio_net, from John Fastabend. 7) Fix NAT handling within a vrf, from David Ahern. 8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits) net: mv643xx_eth: fix build failure isdn: Constify some function parameters mlxsw: spectrum: Mark split ports as such cgroup: Fix CGROUP_BPF config qed: fix old-style function definition net: ipv6: check route protocol when deleting routes r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected irda: w83977af_ir: cleanup an indent issue net: sfc: use new api ethtool_{get|set}_link_ksettings net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings bpf: fix mark_reg_unknown_value for spilled regs on map value marking bpf: fix overflow in prog accounting bpf: dynamically allocate digest scratch buffer gtp: Fix initialization of Flags octet in GTPv1 header gtp: gtp_check_src_ms_ipv4() always return success net/x25: use designated initializers isdn: use designated initializers ...
2016-12-17mlxsw: spectrum: Mark split ports as suchIdo Schimmel1-1/+1
When a port is split we should mark it as such, as otherwise the split ports aren't renamed correctly (e.g. sw1p3 -> sw1p3s1) and the unsplit operation fails: $ devlink port split sw1p3 count 4 $ devlink port unsplit eth0 devlink answers: Invalid argument [ 598.565307] mlxsw_spectrum 0000:03:00.0 eth0: Port wasn't split Fixes: 67963a33b4fd ("mlxsw: Make devlink port instances independent of spectrum/switchx2 port instances") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Reviewed-by: Elad Raz <eladr@mellanox.com> Tested-by: Tamir Winetroub <tamirw@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-15Merge tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-41/+43
Pull PCI updates from Bjorn Helgaas: "PCI changes: - add support for PCI on ARM64 boxes with ACPI. We already had this for theoretical spec-compliant hardware; now we're adding quirks for the actual hardware (Cavium, HiSilicon, Qualcomm, X-Gene) - add runtime PM support for hotplug ports - enable runtime suspend for Intel UHCI that uses platform-specific wakeup signaling - add yet another host bridge registration interface. We hope this is extensible enough to subsume the others - expose device revision in sysfs for DRM - to avoid device conflicts, make sure any VF BAR updates are done before enabling the VF - avoid unnecessary link retrains for ASPM - allow INTx masking on Mellanox devices that support it - allow access to non-standard VPD for Chelsio devices - update Broadcom iProc support for PAXB v2, PAXC v2, inbound DMA, etc - update Rockchip support for max-link-speed - add NVIDIA Tegra210 support - add Layerscape LS1046a support - update R-Car compatibility strings - add Qualcomm MSM8996 support - remove some uninformative bootup messages" * tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (115 commits) PCI: Enable access to non-standard VPD for Chelsio devices (cxgb3) PCI: Expand "VPD access disabled" quirk message PCI: pciehp: Remove loading message PCI: hotplug: Remove hotplug core message PCI: Remove service driver load/unload messages PCI/AER: Log AER IRQ when claiming Root Port PCI/AER: Log errors with PCI device, not PCIe service device PCI/AER: Remove unused version macros PCI/PME: Log PME IRQ when claiming Root Port PCI/PME: Drop unused support for PMEs from Root Complex Event Collectors PCI: Move config space size macros to pci_regs.h x86/platform/intel-mid: Constify mid_pci_platform_pm PCI/ASPM: Don't retrain link if ASPM not possible PCI: iproc: Skip check for legacy IRQ on PAXC buses PCI: pciehp: Leave power indicator on when enabling already-enabled slot PCI: pciehp: Prioritize data-link event over presence detect PCI: rcar: Add gen3 fallback compatibility string for pcie-rcar PCI: rcar: Use gen2 fallback compatibility last PCI: rcar-gen2: Use gen2 fallback compatibility last PCI: rockchip: Move the deassert of pm/aclk/pclk after phy_init() ..
2016-12-12Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-2/+2
Pull timer updates from Thomas Gleixner: "The time/timekeeping/timer folks deliver with this update: - Fix a reintroduced signed/unsigned issue and cleanup the whole signed/unsigned mess in the timekeeping core so this wont happen accidentaly again. - Add a new trace clock based on boot time - Prevent injection of random sleep times when PM tracing abuses the RTC for storage - Make posix timers configurable for real tiny systems - Add tracepoints for the alarm timer subsystem so timer based suspend wakeups can be instrumented - The usual pile of fixes and updates to core and drivers" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) timekeeping: Use mul_u64_u32_shr() instead of open coding it timekeeping: Get rid of pointless typecasts timekeeping: Make the conversion call chain consistently unsigned timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion alarmtimer: Add tracepoints for alarm timers trace: Update documentation for mono, mono_raw and boot clock trace: Add an option for boot clock as trace clock timekeeping: Add a fast and NMI safe boot clock timekeeping/clocksource_cyc2ns: Document intended range limitation timekeeping: Ignore the bogus sleep time if pm_trace is enabled selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous" clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map() arm64: dts: rockchip: Arch counter doesn't tick in system suspend clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend posix-timers: Make them configurable posix_cpu_timers: Move the add_device_randomness() call to a proper place timer: Move sys_alarm from timer.c to itimer.c ptp_clock: Allow for it to be optional Kconfig: Regenerate *.c_shipped files after previous changes ...
2016-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+0
2016-12-09net: mlx5: Fix Kconfig help textChristopher Covington1-2/+0
Since the following commit, Infiniband and Ethernet have not been mutually exclusive. Fixes: 4aa17b28 mlx5: Enable mutual support for IB and Ethernet Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net/mlx5e: use %pad format string for dma_addr_tArnd Bergmann1-2/+2
On 32-bit ARM with 64-bit dma_addr_t I get this warning about an incorrect format string: In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0: drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’: drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] We have the special %pad format for printing dma_addr_t, so use that to print the correct address and avoid the warning. Fixes: 1c1b522808a1 ("net/mlx5e: Implement Fragmented Work Queue (WQ)") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Reserve headroom for receiving packet when XDP prog is activeMartin KaFai Lau4-17/+27
Reserve XDP_PACKET_HEADROOM for packet and enable bpf_xdp_adjust_head() support. This patch only affects the code path when XDP is active. After testing, the tx_dropped counter is incremented if the xdp_prog sends more than wire MTU. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrsMartin KaFai Lau2-30/+44
When XDP is active in mlx4, mlx4 is using one page/pkt. At the same time (i.e. when XDP is active), it is currently limiting MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514 in x86. AFAICT, we can at least raise the MTU limit up to PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is doing. It will be useful in the next patch which allows XDP program to extend the packet by adding new header(s). Note: In the earlier XDP patches, there is already existing guard to ensure the page/pkt scheme only applies when XDP is active in mlx4. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08bpf: xdp: Allow head adjustment in XDP progMartin KaFai Lau2-0/+10
This patch allows XDP prog to extend/remove the packet data at the head (like adding or removing header). It is done by adding a new XDP helper bpf_xdp_adjust_head(). It also renames bpf_helper_changes_skb_data() to bpf_helper_changes_pkt_data() to better reflect that XDP prog does not work on skb. This patch adds one "xdp_adjust_head" bit to bpf_prog for the XDP-capable driver to check if the XDP prog requires bpf_xdp_adjust_head() support. The driver can then decide to error out during XDP_SETUP_PROG. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08net/mlx5e: Offload TC matching on packets being IP fragmentsOr Gerlitz1-0/+12
Enable offloading of matching on packets being fragments. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-42/+53
2016-12-06net/mlx5e: Change the SQ/RQ operational state to positive logicMohamad Haj Yahia5-13/+17
When using the negative logic (i.e. FLUSH state), after the RQ/SQ reopen we will have a time interval that the RQ/SQ is not really ready and the state indicates that its not in FLUSH state because the initial SQ/RQ struct memory starts as zeros. Now we changed the state to indicate if the SQ/RQ is opened and we will set the READY state after finishing preparing all the SQ/RQ resources. Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Fixes: f2fde18c52a7 ("net/mlx5e: Don't wait for RQ completions on close") Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Don't flush SQ on errorSaeed Mahameed1-1/+0
We are doing SQ descriptors cleanup in driver. Fixes: 6e8dd6d6f4bd ("net/mlx5e: Don't wait for SQ completions on close") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5e: Don't notify HW when filling the edge of ICO SQSaeed Mahameed1-1/+1
We are going to do this a couple of steps ahead anyway. Fixes: d3c9bc2743dc ("net/mlx5e: Added ICO SQs") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Fix query ISSI flowKamal Heib3-11/+14
In old FWs query ISSI command is not supported and for some of those FWs it might fail with status other than "MLX5_CMD_STAT_BAD_OP_ERR". In such case instead of failing the driver load, we will treat any FW status other than 0 for Query ISSI FW command as ISSI not supported and assume ISSI=0 (most basic driver/FW interface). In case of driver syndrom (query ISSI failure by driver) we will fail driver load. Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Remove duplicate pci dev name printKamal Heib1-4/+4
Remove duplicate pci dev name printing from mlx5_core_warn/dbg. Fixes: 5a7883989b1c ('net/mlx5_core: Improve mlx5 messages') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06net/mlx5: Verify module parametersKamal Heib2-12/+17
Verify the mlx5_core module parameters by making sure that they are in the expected range and if they aren't restore them to their default values. Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03ipv4: fib: Replay events when registering FIB notifierIdo Schimmel1-1/+19
Commit b90eb7549499 ("fib: introduce FIB notification infrastructure") introduced a new notification chain to notify listeners (f.e., switchdev drivers) about addition and deletion of routes. However, upon registration to the chain the FIB tables can already be populated, which means potential listeners will have an incomplete view of the tables. Solve that by dumping the FIB tables and replaying the events to the passed notification block. The dump itself is done using RCU in order not to starve consumers that need RTNL to make progress. The integrity of the dump is ensured by reading the FIB change sequence counter before and after the dump under RTNL. This allows us to avoid the problematic situation in which the dumping process sends a ENTRY_ADD notification following ENTRY_DEL generated by another process holding RTNL. Callers of the registration function may pass a callback that is executed in case the dump was inconsistent with current FIB tables. The number of retries until a consistent dump is achieved is set to a fixed number to prevent callers from looping for long periods of time. In case current limit proves to be problematic in the future, it can be easily converted to be configurable using a sysctl. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlxsw: spectrum_router: Implement FIB offload in deferred workIdo Schimmel1-10/+62
FIB offload is currently done in process context with RTNL held, but we're about to dump the FIB tables in RCU critical section, so we can no longer sleep. Instead, defer the operation to process context using deferred work. Make sure fib info isn't freed while the work is queued by taking a reference on it and releasing it after the operation is done. Deferring the operation is valid because the upper layers always assume the operation was successful. If it's not, then the driver-specific abort mechanism is called and all routed traffic is directed to slow path. The work items are submitted to an ordered workqueue to prevent a mismatch between the kernel's FIB table and the device's. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlxsw: core: Create an ordered workqueue for FIB offloadIdo Schimmel2-0/+24
We're going to start processing FIB entries addition / deletion events in deferred work. These work items must be processed in the order they were submitted or otherwise we can have differences between the kernel's FIB table and the device's. Solve this by creating an ordered workqueue to which these work items will be submitted to. Note that we can't simply convert the current workqueue to be ordered, as EMADs re-transmissions are also processed in deferred work. Later on, we can migrate other work items to this workqueue, such as FDB notification processing and nexthop resolution, since they all take the same lock anyway. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03mlx4: use reset to set mac headerZhang Shengju1-1/+1
Since offset is zero, it's not necessary to use set function. Reset function is straightforward, and will remove the unnecessary add operation in set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-20/+9
Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02mlx4: fix use-after-free in mlx4_en_fold_software_stats()Eric Dumazet2-1/+5
My recent commit to get more precise rx/tx counters in ndo_get_stats64() can lead to crashes at device dismantle, as Jesper found out. We must prevent mlx4_en_fold_software_stats() trying to access tx/rx rings if they are deleted. Fix this by adding a test against priv->port_up in mlx4_en_fold_software_stats() Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port() allows us to eventually broadcast the latest/current counters to rtnetlink monitors. Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Support adding ingress tc rule when egress device flag is setHadar Hen Zion1-0/+8
When ndo_setup_tc is called with an egress_dev flag set, it means that the ndo call was executed on the mirred action (egress) device and not on the ingress device. In order to support this kind of ndo_setup_tc call, and insert the correct decap rule to the hardware, the uplink device on the same eswitch should be found. Currently, we use this resolution between the mirred device and the uplink on the same eswitch to offload vxlan shared device decap rules. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Save the represntor netdevice as part of the representorHadar Hen Zion4-10/+22
Replace the representor private data to a net_device pointer holding the representor netdevice, instead of void pointer holding mlx5e_priv. It will be used by a new eswitch service function, returning the uplink representor netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Bring back representor's ndos that were accidentally removedHadar Hen Zion1-0/+2
The VF Representor udp tunnel ndo entries were removed by mistake, return them. Fixes: 370bad0f9a52 ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: skip loopback selftest with !CONFIG_INETArnd Bergmann1-1/+9
When CONFIG_INET is disabled, the new selftest results in a link error: drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: In function `mlx5e_test_loopback': en_selftest.c:(.text.mlx5e_test_loopback+0x2ec): undefined reference to `ip_send_check' en_selftest.c:(.text.mlx5e_test_loopback+0x34c): undefined reference to `udp4_hwcsum' This hides the specific test in that configuration. Fixes: 0952da791c97 ("net/mlx5e: Add support for loopback selftest") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to callerDaniel Borkmann1-2/+6
After 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers need to hold rcu_read_lock() already to make sure BPF program doesn't get released in the background. Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading. Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping in XDP supported drivers and to keep the typecheck on the context intact. For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock() out of the helper. When the driver gets atomic replace support, this will move to call-sites eventually. mlx5 needs actual fixing as it has the same issue as described already in 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), that is, we're under RCU bh at this time, BPF programs are released via call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue reset. Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Remove flow encap entry in the correct placeRoi Dayan1-21/+22
Handling flow encap entry should be inside tc del flow and is only relevant for offloaded eswitch TC rules. Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instanceRoi Dayan1-7/+6
Change the function that deletes offloaded TC rule to get struct mlx5e_tc_flow instance which contains both the flow handle and flow attributes. This is a cleanup needed for downstream patches, it doesn't change any functionality. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Correct cleanup order when deleting offloaded TC rulesRoi Dayan1-2/+2
According to the reverse unwinding principle, on delete time we should first handle deletion of the steering rule and later handle the vlan deletion from the eswitch. Fixes: 8b32580df1cb ("net/mlx5e: Add TC vlan action for SRIOV offloads") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Remove redundant hashtable lookup in configure flowerRoi Dayan1-19/+7
We will never find a flow with the same cookie as cls_flower always allocates a new flow and the cookie is the allocated memory address. Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Create UMR MKey per RQTariq Toukan3-42/+35
In Striding RQ implementation, we used a single UMR (User-Mode Memory Registration) memory key for all RQs. When the product of RQs number*size gets high, we hit a limitation of u16 field size in FW. Here we move to using a UMR memory key per RQ, so we can scale to any number of rings, with the maximum buffer size in each. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Move function mlx5e_create_umr_mkeyTariq Toukan1-37/+37
In next patch we are going to create a UMR MKey per RQ, we need mlx5e_create_umr_mkey declared before mlx5e_create_rq. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02net/mlx5e: Implement Fragmented Work Queue (WQ)Tariq Toukan5-17/+105
Add new type of struct mlx5_frag_buf which is used to allocate fragmented buffers rather than contiguous, and make the Completion Queues (CQs) use it as they are big (default of 2MB per CQ in Striding RQ). This fixes the failures of type: "mlx5e_open_locked: mlx5e_open_channels failed, -12" due to dma_zalloc_coherent insufficient contiguous coherent memory to satisfy the driver's request when the user tries to setup more or larger rings. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30ethernet :mellanox :mlx5: Replace pci_pool_alloc by pci_pool_zallocSouptick Joarder1-3/+2
In alloc_cmd_box(), pci_pool_alloc() followed by memset will be replaced by pci_pool_zalloc() Signed-off-by: Souptick joarder <jrdr.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zallocSouptick Joarder1-4/+2
In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be replaced by pci_pool_zalloc() Signed-off-by: Souptick joarder <jrdr.linux@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29mlxsw: core: Change order of operations in removal pathIdo Schimmel1-1/+1
We call bus->init() before allocating 'lag.mapping'. Change the order of operations in removal path to reflect that. This makes the error path of mlxsw_core_bus_device_register() symmetric with mlxsw_core_bus_device_unregister(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-29mlxsw: core: Add missing rollback in error pathIdo Schimmel1-0/+1
Without this rollback, the thermal zone is still registered during the error path, whereas its private data is freed upon the destruction of the underlying bus device due to the use of devm_kzalloc(). This results in use after free. Fix this by calling mlxsw_thermal_fini() from the appropriate place in the error path. Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>