aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx4 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-07-25net/mlx4_core: Check device state before unregistering itAlex Vesker1-0/+3
Verify that the device state is registered before un-registering it. This check is required to prevent an OOPS on flows that do re-registration of the device and its previous state was unregistered. Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-40/+136
Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-20net/mlx4_en: use READ_ONCE when freeing xdp_progBrenden Blanco1-2/+4
For consistency, and in order to hint at the synchronous nature of the xdp_prog field, use READ_ONCE in the destroy path of the ring. All occurrences should now use either READ_ONCE or xchg. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: add xdp forwarding and data write supportBrenden Blanco5-8/+211
A user will now be able to loop packets back out of the same port using a bpf program attached to xdp hook. Updates to the packet contents from the bpf program is also supported. For the packet write feature to work, the rx buffers are now mapped as bidirectional when the page is allocated. This occurs only when the xdp hook is active. When the program returns a TX action, enqueue the packet directly to a dedicated tx ring, so as to avoid completely any locking. This requires the tx ring to be allocated 1:1 for each rx ring, as well as the tx completion running in the same softirq. Upon tx completion, this dedicated tx ring recycles pages without unmapping directly back to the original rx ring. In steady state tx/drop workload, effectively 0 page allocs/frees will occur. In order to separate out the paths between free and recycle, a free_tx_desc func pointer is introduced that is optionally updated whenever recycle_ring is activated. By default the original free function is always initialized. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: break out tx_desc write into separate functionBrenden Blanco1-58/+76
In preparation for writing the tx descriptor from multiple functions, create a helper for both normal and blueflame access. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: add page recycle to prepare rx ring for tx supportBrenden Blanco3-10/+109
The mlx4 driver by default allocates order-3 pages for the ring to consume in multiple fragments. When the device has an xdp program, this behavior will prevent tx actions since the page must be re-mapped in TODEVICE mode, which cannot be done if the page is still shared. Start by making the allocator configurable based on whether xdp is running, such that order-0 pages are always used and never shared. Since this will stress the page allocator, add a simple page cache to each rx ring. Pages in the cache are left dma-mapped, and in drop-only stress tests the page allocator is eliminated from the perf report. Note that setting an xdp program will now require the rings to be reconfigured. Before: 26.91% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq 17.88% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags 6.00% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag 4.49% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist 3.21% swapper [kernel.vmlinux] [k] intel_idle 2.73% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem 2.57% swapper [mlx4_en] [k] mlx4_en_process_rx_cq After: 31.72% swapper [kernel.vmlinux] [k] intel_idle 8.79% swapper [mlx4_en] [k] mlx4_en_process_rx_cq 7.54% swapper [kernel.vmlinux] [k] poll_idle 6.36% swapper [mlx4_core] [k] mlx4_eq_int 4.21% swapper [kernel.vmlinux] [k] tasklet_action 4.03% swapper [kernel.vmlinux] [k] cpuidle_enter_state 3.43% swapper [mlx4_en] [k] mlx4_en_prepare_rx_desc 2.18% swapper [kernel.vmlinux] [k] native_irq_return_iret 1.37% swapper [kernel.vmlinux] [k] menu_select 1.09% swapper [kernel.vmlinux] [k] bpf_map_lookup_elem Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: add support for fast rx drop bpf programBrenden Blanco3-4/+102
Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver. In tc/socket bpf programs, helpers linearize skb fragments as needed when the program touches the packet data. However, in the pursuit of speed, XDP programs will not be allowed to use these slower functions, especially if it involves allocating an skb. Therefore, disallow MTU settings that would produce a multi-fragment packet that XDP programs would fail to access. Future enhancements could be done to increase the allowable MTU. The xdp program is present as a per-ring data structure, but as of yet it is not possible to set at that granularity through any ndo. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: Add resilience in low memory systemsEugenia Emantayev3-37/+132
This patch fixes the lost of Ethernet port on low memory system, when driver frees its resources and fails to allocate new resources. Issue could happen while changing number of channels, rings size or changing the timestamp configuration. This fix is necessary because of removing vmap use in the code. When vmap was in use driver could allocate non-contiguous memory and make it contiguous with vmap. Now it could fail to allocate a large chunk of contiguous memory and lose the port. Current code tries to allocate new resources and then upon success frees the old resources. Fixes: 73898db04301 ('net/mlx4: Avoid wrong virtual mappings') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19net/mlx4_en: Move filters cleanup to a proper locationEugenia Emantayev2-3/+4
Filters cleanup should be done once before destroying net device, since filters list is contained in the private data. Fixes: 1eb8c695bda9 ('net/mlx4_en: Add accelerated RFS support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04net/mlx4: Fix some indent inconsistancyChristophe Jaillet5-19/+17
Silent a few smatch warnings about indentation. This include the removal of a 'return' statement in 'resource_tracker.c'. This 'return' will still be performed when breaking out of the corresponding 'switch' block. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-14/+39
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-23net/mlx4_en: Add DCB PFC support through CEE netlink commandsRana Shahout7-14/+329
This patch adds support for reading and updating priority flow control (PFC) attributes in the driver via netlink. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22net/mlx4_en: Avoid unregister_netdev at shutdown flowEran Ben Elisha2-3/+19
This allows a clean shutdown, even if some netdev clients do not release their reference from this netdev. It is enough to release the HW resources only as the kernel is shutting down. Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22net/mlx4_en: Fix the return value of a failure in VLAN VID add/killKamal Heib1-7/+11
Modify mlx4_en_vlan_rx_[add/kill]_vid to return error value in case of failure. Fixes: 8e586137e6b6 ('net: make vlan ndo_vlan_rx_[add/kill]_vid return error value') Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17mlx4_en: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_portAlexander Duyck3-30/+20
This change replaces the network device operations for adding or removing a VXLAN port with operations that are more generically defined to be used for any UDP offload port but provide a type. As such by just adding a line to verify that the offload type is VXLAN we can maintain the same functionality. In addition I updated the socket address family check so that instead of excluding IPv6 we instead abort of type is not IPv4. This makes much more sense as we should only be supporting IPv4 outer addresses on this hardware. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16mlx4e: Do not attempt to offload VXLAN ports that are unrecognizedAlexander Duyck1-3/+8
The mlx4e driver does not support more than one port for VXLAN offload. As such expecting the hardware to offload other ports is invalid since it appears the parsing logic is used to perform Tx checksum and segmentation offloads. Use the vxlan_port number to determine in which cases we can apply the offload and in which cases we can not. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net/mlx4_en: initialize cmd.context_lock spinlock earlierEric Dumazet2-1/+1
Maciej Żenczykowski reported lockdep warning a spinlock was not registered before being held in mlx4_cmd_wake_completions() cmd.context_lock initialization is not at the right place. 1) mlx4_cmd_use_events() can be called multiple times. Calling spin_lock_init() on a live spinlock can lead to hangs. 2) mlx4_cmd_wake_completions() can be called while lock has not been initialized. Lockdep complains, and current logic is not race prone. It seems better to move the initialization earlier in mlx4_load_one() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx4_en: fix ethtool -xEric Dumazet1-13/+12
mlx4 RSS is limited to spread incoming packets to a power of two number of queues. An uniformly distibuted traffic would be split on queues 0 to N-1, N being a power of two, each queue having a 1/N weight. If number of RX queues is not a power of two, upper RX queues do not receive traffic. ethtool -x is lying, because it pretends some queues have higher weight. Before patch: lpaa24:~# ethtool -L eth1 rx 24 lpaa24:~# ethtool -x eth1 RX flow hash indirection table for eth1 with 24 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 10 11 12 13 14 15 16: 0 1 2 3 4 5 6 7 RSS hash key: e0:7c:3a:89:07:55:b6:58:69:cc:f4:e5:24:62:e3:25:88:6c:42:5b:d2:cb:9a:d2:e0:06:e1:dc:f9:09:a1:89:0f:a0:30:43:73:6f:0c:b6 If this information was correct, user space tools could expect queues 0 to 7 to receive twice more traffic than queues 8 to 15 After patch : lpaa24:~# ethtool -L eth1 rx 24 lpaa24:~# ethtool -x eth1 RX flow hash indirection table for eth1 with 24 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 8 9 10 11 12 13 14 15 RSS hash key: da:7b:09:60:f1:ac:67:b4:d0:72:d4:ec:a2:e5:80:0a:ad:50:22:1a:f8:f9:66:54:5f:22:45:c3:88:f4:57:82:c1:c1:90:ed:70:cb:40:ce lpaa24:~# ethtool -X eth1 equal 8 lpaa24:~# ethtool -x eth1 RX flow hash indirection table for eth1 with 24 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 0 1 2 3 4 5 6 7 RSS hash key: da:7b:09:60:f1:ac:67:b4:d0:72:d4:ec:a2:e5:80:0a:ad:50:22:1a:f8:f9:66:54:5f:22:45:c3:88:f4:57:82:c1:c1:90:ed:70:cb:40:ce Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Wei Wang <weiwan@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09net/mlx4_en: mlx4_en_netpoll() should schedule TX, not RXEric Dumazet1-2/+2
I am not sure mlx4_en_netpoll() is doing anything useful right now. mlx4 has different NAPI structures for RX and TX, and netpoll only wants to drain TX queues. Lets schedule NAPI polls on TX, not RX. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25net/mlx4_en: get rid of private net_device_statsEric Dumazet4-15/+5
We simply can use the standard net_device stats. We do not need to clear fields that are already 0. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25net/mlx4_en: get rid of ret_statsEric Dumazet2-6/+6
mlx4 uses a private struct net_device_stats in a vain attempt to avoid races. This is buggy because multiple cpus could call mlx4_en_get_stats() at the same time, so ret_stats can not guarantee stable results. To fix this, we need to switch to ndo_get_stats64() as this method provides per-thread storage. This allows to reduce mlx4_en_priv bloat. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25net/mlx4_en: clear some TX ring stats in mlx4_en_clear_stats()Eric Dumazet1-0/+4
mlx4_en_clear_stats() clears about everything but few TX ring fields are missing : - queue_stopped, wake_queue, tso_packets, xmit_more Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25net/mlx4_en: fix tx_dropped bugEric Dumazet4-5/+9
1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this variable is overwritten in mlx4_en_DUMP_ETH_STATS(). 2) This increment was not SMP safe, as a port might have many TX queues. Add a per TX ring tx_dropped to fix these issues. This is u32 as mlx4_en_DUMP_ETH_STATS() will add a 32bit field. So lets avoid bugs with SNMP agents having to cope with partial overwraps. (One of these agents being bond_fold_stats()) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16net/mlx4_core: Fix access to uninitialized indexTariq Toukan1-2/+2
Prevent using uninitialized or negative index when handling steering entries. Fixes: b12d93d63c32 ('mlx4: Add support for promiscuous mode in the new steering model.') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-05net/mlx4_en: Fix endianness bug in IPV6 csum calculationDaniel Jurgens1-1/+1
Use htons instead of unconditionally byte swapping nexthdr. On a little endian systems shifting the byte is correct behavior, but it results in incorrect csums on big endian architectures. Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Carol Soto <clsoto@us.ibm.com> Tested-by: Carol Soto <clsoto@us.ibm.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-05net/mlx4: Avoid wrong virtual mappingsHaggai Abramovsky7-117/+45
The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak). The mlx4 driver contains code that does the equivalent of: vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the device is opened. Prevent Ethernet driver to run this problematic code by forcing it to allocate contiguous memory. As for the Infiniband driver, at first we are trying to allocate contiguous memory, but in case of failure roll back to work with fragmented memory. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reported-by: David Daney <david.daney@cavium.com> Tested-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04net/mlx4_en: Add support for inner IPv6 checksum offloads and TSOAlexander Duyck2-7/+33
>From what I can tell the ConnectX-3 will support an inner IPv6 checksum and segmentation offload, however it cannot support outer IPv6 headers. This assumption is based on the fact that I could see the checksum being offloaded for inner header on IPv4 tunnels, but not on IPv6 tunnels. For this reason I am adding the feature to the hw_enc_features and adding an extra check to the features_check call that will disable GSO and checksum offload in the case that the encapsulated frame has an outer IP version of that is not 4. The check in mlx4_en_features_check could be removed if at some point in the future a fix is found that allows the hardware to offload segmentation/checksum on tunnels with an outer IPv6 header. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04net/mlx4_en: Add support for UDP tunnel segmentation with outer checksum offloadAlexander Duyck1-4/+13
This patch assumes that the mlx4 hardware will ignore existing IPv4/v6 header fields for length and checksum as well as the length and checksum fields for outer UDP headers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+4
Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26net/mlx4_en: fix spurious timestamping callbacksEric Dumazet1-2/+4
When multiple skb are TX-completed in a row, we might incorrectly keep a timestamp of a prior skb and cause extra work. Fixes: ec693d47010e8 ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller7-25/+89
Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lockHannes Frederic Sowa1-0/+2
mlx4_en_start_port requires rtnl_lock to be held. Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net/mlx4_en: Split SW RX dropped counter per RX ringEran Ben Elisha4-3/+10
Count SW packet drops per RX ring instead of a global counter. This will allow monitoring the number of rx drops per ring. In addition, SW rx_dropped counter was overwritten by HW rx_dropped counter, sum both of them instead to show the accurate value. Fixes: a3333b35da16 ('net/mlx4_en: Moderate ethtool callback to [...] ') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net/mlx4_core: Don't allow to VF change global pause settingsEugenia Emantayev2-0/+15
Currently changing global pause settings is done via SET_PORT command with input modifier GENERAL. This command is allowed for each VF since MTU setting is done via the same command. Change the above to the following scheme: before passing the request to the FW, the PF will check whether it was issued by a slave. If yes, don't change global pause and warn, otherwise change to the requested value and store for further reference. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net/mlx4_core: Avoid repeated calls to pci enable/disableDaniel Jurgens1-5/+34
Maintain the PCI status and provide wrappers for enabling and disabling the PCI device. Performing the actions more than once without doing its opposite results in warning logs. This occurred when EEH hotplugged the device causing a warning for disabling an already disabled device. Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21net/mlx4_core: Implement pci_resume callbackDaniel Jurgens1-15/+24
Move resume related activities to a new pci_resume function instead of performing them in mlx4_pci_slot_reset. This change is needed to avoid a hotplug during EEH recovery due to commit f2da4ccf8bd4 ("powerpc/eeh: More relaxed hotplug criterion"). Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net/mlx4_en: do batched put_page using atomic_subKonstantin Khlebnikov1-2/+6
This patch fixes couple error paths after allocation failures. Atomic set of page reference counter is safe only if it is zero, otherwise set can race with any speculative get_page_unless_zero. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-19net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOCKonstantin Khlebnikov1-1/+1
High order pages are optional here since commit 51151a16a60f ("mlx4: allow order-0 memory allocations in RX path"), so here is no reason for depleting reserves. Generic __netdev_alloc_frag() implements the same logic. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20net/mlx4: remove unused array zero_gid[]Colin Ian King1-2/+0
zero_gid is not used, so remove this redundant array. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds10-218/+344
Pull networking updates from David Miller: "Highlights: 1) Support more Realtek wireless chips, from Jes Sorenson. 2) New BPF types for per-cpu hash and arrap maps, from Alexei Starovoitov. 3) Make several TCP sysctls per-namespace, from Nikolay Borisov. 4) Allow the use of SO_REUSEPORT in order to do per-thread processing of incoming TCP/UDP connections. The muxing can be done using a BPF program which hashes the incoming packet. From Craig Gallek. 5) Add a multiplexer for TCP streams, to provide a messaged based interface. BPF programs can be used to determine the message boundaries. From Tom Herbert. 6) Add 802.1AE MACSEC support, from Sabrina Dubroca. 7) Avoid factorial complexity when taking down an inetdev interface with lots of configured addresses. We were doing things like traversing the entire address less for each address removed, and flushing the entire netfilter conntrack table for every address as well. 8) Add and use SKB bulk free infrastructure, from Jesper Brouer. 9) Allow offloading u32 classifiers to hardware, and implement for ixgbe, from John Fastabend. 10) Allow configuring IRQ coalescing parameters on a per-queue basis, from Kan Liang. 11) Extend ethtool so that larger link mode masks can be supported. From David Decotigny. 12) Introduce devlink, which can be used to configure port link types (ethernet vs Infiniband, etc.), port splitting, and switch device level attributes as a whole. From Jiri Pirko. 13) Hardware offload support for flower classifiers, from Amir Vadai. 14) Add "Local Checksum Offload". Basically, for a tunneled packet the checksum of the outer header is 'constant' (because with the checksum field filled into the inner protocol header, the payload of the outer frame checksums to 'zero'), and we can take advantage of that in various ways. From Edward Cree" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits) bonding: fix bond_get_stats() net: bcmgenet: fix dma api length mismatch net/mlx4_core: Fix backward compatibility on VFs phy: mdio-thunder: Fix some Kconfig typos lan78xx: add ndo_get_stats64 lan78xx: handle statistics counter rollover RDS: TCP: Remove unused constant RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket net: smc911x: convert pxa dma to dmaengine team: remove duplicate set of flag IFF_MULTICAST bonding: remove duplicate set of flag IFF_MULTICAST net: fix a comment typo ethernet: micrel: fix some error codes ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it bpf, dst: add and use dst_tclassid helper bpf: make skb->tc_classid also readable net: mvneta: bm: clarify dependencies cls_bpf: reset class and reuse major in da ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c ldmvsw: Add ldmvsw.c driver code ...
2016-03-18net/mlx4_core: Fix backward compatibility on VFsEli Cohen1-6/+18
Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless of system page size") introduced dependency where old VF drivers without this fix fail to load if the PF driver runs with this commit. To resolve this add a module parameter which disables that functionality by default. If both the PF and VFs are running with a driver with that commit the administrator may set the module param to true. The module parameter is called enable_4k_uar. Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...') Signed-off-by: Eli Cohen <eli@mellanox.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-5/+4
Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...
2016-03-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds2-3/+8
Pull rdma updates from Doug Ledford: "Initial roundup of 4.6 merge window patches. This is the first of two pull requests. It is the smaller request, but touches for more different things (this is everything but what is in or going into staging). The pull request for the code in staging/rdma is on hold until after we decide what to do on the write/writev API issue and may be partially deferred until 4.7 as a result. Summary: - cxgb4 updates - nes updates - unification of iwarp portmapper code to core - add drain_cq API - various ib_core updates - minor ipoib updates - minor mlx4 updates - more significant mlx5 updates (including a minor merge conflict with net-next tree...merge is simple to resolve and Stephen's resolution was confirmed by Mellanox) - trivial net/9p rdma conversion - ocrdma RoCEv2 update - srpt updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (85 commits) iwpm: crash fix for large connections test iw_cxgb3: support for iWARP port mapping iw_cxgb4: remove port mapper related code iw_nes: remove port mapper related code iwcm: common code for port mapper net/9p: convert to new CQ API IB/mlx5: Add support for don't trap rules net/mlx5_core: Introduce forward to next priority action net/mlx5_core: Create anchor of last flow table iser: Accept arbitrary sg lists mapping if the device supports it mlx5: Add arbitrary sg list support IB/core: Add arbitrary sg_list support IB/mlx5: Expose correct max_fast_reg_page_list_len IB/mlx5: Make coding style more consistent IB/mlx5: Convert UMR CQ to new CQ API IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Skip using unneeded intermediate variable IB/ocrdma: Delete unnecessary variable initialisations in 11 functions IB/core: Documentation fix in the MAD header file IB/core: trivial prink cleanup. ...
2016-03-17mm: introduce page reference manipulation functionsJoonsoo Kim1-5/+4
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-14mlx4: add missing braces in verify_qp_parametersArnd Bergmann1-1/+2
The implementation of QP paravirtualization back in linux-3.7 included some code that looks very dubious, and gcc-6 has grown smart enough to warn about it: drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters': drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3154:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation] if (optpar & MLX4_QP_OPTPAR_ALT_ADDR_PATH) { ^~ drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3144:4: note: ...this 'if' clause, but it is not if (slave != mlx4_master_func_num(dev)) >From looking at the context, I'm reasonably sure that the indentation is correct but that it should have contained curly braces from the start, as the update_gid() function in the same patch correctly does. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 54679e148287 ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop") Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13mlx4: use napi_consume_skb API to get bulk free operationsJesper Dangaard Brouer1-6/+9
Bulk free of SKBs happen transparently by the API call napi_consume_skb(). The napi budget parameter is usually needed by napi_consume_skb() to detect if called from netpoll. In this patch it has an extra meaning. For mlx4 driver, the mlx4_en_stop_port() call is done outside NAPI/softirq context, and cleanup the entire TX ring via mlx4_en_free_tx_buf(). The code mlx4_en_free_tx_desc() for freeing SKBs are shared with NAPI calls. To handle this shared use the zero budget indication is reused, and handled appropriately in napi_consume_skb(). To reflect this, variable is called napi_mode for the function call that needed this distinction. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-14/+22
Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03net: mellanox: add DEVLINK dependenciesArnd Bergmann1-0/+1
The new NET_DEVLINK infrastructure can be a loadable module, but the drivers using it might be built-in, which causes link errors like: drivers/net/built-in.o: In function `mlx4_load_one': :(.text+0x2fbfda): undefined reference to `devlink_port_register' :(.text+0x2fc084): undefined reference to `devlink_port_unregister' drivers/net/built-in.o: In function `mlxsw_sx_port_remove': :(.text+0x33a03a): undefined reference to `devlink_port_type_clear' :(.text+0x33a04e): undefined reference to `devlink_port_unregister' There are multiple ways to avoid this: a) add 'depends on NET_DEVLINK || !NET_DEVLINK' dependencies for each user b) use 'select NET_DEVLINK' from each driver that uses it and hide the symbol in Kconfig. c) make NET_DEVLINK a 'bool' option so we don't have to list it as a dependency, and rely on the APIs to be stubbed out when it is disabled d) use IS_REACHABLE() rather than IS_ENABLED() to check for NET_DEVLINK in include/net/devlink.h This implements a variation of approach a) by adding an intermediate symbol that drivers can depend on, and changes the three drivers using it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 09d4d087cd48 ("mlx4: Implement devlink interface") Fixes: c4745500e988 ("mlxsw: Implement devlink interface") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03net: relax setup_tc ndo op handle restrictionJohn Fastabend1-1/+1
I added this check in setup_tc to multiple drivers, if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) Unfortunately restricting to TC_H_ROOT like this breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test to only check the type to make it equivalent to the check before I broke it. With this the old instantiation continues to work. A good smoke test is to setup mqprio with, # tc qdisc add dev eth4 root mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7 Fixes: e4c6734eaab9 ("net: rework ndo tc op to consume additional qdisc handle paramete") Reported-by: Singh Krishneil <krishneil.k.singh@intel.com> Reported-by: Jake Keller <jacob.e.keller@intel.com> CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>