aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4/main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-01-19IB/mlx4: Advertise RoCE v2 supportMatan Barak1-3/+9
Advertise RoCE v2 support in port_immutable attributes according to the hardware's capabilities. This enables the verbs stack to use RoCE v2 mode. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Enable RoCE v2 when the IB device is addedMoni Shoua1-1/+8
If the hardware supports RoCE v2, we configure the hardware UDP port according to the RoCE v2 Annex when mlx4_ib device is added. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Add support for setting RoCEv2 gids in hardwareMoni Shoua1-3/+60
To tell hardware about a gid with type RoCEv2, software needs a new modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-01-19IB/mlx4: Add gid_type to GID propertiesMoni Shoua1-4/+13
IB core driver adds a property of type to struct ib_gid_attr. The mlx4 driver should take that in consideration when modifying or querying the hardware gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove in-kernel support for memory windowsChristoph Hellwig1-1/+0
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08mlx4: Expose correct max_sge_rd limitSagi Grimberg1-1/+1
mlx4 devices (ConnectX-2, ConnectX-3) has a limitation where rdma read work queue entries cannot exceed 512 bytes. A rdma_read wqe needs to fit in 512 bytes: - wqe control segment (16 bytes) - rdma segment (16 bytes) - scatter elements (16 bytes each) So max_sge_rd should be: (512 - 16 - 16) / 16 = 30. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/mlx4: Remove old FRWR API supportSagi Grimberg1-2/+0
No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/mlx4: Support the new memory registration APISagi Grimberg1-0/+1
Support the new memory registration API by allocating a private page list array in mlx4_ib_mr and populate it when mlx4_ib_map_mr_sg is invoked. Also, support IB_WR_REG_MR by setting the exact WQE as IB_WR_FAST_REG_MR, just take the needed information from different places: - page_size, iova, length, access flags (ib_mr) - page array (mlx4_ib_mr) - key (ib_reg_wr) The IB_WR_FAST_REG_MR handlers will be removed later when all the ULPs will be converted. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/core: Add netdev and gid attributes paramteres to cacheMatan Barak1-2/+2
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/mlx4: Add support for blocking multicast loopback QP creation user flagEran Ben Elisha1-1/+2
MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is now supported downstream. In addition, this flag was supported only for IB_QPT_UD, now, with the new implementation it is supported for all QP types. Support IB_USER_VERBS_EX_CMD_CREATE_QP in order to get the flag from user space using the extension create qp command. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/mlx4: Add IB counters tableEran Ben Elisha1-13/+50
This is an infrastructure step for allocating and attaching more than one counter to QPs on the same port. Allocate a counters table and manage the insertion and removals of the counters in load and unload of mlx4 IB. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-28IB/mlx4: Report checksum offload cap for RAW QP when query deviceBodong Wang1-0/+2
Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4_ib: Disassociate supportYishai Hadas1-2/+137
Implements the IB core disassociate_ucontext API. The driver detaches the HW resources for a given user context to prevent a dependency between application termination and device disconnecting. This is done by managing the VMAs that were mapped to the HW bars such as door bell and blueflame. When need to detach remap them to an arbitrary kernel page returned by the zap API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4: Replace mechanism for RoCE GID managementMoni Shoua1-486/+27
Manage RoCE gid table with logic in IB/core, which is common to all vendors, and remove the mechanism from the mlx4 IB driver. Since management of the GID cache may lead to index mismatch with the hardware GID table, a translation between indexes is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4: Implement ib_device callbacksMoni Shoua1-2/+234
get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30mlx4: Support ib_alloc_mr verbSagi Grimberg1-1/+1
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28mlx4, mlx5, mthca: Expose max_sge_rd correctlySagi Grimberg1-0/+1
Applications must not assume that max_sge and max_sge_rd are the same, Hence expose max_sge_rd correctly as well. Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Optimize do_slave_initDoug Ledford1-7/+9
There is little chance our memory allocation will fail, so we can combine initializing the work structs with allocating them instead of looping through all of them once to allocate and again to initialize. Then when we need to actually find out if our device is up or in the process of going down, have all of our work structs batched up, take the spin_lock once and only once, and do all of the batch under the one spin_lock invocation instead of incurring all of the locked memory cycles we would otherwise incur to take/release the spin_lock over and over again. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Fix memory leak in do_slave_initDoug Ledford1-0/+2
We create a number of work structs to be queued up to a workqueue, and on completion of the workqueue handler, the workqueue handler frees the allocated memory. If, however, we don't queue the work struct because the device is going down, then we need to free the memory ourselves. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Optimize freeing of items on error unwindManinder Singh1-5/+3
On failure, we loop through all possible pointers and test them before calling kfree. But really, why even attempt to free items we didn't allocate when we can easily loop through exactly and only the devices for which the original memory allocation succeeded and free just those. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-14IB/mlx4: Do not attemp to report HCA clock offset on VFsMatan Barak1-5/+6
mlx4 VFs can provide CQE raw time-stamping services, but they don't have the hca core clock mapped to their PCI bars. As such, we should not attempt to query and report the clock offset to user space for VFs. Doing so causes query_device over VFs to fail with -ENOSUPP. Fixes: 4b664c4355b2 ('IB/mlx4: Add support for CQ time-stamping') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-63/+55
Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
2015-06-15IB/mlx4: Add RoCE/IB dedicated countersEran Ben Elisha1-13/+30
This is an infrastructure step to attach all the QPs opened from the IB driver to a counter in order to collect VF stats from the PF using those counters. If the port's type is Ethernet, the counter policy demands two counters per port (one for RoCE and one for Ethernet). The port default counter (allocated in mlx4_core) is used for the Ethernet netdev QPs and we allocate another counter for RoCE. If the port's traffic is Infiniband, the counter policy demands one counter per port, so it can use the port's default counter. Also, Add 'allocated' flag for each counter in order to clean it at unload. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12IB/core: Add ability for drivers to report an alternate MAD size.Ira Weiny1-0/+2
Add max MAD size to the device immutable data set and have all drivers that support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core. Verify MAD size data in both the MAD core and when reading the immutable data. OPA drivers will report alternate MAD sizes in subsequent patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/mlx4: Add support for CQ time-stampingMatan Barak1-2/+40
This includes: * support allocation of CQ with the TIMESTAMP_COMPLETION creation flag. * add timestamp_mask and hca_core_clock to query_device, reporting the number of supported timestamp bits (mask) and the hca_core_clock frequency. * return hca core clock's offset in query_device vendor's data, this is needed in order to read the HCA's core clock. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/mlx4: Add mmap call to map the hardware clockMatan Barak1-1/+17
In order to read the HCA's cycle counter efficiently in user space, we need to map the HCA's register. This is done through mmap call. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Pass hardware specific data in query_deviceMatan Barak1-1/+5
Vendors should be able to pass vendor specific data to/from user-space via query_device uverb. In order to do this, we need to pass the vendors' specific udata. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Change ib_create_cq to use struct ib_cq_init_attrMatan Barak1-1/+3
Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02Merge branch 'for-4.2-misc' into k.o/for-4.2Doug Ledford1-2/+3
2015-06-02IB/mlx4: Fix error paths in mlx4_ib_create_flow()Roland Dreier1-2/+3
The unwinding clean up code are err_create_flow starts at the current index i. That means we shouldn't increment i until we're really sure we won't have to destroy the current flow; otherwise we might increment the index, fail inside an is_bonded block, and end up accessing off the end of the reg_id[] array. This was detected by Coverity (CID 1271229). Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-30net/mlx4: Add EQ poolMatan Barak1-48/+23
Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as applications which are events sensitive were limited to use only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there are limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for EQ serving on a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx4_core: Demote simple multicast and broadcast flow steering rulesMatan Barak1-2/+2
In SRIOV, when simple (i.e - Ethernet L2 only) flow steering rules are created, always create them at MLX4_DOMAIN_NIC priority (instead of the real priority the function created them at). This is done in order to let multiple functions add broadcast/multicast rules without affecting other functions, which is necessary for DPDK in SRIOV. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-20IB/core: Convert core to use bitfield for capsIra Weiny1-10/+5
Remove query_protocol callback Use the new Core Capability bits for: rdma_protocol_* rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20IB/core: Add per port immutable struct to ib_deviceIra Weiny1-0/+17
As of commit 5eb620c81ce3 "IB/core: Add helpers for uncached GID and P_Key searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in the ib_device. The per port core capability flags to be added later are also immutable data to be stored in the ib_device object. In preparation for this create a structure for per port immutable data and place the pkey and gid table lengths within this structure. "get_port_immutable" is added as a mandatory device function to allow the drivers to fill in this data. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Implement new callback query_protocol()Michael Wang1-0/+10
Add new callback query_protocol() and implement for each HW. Mapping List: node-type link-layer transport protocol nes RNIC ETH IWARP IWARP amso1100 RNIC ETH IWARP IWARP cxgb3 RNIC ETH IWARP IWARP cxgb4 RNIC ETH IWARP IWARP usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP ocrdma IB_CA ETH IB IBOE mlx4 IB_CA IB/ETH IB IB/IBOE mlx5 IB_CA IB IB IB ehca IB_CA IB IB IB ipath IB_CA IB IB IB mthca IB_CA IB IB IB qib IB_CA IB IB IB Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-12infiniband: Remove duplicated KERN_<LEVEL> from pr_<level> usesJoe Perches1-2/+1
These KERN_<LEVEL> uses are unnecessary with pr_<level> and cause bad logging output so remove them. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Change alias guids default to be host assignedYishai Hadas1-2/+2
Change the default mode to be HOST assigned instead of SM assigned. This is the expected operational mode, because it doesn't depend on SM availability. As PF generates random GUIDs as the initial admin values, this gives out of the box experience. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15IB/mlx4: Request alias GUID on demandYishai Hadas1-0/+22
Request GIDs from the SM on demand, i.e., when a VF actually needs them, and release them when the GIDs are no longer in use. In cloud environments, this is useful for GID migrations, in which a GID is assigned to a VF on the destination HCA, while the VF on the source HCA is shutdown (but the GID was not administratively released). Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-02net/mlx4: Add SET_PORT opcode modifiers enumerationIdo Shamay1-5/+6
The calls to SET_PORT used hard-code numbers, when supplying command's opcode modifiers, fix that to use well defined constants. Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18IB/mlx4: Verify net device validity on port change eventMoni Shoua1-1/+5
Processing an event is done in a different context from the one when the event was dispatched. This requires a check that the slave net device is still valid when the event is being processed. The check is done under the iboe lock which ensure correctness. Fixes: a57500903093 ('IB/mlx4: Add port aggregation support') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-21Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infinibandLinus Torvalds1-6/+5
Pull InfiniBand/RDMA updates from Roland Dreier: - Re-enable on-demand paging changes with stable ABI - Fairly large set of ocrdma HW driver fixes - Some qib HW driver fixes - Other miscellaneous changes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (43 commits) IB/qib: Add blank line after declaration IB/qib: Fix checkpatch warnings IB/mlx5: Enable the ODP capability query verb IB/core: Add on demand paging caps to ib_uverbs_ex_query_device IB/core: Add support for extended query device caps RDMA/cxgb4: Don't hang threads forever waiting on WR replies RDMA/ocrdma: Fix off by one in ocrdma_query_gid() RDMA/ocrdma: Use unsigned for bit index RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bit RDMA/ocrdma: Update the ocrdma module version string RDMA/ocrdma: set vlan present bit for user AH RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structure RDMA/ocrdma: Add support for interrupt moderation RDMA/ocrdma: Honor return value of ocrdma_resolve_dmac RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QP RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATE RDMA/ocrdma: Host crash on destroying device resources RDMA/ocrdma: Report correct state in ibv_query_qp RDMA/ocrdma: Debugfs enhancments for ocrdma driver RDMA/ocrdma: Report correct count of interrupt vectors while registering ocrdma device ...
2015-02-16IB/mlx4: Fix wrong usage of IPv4 protocol for multicast attach/detachOr Gerlitz1-5/+5
The MLX4_PROT_IB_IPV4 protocol should only be used with RoCEv2 and such. Removing this wrong usage allows to run multicast applications over RoCE. Fixes: d487ee77740c ("IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table") Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-09IB/mlx4: Reset flow support for IB kernel ULPsYishai Hadas1-0/+64
The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work-request could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset. It includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled we return simulated CQEs with IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one. The above change requires an extra check in the data path to make sure that when device is in error state, the simulated CQEs will be returned and no further WQEs will be posted. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09IB/mlx4: Always use the correct port for mirrored multicast attachmentsMoni Shoua1-1/+5
When attaching a QP to a multicast address in bonded mode, there was an assumption that the port of the QP must be #1. This assumption isn't the case under the flow which enables maximal usage of the physical ports. Fix it by always checking the port of the original flow and create the mirrored flow on the other port. Fixes: c6215745b66a ('IB/mlx4: Load balance ports in port aggregation mode') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04IB/mlx4: Load balance ports in port aggregation modeMoni Shoua1-0/+1
When the mlx4 IB (RoCE) device works in link aggregation mode, it exposes a single port to upper layers. Therefore, applications always set '1' in port_num attribute when modifying a QP or creating an address handle. To make sure that a node uses all available ports the mlx4 driver will override the port_num attribute with a round robin policy. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04IB/mlx4: Create mirror flows in port aggregation modeMoni Shoua1-12/+72
In port aggregation mode flows for port #1 (the only port) should be mirrored on port #2. This is because packets can arrive from either physical ports. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04IB/mlx4: Add port aggregation supportMoni Shoua1-6/+70
Register the interface with the mlx4 core driver with port aggregation support and check for port aggregation mode when the 'add' function is called. In this mode, only one physical port is reported to the upper layer (RoCE/IB core stack and ULPs). Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+2
Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25net/mlx4_core: Maintain a persistent memory for mlx4 deviceYishai Hadas1-7/+10
Maintain a persistent memory that should survive reset flow/PCI error. This comes as a preparation for coming series to support above flows. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15net/mlx4: Don't disable vxlan offloads under DMFS-A0 optimized steeringOr Gerlitz1-1/+2
Except for VXLAN steering rules, all offloads should work as they were under plain DMFS mode. Fix that by enabling all the offloads under DMFS-A0 mode, except for VXLAN steering rules. Fixes: d57febe1a478 "net/mlx4: Add A0 hybrid steering" Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>