aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2014-03-20net/mlx4: Adapt code for N-Port VFMatan Barak3-18/+31
Adds support for N-Port VFs, this includes: 1. Adding support in the wrapped FW command In wrapped commands, we need to verify and convert the slave's port into the real physical port. Furthermore, when sending the response back to the slave, a reverse conversion should be made. 2. Adjusting sqpn for QP1 para-virtualization The slave assumes that sqpn is used for QP1 communication. If the slave is assigned to a port != (first port), we need to adjust the sqpn that will direct its QP1 packets into the correct endpoint. 3. Adjusting gid[5] to modify the port for raw ethernet In B0 steering, gid[5] contains the port. It needs to be adjusted into the physical port. 4. Adjusting number of ports in the query / ports caps in the FW commands When a slave queries the hardware, it needs to view only the physical ports it's assigned to. 5. Adjusting the sched_qp according to the port number The QP port is encoded in the sched_qp, thus in modify_qp we need to encode the correct port in sched_qp. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20IB/mlx4_ib: Adapt code to use caps.num_ports instead of a constantMatan Barak1-4/+4
Some code in the mlx4 IB driver stack assumed MLX4_MAX_PORTS ports. Instead, we should only loop until the number of actual ports in i the device, which is stored in dev->caps.num_ports. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12mlx4: Activate RoCE/SRIOVJack Morgenstein1-8/+0
To activate RoCE/SRIOV, need to remove the following: 1. In mlx4_ib_add, need to remove the error return preventing initialization of a RoCE port under SRIOV. 2. In update_vport_qp_params (in resource_tracker.c) need to remove the error return when a RoCE RC or UD qp is detected. This error return causes the INIT-to-RTR qp transition to fail in the wrapper function under RoCE/SRIOV. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12mlx4_ib: Fix SIDR support of for UD QPs under SRIOV/RoCEShani Michaelli1-15/+57
* Handle CM_SIDR_REQ_ATTR_ID and CM_SIDR_REP_ATTR_ID in multiplex_cm_handler and demux_cm_handler. * Handle Service ID Resolution messages and REQ messages separately, for their formats are different. Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12mlx4: Implement IP based gids support for RoCE/SRIOVJack Morgenstein5-24/+84
Since there is no connection between the MAC/VLAN and the GID when using IP-based addressing, the proxy QP1 (running on the slave) must pass the source-mac, destination-mac, and vlan_id information separately from the GID. Additionally, the Host must pass the remote source-mac and vlan_id back to the slave, This is achieved as follows: Outgoing MADs: 1. Source MAC: obtained from the CQ completion structure (struct ib_wc, smac field). 2. Destination MAC: obtained from the tunnel header 3. vlan_id: obtained from the tunnel header. Incoming MADs 1. The source (i.e., remote) MAC and vlan_id are passed in the tunnel header to the proxy QP1. VST mode support: For outgoing MADs, the vlan_id obtained from the header is discarded, and the vlan_id specified by the Hypervisor is used instead. For incoming MADs, the incoming vlan_id (in the wc) is discarded, and the "invalid" vlan (0xffff) is substituted when forwarding to the slave. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12mlx4: Add ref counting to port MAC table for RoCEJack Morgenstein2-43/+261
The IB side of RoCE requires the MAC table index of the MAC address used by its QPs. To obtain the real MAC index, the IB side registers the MAC (increasing its ref count, and also returning the real MAC index) during the modify-qp sequence. This protects against the ETH side deleting or modifying that MAC table entry while the QP is active. Note that until the modify-qp command returns success, the MAC and VLAN information only has "candidate" status. If the modify-qp succeeds, the "candidate" info is promoted to the operational MAC/VLAN info for the qp. If the modify fails, the candidate MAC/VLAN is unregistered, and the old qp info is preserved. The patch is a bit complex, because there are multiple qp transitions where the primary-path information may be modified: INIT-to-RTR, and SQD-to-SQD. Similarly for the alternate path information. Therefore the code must handle cases where path information has already been entered into the QP context by previous qp transitions. For the MAC address, the success logic is as follows: 1. If there was no previous MAC, simply move the candidate MAC information to the operational information, and reset the candidate MAC info. 2. If there was a previous MAC, unregister it. Then move the MAC information from candidate to operational, and reset the candidate info (as in 1. above). The MAC address failure logic is the same for all cases: - Unregister the candidate MAC, and reset the candidate MAC info. For Vlan registration, the logic is similar. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12mlx4: In RoCE allow guests to have multiple GIDSJack Morgenstein1-4/+30
The GIDs are statically distributed, as follows: PF: gets 16 GIDs VFs: Remaining GIDS are divided evenly between VFs activated by the driver. If the division is not even, lower-numbered VFs get an extra GID. For an IB interface, the number of gids per guest remains as before: one gid per guest. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12mlx4: Adjust QP1 multiplexing for RoCE/SRIOVJack Morgenstein3-20/+67
This requires the following modifications: 1. Fix build_mlx4_header to properly fill in the ETH fields 2. Adjust mux and demux QP1 flow to support RoCE. This commit still assumes only one GID per slave for RoCE. The commit enabling multiple GIDs is a subsequent commit, and is done separately because of its complexity. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25net,IB/mlx: Bump all Mellanox driver versionsAmir Vadai1-2/+2
Bump all Mellanox driver versions. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-nextRoland Dreier1-50/+133
2014-02-13IB: Report using RoCE IP based gids in port capsMoni Shoua1-1/+1
For userspace RoCE UD QPs we need to know the GID format that the kernel uses, e.g when working over older kernels. For that end, add a new port capability IB_PORT_IP_BASED_GIDS and report it when query port is issued. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB/mlx4: Build the port IBoE GID table properly under bondingMoni Shoua1-4/+37
When scanning netdevices we need to check a few more conditions and cases to build the IBoE GID table properly. For example, under bonding we must make sure that when a port is down, the bond IP address isn't programmed as a GID, since doing so will cause failure with IB core flows that selects ports by GID. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB/mlx4: Do IBoE GID table resets per-portMoni Shoua1-17/+18
The IBoE code used to reset the GID table did it for all Ethernet ports of the device. Since the whole architecture of generating GIDs and responding to events is port-based, this is inefficient and can lead to wrong content in the GID table. Change the reset flow to be per-port. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB/mlx4: Do IBoE locking earlier when initializing the GID tableMoni Shoua1-3/+3
Updating the GID table under IBoE requires read/write from/to shared data structures. These data structures are protected with the device iboe lock. The flows that modify the GID table start from 1. Initializing the GID table 2. NETDEV events 3. INET or INET6 events This patch makes sure that the flow of initializing the GID table is consistent with the other two flows w.r.t on what step the lock is taken. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB/mlx4: Move rtnl locking to the right placeMoni Shoua1-2/+2
On the one hand, the invocation of netdev_master_upper_dev_get() within mlx4_ib_scan_netdevs() must be done with rtnl lock held. On the other hand, it's wrong to call rtnl_lock() from within this function since it's also called by our netdev notifier callback. Therefore move the locking to mlx4_ib_add() so that both cases are covered. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB/mlx4: Make sure GID index 0 is always occupiedMoni Shoua1-24/+68
Make sure that for Ethernet ports, the port GID table index 0 is always occupied with a default GID of the relevant IPv6 link-local adderss. This provides better user experience for legacy applications that don't use the RDMA CM and were working on index 0 prior to the IP addressing change. Also, as GIDs are generated from IP addresses of the network devices that are associated with the port, it's basically possible that the GID table will be empty if no IP address was assigned. This doesn't comply with the IB spec section 4.1.1 "GID usage and properties". Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only deviceMatan Barak1-1/+6
When the device has only Ethernet ports, don't try to allocate range of steerable UD QPs since they aren't needed. This fixes an issue where mlx4 VFs tried to allocate a range of UD steerable QPs, but failed to do so. Fixes: c1c98501121e ("IB/mlx4: Add support for steerable IB UD QPs") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22Merge branch 'ip-roce' into for-nextRoland Dreier6-199/+437
Conflicts: drivers/infiniband/hw/mlx4/main.c
2014-01-22Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-nextRoland Dreier4-22/+323
2014-01-19IB/mlx4: Use IS_ENABLED(CONFIG_IPV6)Roland Dreier1-6/+6
...instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19IB/mlx4: Add dependency INETMatan Barak1-1/+1
Since mlx4_ib supports IP based addressing, a dependency on INET needs to be added, since mlx4_ib registers itself for net device events. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18IB/mlx4: Handle Ethernet L2 parameters for IP based GID addressingMoni Shoua4-58/+99
IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN. Therefore, we need to extract them from the CQE and place them in struct ib_wc (to be used for cases were they were taken from the gid). Also, when modifying a QP or building address handle, instead of parsing the dgid to get the MAC and VLAN, take them from the address handle attributes. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID tableMoni Shoua2-143/+334
Currently, the mlx4 driver set IBoE (RoCE) gids to encode related Ethernet netdevice interface MAC address and possibly VLAN id. Change this scheme such that gids encode interface IP addresses (both IP4 and IPv6). This requires learning the IP addresses which are of use by a netdevice associated with the HCA port, formatting them to gids and adding them to the port gid table. Furthermore, events of add and delete address are caught to maintain the gid table accordingly. Associated IP addresses may belong to a master of an Ethernet netdevice on top of that port so this should be considered when building and maintaining the gid table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-18IB/mlx4: Fix error return codeJulia Lawall1-2/+6
Set the return variable to an error code as done elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> ( if@p1 (\(ret < 0\|ret != 0\)) { ... return ret; } | ret@p1 = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14IB/core: Ethernet L2 attributes in verbs/cm structuresMatan Barak1-2/+7
This patch add the support for Ethernet L2 attributes in the verbs/cm/cma structures. When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority in a similar manner that the IB L2 (and the L4 PKEY) attributes are used. Thus, those attributes were added to the following structures: * ib_ah_attr - added dmac * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority) * ib_wc - added smac, vlan_id * ib_sa_path_rec - added smac, dmac, vlan_id * cm_av - added smac and vlan_id For the path record structure, extra care was taken to avoid the new fields when packing it into wire format, so we don't break the IB CM and SA wire protocol. On the active side, the CM fills. its internal structures from the path provided by the ULP. We add there taking the ETH L2 attributes and placing them into the CM Address Handle (struct cm_av). On the passive side, the CM fills its internal structures from the WC associated with the REQ message. We add there taking the ETH L2 attributes from the WC. When the HW driver provides the required ETH L2 attributes in the WC, they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core code checks for the presence of these flags, and in their absence does address resolution from the ib_init_ah_from_wc() helper function. ib_modify_qp_is_ok is also updated to consider the link layer. Some parameters are mandatory for Ethernet link layer, while they are irrelevant for IB. Vendor drivers are modified to support the new function signature. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14IB/mlx4: Add support for steerable IB UD QPsMatan Barak3-7/+163
This patch adds support for steerable (NETIF) QP creation. When we create the device, we allocate a range of steerable QPs. Afterward when a QP is created with the NETIF flag, it's allocated from this range. Allocation is managed by bitmap allocator. Internal steering rules for those QPs is automatically generated on their creation. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14IB/mlx4: Add mechanism to support flow steering over IB linksMatan Barak1-1/+134
The mlx4 device requires adding IB flow spec to rules that apply over infiniband link layer. This patch adds a mechanism to add such a rule. If higher levels e.g. IP/UDP/TCP flow specs are provided, the device requires us to add an empty wild-carded IB rule. Furthermore, the device requires the QPN to be put in the rule. Add here specific parsing support for IB empty rules and the ability to self-generate missing specs based on existing ones. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14IB/mlx4: Enable device-managed steering support for IB ports tooMatan Barak2-12/+20
Up until now, flow steering wasn't supported when using IB ports. This patch enables support for flow steering if all hardware ports support that, for example the new MLX4_DEV_CAP_FLAG2_DMFS_IPOIB mlx4 device capability. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-18Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infinibandLinus Torvalds2-7/+10
Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...
2013-11-17Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-nextRoland Dreier2-7/+10
2013-11-17IB/core: Re-enable create_flow/destroy_flow uverbsMatan Barak1-2/+0
This commit reverts commit 7afbddfae993 ("IB/core: Temporarily disable create_flow/destroy_flow uverbs"). Since the uverbs extensions functionality was experimental for v3.12, this patch re-enables the support for them and flow-steering for v3.13. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-17IB/core: extended command: an improved infrastructure for uverbs commandsYann Droneaud1-3/+3
Commit 400dbc96583f ("IB/core: Infrastructure for extensible uverbs commands") added an infrastructure for extensible uverbs commands while later commit 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs") exported ib_create_flow()/ib_destroy_flow() functions using this new infrastructure. According to the commit 400dbc96583f, the purpose of this infrastructure is to support passing around provider (eg. hardware) specific buffers when userspace issue commands to the kernel, so that it would be possible to extend uverbs (eg. core) buffers independently from the provider buffers. But the new kernel command function prototypes were not modified to take advantage of this extension. This issue was exposed by Roland Dreier in a previous review[1]. So the following patch is an attempt to a revised extensible command infrastructure. This improved extensible command infrastructure distinguish between core (eg. legacy)'s command/response buffers from provider (eg. hardware)'s command/response buffers: each extended command implementing function is given a struct ib_udata to hold core (eg. uverbs) input and output buffers, and another struct ib_udata to hold the hw (eg. provider) input and output buffers. Having those buffers identified separately make it easier to increase one buffer to support extension without having to add some code to guess the exact size of each command/response parts: This should make the extended functions more reliable. Additionally, instead of relying on command identifier being greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on unused bits in command field: on the 32 bits provided by command field, only 6 bits are really needed to encode the identifier of commands currently supported by the kernel. (Even using only 6 bits leaves room for about 23 new commands). So this patch makes use of some high order bits in command field to store flags, leaving enough room for more command identifiers than one will ever need (eg. 256). The new flags are used to specify if the command should be processed as an extended one or a legacy one. While designing the new command format, care was taken to make usage of flags itself extensible. Using high order bits of the commands field ensure that newer libibverbs on older kernel will properly fail when trying to call extended commands. On the other hand, older libibverbs on newer kernel will never be able to issue calls to extended commands. The extended command header includes the optional response pointer so that output buffer length and output buffer pointer are located together in the command, allowing proper parameters checking. This should make implementing functions easier and safer. Additionally the extended header ensure 64bits alignment, while making all sizes multiple of 8 bytes, extending the maximum buffer size: legacy extended Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes) For the purpose of doing proper buffer size accounting, the headers size are no more taken in account in "in_words". One of the odds of the current extensible infrastructure, reading twice the "legacy" command header, is fixed by removing the "legacy" command header from the extended command header: they are processed as two different parts of the command: memory is read once and information are not duplicated: it's making clear that's an extended command scheme and not a different command scheme. The proposed scheme will format input (command) and output (response) buffers this way: - command: legacy header + extended header + command data (core + hw): +----------------------------------------+ | flags | 00 00 | command | | in_words | out_words | +----------------------------------------+ | response | | response | | provider_in_words | provider_out_words | | padding | +----------------------------------------+ | | . <uverbs input> . . (in_words * 8) . | | +----------------------------------------+ | | . <provider input> . . (provider_in_words * 8) . | | +----------------------------------------+ - response, if present: +----------------------------------------+ | | . <uverbs output space> . . (out_words * 8) . | | +----------------------------------------+ | | . <provider output space> . . (provider_out_words * 8) . | | +----------------------------------------+ The overall design is to ensure that the extensible infrastructure is itself extensible while begin more reliable with more input and bound checking. Note: The unused field in the extended header would be perfect candidate to hold the command "comp_mask" (eg. bit field used to handle compatibility). This was suggested by Roland Dreier in a previous review[2]. But "comp_mask" field is likely to be present in the uverb input and/or provider input, likewise for the response, as noted by Matan Barak[3], so it doesn't make sense to put "comp_mask" in the header. [1]: http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com [2]: http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com [3]: http://marc.info/?i=525C1149.6000701@mellanox.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com [ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15IB/mlx4: Fix device max capabilities checkEli Cohen1-1/+6
Move the check on max supported CQEs after the final number of entries is evaluated. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-15IB/mlx4: Fix endless loop in resize CQEli Cohen1-1/+1
When calling get_sw_cqe() we need pass the consumer_index and not the masked value. Failure to do so will cause incorrect result of get_sw_cqe() possibly leading to endless loop. This problem was reported and analyzed by Michael Rice from HP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-07net/mlx4_core: Initialize all mailbox buffers to zero before useJack Morgenstein1-6/+0
To guarantee that all unused fields in all FW commands for both inboxes and outboxes are zeroed out, initialize the mailbox buffer to all zeroes. This is especially important for SRIOV comm-channel virtual commands (such as QUERY_FUNC_CAP), where if new fields are added to support new features, the driver can depend on older kernels passing zeroes in these fields. In addition to zeroing out the mailbox buffer at allocation time, all (now unnecessary) calls to memset by the callers of mlx4_alloc_cmd_mailbox() are removed. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04mlx4: Structures and init/teardown for VF resource quotasJack Morgenstein1-4/+4
This is step #1 for implementing SRIOV resource quotas for VFs. Quotas are implemented per resource type for VFs and the PF, to prevent any entity from simply grabbing all the resources for itself and leaving the other entities unable to obtain such resources. Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs, MAC, VLAN, and Counters. The quota system works as follows: Each entity (VF or PF) is given a max number of a given resource (its quota), and a guaranteed minimum number for each resource (starvation prevention). For QPs, CQs, SRQs, MPTs and MTTs: 50% of the available quantity for the resource is divided equally among the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool. The other 50% is the "free pool", allocated on a first-come-first-serve basis. For each VF/PF, resources are first allocated from its "guaranteed-minimum" pool. When that pool is exhausted, the driver attempts to allocate from the resource "free-pool". The quota (i.e., max) for the VFs and the PF is: The free-pool amount (50% of the real max) + the guaranteed minimum For MACs: Guarantee 2 MACs per VF/PF per port. As a result, since we have only 128 MACs per port, reduce the allowable number of VFs from 64 to 63. Any remaining MACs are put into a free pool. For VLANs: For the PF, the per-port quota is 128 and guarantee is 64 (to allow the PF to register at least a VLAN per VF in VST mode). For the VFs, the per-port quota is 64 and the guarantee is 0. We assume that VGT VFs are trusted not to abuse the VLAN resource. For Counters: For all functions (PF and VFs), the quota is 128 and the guarantee is 0. In this patch, we define the needed structures, which are added to the resource-tracker struct. In addition, we do initialization for the resource quota, and adjust the query_device response to use quotas rather than resource maxima. As part of the implementation, we introduce a new field in mlx4_dev: quotas. This field holds the resource quotas used to report maxima to the upper layers (ib_core, via query_device). The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so that they may continue to use these in handling QPs, CQs, SRQs and MPTs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21IB/core: Temporarily disable create_flow/destroy_flow uverbsYann Droneaud1-0/+2
The create_flow/destroy_flow uverbs and the associated extensions to the user-kernel verbs ABI are under review and are too experimental to freeze at this point. So userspace is not exposed to experimental features and an uinstable ABI, temporarily disable this for v3.12 (with a Kconfig option behind staging to reenable it if desired). The feature will be enabled after proper cleanup for v3.13. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com [ Add a Kconfig option to reenable these verbs. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-28IB/mlx4: Add receive flow steering supportHadar Hen Zion2-0/+247
Implement ib_create_flow() and ib_destroy_flow(). Translate the verbs structures provided by the user to HW structures and call the MLX4_QP_FLOW_STEERING_ATTACH/DETACH firmware commands. On the ATTACH command completion, the firmware provides a 64-bit registration ID, which is placed into struct mlx4_ib_flow that wraps the instance of struct ib_flow which is retuned to caller. Later, this reg ID is used for detaching that flow from the firmware. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31IB/mlx4: Use default pkey when creating tunnel QPsJack Morgenstein1-2/+8
When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. Its possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-28net: pass info struct via netdevice notifierJiri Pirko1-1/+1
So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-08Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infinibandLinus Torvalds2-0/+27
Pull InfiniBand/RDMA changes from Roland Dreier: - XRC transport fixes - Fix DHCP on IPoIB - mlx4 preparations for flow steering - iSER fixes - miscellaneous other fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits) IB/iser: Add support for iser CM REQ additional info IB/iser: Return error to upper layers on EAGAIN registration failures IB/iser: Move informational messages from error to info level IB/iser: Add module version mlx4_core: Expose a few helpers to fill DMFS HW strucutures mlx4_core: Directly expose fields of DMFS HW rule control segment mlx4_core: Change a few DMFS fields names to match firmare spec mlx4: Match DMFS promiscuous field names to firmware spec mlx4_core: Move DMFS HW structs to common header file IB/mlx4: Set link type for RAW PACKET QPs in the QP context IB/mlx4: Disable VLAN stripping for RAW PACKET QPs mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level RDMA/iwcm: Don't touch cmid after dropping reference IB/qib: Correct qib_verbs_register_sysfs() error handling IB/ipath: Correct ipath_verbs_register_sysfs() error handling RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled SRPT: Fix odd use of WARN_ON() IPoIB: Fix ipoib_hard_header() return value RDMA: Rename random32() to prandom_u32() RDMA/cxgb3: Fix uninitialized variable ...
2013-05-08Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-nextRoland Dreier3-1/+28
2013-05-07drivers/infiniband/hw: rename random32() to prandom_u32()Andrew Morton1-1/+1
Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-1/+1
Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configurating various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to mdoule_platform_driver(), from Sachin Kamat. 17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...
2013-04-29drivers/infiniband/hw/mlx4: convert to using idr_alloc_cyclic()Jeff Layton1-3/+1
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-24IB/mlx4: Set link type for RAW PACKET QPs in the QP contextEli Cohen1-0/+4
When the link type is Ethernet, setting the link type in the QP context will enable TCP/IP stateless offloads (checksum, LSO, RSS) for RAW PACKET Ethernet QPs. For IB UD QPs this worked OK since the value assumed by the firmware for IB link layer is zero. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-24IB/mlx4: Disable VLAN stripping for RAW PACKET QPsDotan Barak1-0/+2
Fix the asymmetric behavior w.r.t VLAN insertion/stripping for RAW PACKET QPs -- we don't insert on send and need not strip on receive. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-24net/mlx4_en: Add HW timestamping (TS) supportAmir Vadai1-1/+1
The patch allows to enable/disable HW timestamping for incoming and/or outgoing packets. It adds and initializes all structs and callbacks needed by kernel TS API. To enable/disable HW timestamping appropriate ioctl should be used. Currently HWTSTAMP_FILTER_ALL/NONE and HWTSAMP_TX_ON/OFF only are supported. When enabling TS on receive flow - VLAN stripping will be disabled. Also were made all relevant changes in RX/TX flows to consider TS request and plant HW timestamps into relevant structures. mlx4_ib was fixed to compile with new mlx4_cq_alloc() signature. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-16RDMA: Rename random32() to prandom_u32()Akinobu Mita1-1/+1
Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Steve Wise <swise@chelsio.com> Cc: linux-rdma@vger.kernel.org Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16IB/mlx4: Fetch XRC SRQ in the CQ polling codeShlomo Pongratz1-0/+21
An XRC target QP may redirect to more than one XRC SRQ. This means that for work completions associated with a XRC TGT QP, the srq field in the QP has no usage and the real XRC SRQ need to be retrived using the information from the XRCETH placed into the CQE, do that. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>