aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/verbs.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-12-18IB/{core/cm}: Fix generating a return AH for RoCEEParav Pandit1-78/+58
When computing a UD reverse path (return AH) from a WC the code was not doing a route lookup anchored in a specific netdevice. This caused several bugs, including broken IPv6 link-local address support in RoCEv2. [1] This fixes the lookup by determining the GID table entry that the HW matched to the SGID for the WC and then using the netdevice from that entry to perform the route and ND lookup for the 'DGID' to build a return AH. RoCE GID table management ensures that right upper netdevices of the physical netdevices are added. Therefore init_ah_from_wc doesn't need to perform such check. Now that route lookup is done based on the netdevice of the GID entry, simplify code to not have ifindex and vlan pointers. As part of that, refactor to have netdevice as input parameter. This is already discussed at [2]. Finally ib_init_ah_from_wc resolves dmac for unicast GID in similar way as what ib_resolve_eth_dmac() does. So ib_resolve_eth_dmac is refactored to split for unicast and non unicast GIDs, so that it can be reused by ib_init_ah_from_wc. While we are at refactoring ib_resolve_eth_dmac(), it is further simplified (a) to avoid hoplimit as optional parameter, as there is only one user who always queries hoplimit. (b) for empty line. (c) avoided zero initialization of ret. (d) removed as exported symbol as only ib core uses it. For IPv6, this is tested using simple rping test as below. rping -sv -a ::0 rping -c -a fe80::268a:7ff:fe55:4661%ens2f1 -C 1 -v -d [1] https://www.spinics.net/lists/linux-rdma/msg45690.html [2] https://www.spinics.net/lists/linux-rdma/msg45710.html Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reported-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-11-13RDMA/core: Rename kernel modify_cq to better describe its usageLeon Romanovsky1-2/+2
Current ib_modify_cq() is used to set CQ moderation parameters. This patch renames ib_modify_cq() to be rdma_set_cq_moderation(), because the kernel version of RDMA API doesn't need to follow already exposed to user's API pattern (create_XXX/modify_XXX/query_XXX/destroy_XXX) and better to have more accurate name which describes the actual usage. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18IB: Let ib_core resolve destination mac addressParav Pandit1-3/+5
Since IB/core resolves the destination mac address for user and kernel consumers, avoid resolving in multiple provider drivers. Only ib_core resolves DMAC now, therefore resolve_eth_dmac is removed as exported symbol. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18IB/core: Introduce and use rdma_create_user_ahParav Pandit1-2/+38
Introduce rdma_create_user_ah API which allows passing udata to provider driver and additionally which resolves DMAC for RoCE. ib_resolve_eth_dmac() resolves destination mac address for unicast, multicast, link local ipv4 mapped ipv6 and ipv6 destination gid entry. This allows all RoCE provider drivers to avoid duplicating such code. Such change brings consistency where IB core always resolves dmac and pass it to RoCE provider drivers for user and kernel consumers, with this ah_attr->roce.dmac is always an input field for provider drivers. This uniformity avoids exporting ib_resolve_eth_dmac symbol to providers or other modules. Therefore its removed as exported symbol at later in the patch series. Now uverbs and umad both makes use of rdma_create_user_ah API which fixes the issue where umad has invalid DMAC for address. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-22IB/core: Fix for core panicAlex Estrin1-2/+2
Build with the latest patches resulted in panic: 11384.486289] BUG: unable to handle kernel NULL pointer dereference at (null) [11384.486293] IP: (null) [11384.486295] PGD 0 [11384.486295] P4D 0 [11384.486296] [11384.486299] Oops: 0010 [#1] SMP ......... snip ...... [11384.486401] CPU: 0 PID: 968 Comm: kworker/0:1H Tainted: G W O 4.13.0-a-stream-20170825 #1 [11384.486402] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015 [11384.486418] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [11384.486419] task: ffff880850579680 task.stack: ffffc90007fec000 [11384.486420] RIP: 0010: (null) [11384.486420] RSP: 0018:ffffc90007fef970 EFLAGS: 00010206 [11384.486421] RAX: ffff88084cfe8000 RBX: ffff88084dce4000 RCX: ffffc90007fef978 [11384.486422] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88084cfe8000 [11384.486422] RBP: ffffc90007fefab0 R08: 0000000000000000 R09: ffff88084dce4080 [11384.486423] R10: ffffffffa02d7f60 R11: 0000000000000000 R12: ffff88105af65a00 [11384.486423] R13: ffff88084dce4000 R14: 000000000000c000 R15: 000000000000c000 [11384.486424] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [11384.486425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11384.486425] CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000001406f0 [11384.486426] Call Trace: [11384.486431] ? is_valid_mcast_lid.isra.21+0xfb/0x110 [ib_core] [11384.486436] ib_attach_mcast+0x6f/0xa0 [ib_core] [11384.486441] ipoib_mcast_attach+0x81/0x190 [ib_ipoib] [11384.486443] ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib] [11384.486448] mcast_work_handler+0x330/0x6c0 [ib_core] [11384.486452] join_handler+0x101/0x220 [ib_core] [11384.486455] ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core] [11384.486459] recv_handler+0x3a/0x60 [ib_core] [11384.486462] ib_mad_recv_done+0x423/0x9b0 [ib_core] [11384.486466] __ib_process_cq+0x5d/0xb0 [ib_core] [11384.486469] ib_cq_poll_work+0x20/0x60 [ib_core] [11384.486472] process_one_work+0x149/0x360 [11384.486474] worker_thread+0x4d/0x3c0 [11384.486487] kthread+0x109/0x140 [11384.486488] ? rescuer_thread+0x380/0x380 [11384.486489] ? kthread_park+0x60/0x60 [11384.486490] ? kthread_park+0x60/0x60 [11384.486493] ret_from_fork+0x25/0x30 [11384.486493] Code: Bad RIP value. [11384.486493] Code: Bad RIP value. [11384.486496] RIP: (null) RSP: ffffc90007fef970 [11384.486497] CR2: 0000000000000000 [11384.486531] ---[ end trace b1acec6fb4ff6e75 ]--- [11384.532133] Kernel panic - not syncing: Fatal exception [11384.536541] Kernel Offset: disabled [11384.969491] ---[ end Kernel panic - not syncing: Fatal exception [11384.976875] sched: Unexpected reschedule of offline CPU#1! [11384.983646] ------------[ cut here ]------------ Rdma device driver may not have implemented (*get_link_layer)() so it can not be called directly. Should use appropriate helper function. Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Fixes: 523633359224 ("IB/core: Fix the validations of a multicast LID in attach or detach operations") Cc: stable@kernel.org # 4.13 Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()Roland Dreier1-0/+2
For RoCE, ib_init_ah_from_wc() can follow the path ib_init_ah_from_wc() -> rdma_addr_find_l2_eth_by_grh() -> rdma_resolve_ip() and rdma_resolve_ip() will sleep in kzalloc() and wait_for_completion(). However, developers will not see any warnings if they use ib_init_ah_from_wc() in an atomic context and test only on IB, because the function doesn't sleep in that case. Add a might_sleep() so that lockdep will catch bugs no matter what hardware is used to test. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Separate CQ handle in SRQ contextArtemy Kovalyov1-7/+9
Before this change CQ attached to SRQ was part of XRC specific extension. Moving CQ handle out makes it available to other types extending SRQ functionality. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/core: Refactor get link layer wrapperLeon Romanovsky1-9/+5
The return values from rdma_node_get_transport() are strict and IB_LINK_LAYER_UNSPECIFIED is unreachable in this flow. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/core: Delete BUG() from unreachable flowLeon Romanovsky1-13/+7
Remove call to BUG() in case wrong node_type was provided. This flow is unreachable, because node_types are supplied from specific enum. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24Merge branch 'k.o/for-4.13-rc' into k.o/for-nextDoug Ledford1-1/+6
Pick up -rc fixes. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/core: Avoid accessing non-allocated memory when inferring port typeNoa Osherovich1-1/+6
Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") introduced the concept of type in ah_attr: * During ib_register_device, each port is checked for its type which is stored in ib_device's port_immutable array. * During uverbs' modify_qp, the type is inferred using the port number in ib_uverbs_qp_dest struct (address vector) by accessing the relevant port_immutable array and the type is passed on to providers. IB spec (version 1.3) enforces a valid port value only in Reset to Init. During Init to RTR, the address vector must be valid but port number is not mentioned as a field in the address vector, so its value is not validated, which leads to accesses to a non-allocated memory when inferring the port type. Save the real port number in ib_qp during modify to Init (when the comp_mask indicates that the port number is valid) and use this value to infer the port type. Avoid copying the address vector fields if the matching bit is not set in the attr_mask. Address vector can't be modified before the port, so no valid flow is affected. Fixes: 44c58487d51a ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18Merge branch 'k.o/for-4.13-rc' into k.o/for-nextDoug Ledford1-1/+0
Merging our (hopefully) final -rc pull branch into our for-next branch because some of our pending patches won't apply cleanly without having the -rc patches in our tree. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-27Merge branch 'misc' into k.o/for-nextDoug Ledford1-0/+55
2017-07-24IB/core: Add support for RoCEv2 multicastNoa Osherovich1-0/+12
When creating address handle from multicast GID, set MAC according to the appropriate formula instead of searching for it in the GID table: - For IPv4 multicast GID use ip_eth_mc_map(). - For IPv6 multicast GID use ipv6_eth_mc_map(). Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24IB/core: Set RoCEv2 MGID according to specNoa Osherovich1-4/+6
RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding IPv4 address is encoded into the GID according to the following rule: GID= :ffff:<IPv4 address> Remove the 0xff0e prefix for RoCEv2 packets with IPv4 and leave it zeroed and change rdma_is_multicast_addr() to consider the new logic. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24IB/core: Fix the validations of a multicast LID in attach or detach operationsNoa Osherovich1-4/+40
RoCE Annex (A16.9.10/11) declares that during attach (detach) QP to a multicast group, if the QP is associated with a RoCE port, the multicast group MLID is unused and is ignored. During attach or detach multicast, when the QP is associated with a port, it is enough to check the port's link layer and validate the LID only if it is Infiniband. Otherwise, avoid validating the multicast LID. Fixes: 8561eae60ff9 ("IB/core: For multicast functions, verify that LIDs are multicast LIDs") Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24IB/core: Add generic function to extract IB speed from netdevYuval Shaia1-0/+55
Logic of retrieving netdev speed from net_device and translating it to IB speed is implemented in rxe, in usnic and in bnxt drivers. Define new function which merges all. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Christian Benvenuti <benve@cisco.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-23Revert "IB/core: Allow QP state transition from reset to error"Leon Romanovsky1-1/+0
The commit ebc9ca43e1d5 ("IB/core: Allow QP state transition from reset to error") allowed transition from Reset to Error state for the QPs. This behavior doesn't follow the IBTA specification 1.3, which in 10.3.1 QUEUE PAIR AND EE CONTEXT STATES section. The quote from the spec: "An error can be forced from any state, except Reset, with the Modify QP/EE Verb." Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-07-17IB/core: Allow QP state transition from reset to errorTadeusz Struk1-0/+1
Playing with IP-O-IB interface can trigger a warning message: "ib0: Failed to modify QP to ERROR state" to be logged. This happens when the QP is in IB_QPS_RESET state and the stack is trying to transition it to IB_QPS_ERR state in ipoib_ib_dev_stop(). According to the IB spec, Table 91 - "QP State Transition Properties" it looks like the transition from reset to error is valid: Transition: Any State to Error Required Attributes: None Optional Attributes: None allowed Actions: Queue processing is stopped. Work Requests pending or in process are completed in error, when possible. This patch allows the transition and quiets the message. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17IB/core: Introduce modify QP operation with udataParav Pandit1-8/+24
This patch adds new function ib_modify_qp_with_udata so that uverbs layer can avoid handling L2 mac address at verbs layer and depend on the core layer to resolve the mac address consistently for all required QPs. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17IB/core: Don't resolve IP address to the loopback deviceMoni Shoua1-5/+0
When resolving an IP address that is on the host of the caller the result from querying the routing table is the loopback device. This is not a valid response, because it doesn't represent the RDMA device and the port. Therefore, callers need to check the resolved device and if it is a loopback device find an alternative way to resolve it. To avoid this we make sure that the response from rdma_resolve_ip() will not be the loopback device. While that, we fix an static checker warning about dereferencing an unintitialized pointer using the same solution as in commit abeffce90c7f ("net/mlx5e: Fix a -Wmaybe-uninitialized warning") as a reference. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-17RDMA/core: Document confusing codeGustavo A. R. Silva1-0/+13
While looking into Coverity ID 1351047 I ran into the following piece of code at drivers/infiniband/core/verbs.c:496: ret = rdma_addr_find_l2_eth_by_grh(&dgid, &sgid,                                    ah_attr->dmac,                                    wc->wc_flags & IB_WC_WITH_VLAN ?                                    NULL : &vlan_id,                                    &if_index, &hoplimit); The issue here is that the position of arguments in the call to rdma_addr_find_l2_eth_by_grh() function do not match the order of the parameters: &dgid is passed to sgid &sgid is passed to dgid This is the function prototype: int rdma_addr_find_l2_eth_by_grh(const union ib_gid *sgid,  const union ib_gid *dgid,  u8 *dmac, u16 *vlan_id, int *if_index,  int *hoplimit) My question here is if this is intentional? Answer: Yes. ib_init_ah_from_wc() creates ah from the incoming packet. Incoming packet has dgid of the receiver node on which this code is getting executed and sgid contains the GID of the sender. When resolving mac address of destination, you use arrived dgid as sgid and use sgid as dgid because sgid contains destinations GID whom to respond to. Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-23IB/core: Enforce PKey security on QPsDaniel Jurgens1-1/+26
Add new LSM hooks to allocate and free security contexts and check for permission to access a PKey. Allocate and free a security context when creating and destroying a QP. This context is used for controlling access to PKeys. When a request is made to modify a QP that changes the port, PKey index, or alternate path, check that the QP has permission for the PKey in the PKey table index on the subnet prefix of the port. If the QP is shared make sure all handles to the QP also have access. Store which port and PKey index a QP is using. After the reset to init transition the user can modify the port, PKey index and alternate path independently. So port and PKey settings changes can be a merge of the previous settings and the new ones. In order to maintain access control if there are PKey table or subnet prefix change keep a list of all QPs are using each PKey index on each port. If a change occurs all QPs using that device and port must have access enforced for the new cache settings. These changes add a transaction to the QP modify process. Association with the old port and PKey index must be maintained if the modify fails, and must be removed if it succeeds. Association with the new port and PKey index must be established prior to the modify and removed if the modify fails. 1. When a QP is modified to a particular Port, PKey index or alternate path insert that QP into the appropriate lists. 2. Check permission to access the new settings. 3. If step 2 grants access attempt to modify the QP. 4a. If steps 2 and 3 succeed remove any prior associations. 4b. If ether fails remove the new setting associations. If a PKey table or subnet prefix changes walk the list of QPs and check that they have permission. If not send the QP to the error state and raise a fatal error event. If it's a shared QP make sure all the QPs that share the real_qp have permission as well. If the QP that owns a security structure is denied access the security structure is marked as such and the QP is added to an error_list. Once the moving the QP to error is complete the security structure mark is cleared. Maintaining the lists correctly turns QP destroy into a transaction. The hardware driver for the device frees the ib_qp structure, so while the destroy is in progress the ib_qp pointer in the ib_qp_security struct is undefined. When the destroy process begins the ib_qp_security structure is marked as destroying. This prevents any action from being taken on the QP pointer. After the QP is destroyed successfully it could still listed on an error_list wait for it to be processed by that flow before cleaning up the structure. If the destroy fails the QPs port and PKey settings are reinserted into the appropriate lists, the destroying flag is cleared, and access control is enforced, in case there were any cache changes during the destroy flow. To keep the security changes isolated a new file is used to hold security related functionality. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: merge fixup in ib_verbs.h and uverbs_cmd.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-01IB/core: Define 'ib' and 'roce' rdma_ah_attr typesDasaratharaman Chandramouli1-4/+9
rdma_ah_attr can now be either ib or roce allowing core components to use one type or the other and also to define attributes unique to a specific type. struct ib_ah is also initialized with the type when its first created. This ensures that calls such as modify_ah dont modify the type of the address handle attribute. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Use rdma_ah_attr accessor functionsDasaratharaman Chandramouli1-22/+23
Modify core and driver components to use accessor functions introduced to access individual fields of rdma_ah_attr Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_destroy_ah to rdma_destroy_ahDasaratharaman Chandramouli1-2/+2
Rename ib_destroy_ah to rdma_destroy_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_query_ah to rdma_query_ahDasaratharaman Chandramouli1-2/+2
Rename ib_query_ah to rdma_query_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_modify_ah to rdma_modify_ahDasaratharaman Chandramouli1-2/+2
Rename ib_modify_ah to rdma_modify_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_create_ah to rdma_create_ahDasaratharaman Chandramouli1-3/+3
Rename ib_create_ah to rdma_create_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename struct ib_ah_attr to rdma_ah_attrDasaratharaman Chandramouli1-6/+6
This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/core: For multicast functions, verify that LIDs are multicast LIDsMichael J. Ruhl1-2/+6
The Infiniband spec defines "A multicast address is defined by a MGID and a MLID" (section 10.5). Currently the MLID value is not validated. Add check to verify that the MLID value is in the correct address range. Fixes: 0c33aeedb2cf ("[IB] Add checks to multicast attach and detach") Cc: stable@vger.kernel.org Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-02-19IB/core: Add support for draining IB_POLL_DIRECT completion queuesBart Van Assche1-20/+15
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-27IB/core: Add inline function to validate portYuval Shaia1-2/+1
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Support rate limit for packet pacingBodong Wang1-0/+2
Add new member rate_limit to ib_qp_attr which holds the packet pacing rate in kbps, 0 means unlimited. IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW QPs when changing QP state from RTR to RTS, RTS to RTS. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Let create_ah return extended response to userMoni Shoua1-1/+1
Add struct ib_udata to the signature of create_ah callback that is implemented by IB device drivers. This allows HW drivers to return extra data to the userspace library. This patch prepares the ground for mlx5 driver to resolve destination mac address for a given GID and return it to userspace. This patch was previously submitted by Knut Omang as a part of the patch set to support Oracle's Infiniband HCA (SIF). Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Change ib_resolve_eth_dmac to use it in create AHMoni Shoua1-42/+42
The function ib_resolve_eth_dmac() requires struct qp_attr * and qp_attr_mask as parameters while the function might be useful to resolve dmac for address handles. This patch changes the signature of the function so it can be used in the flow of creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/mlx4: Handle IPv4 header when demultiplexing MADMoni Shoua1-7/+9
When MAD arrives to the hypervisor, we need to identify which slave it should be sent by destination GID. When L3 protocol is IPv4 the GRH is replaced by an IPv4 header. This patch detects when IPv4 header needs to be parsed instead of GRH. Fixes: b6ffaeffaea4 ('mlx4: In RoCE allow guests to have multiple GIDS') Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/core: Save QP in ib_flow structureMark Bloch1-1/+3
When we create flow steering rule, we need to save the related QP in the ib_flow struct. this QP is used in destroy flow. Move the QP assignment from ib_uverbs_ex_create_flow into ib_create_flow, this would allow both kernel and userspace consumers to use it. This bug wasn't seen in the wild because there are no kernel consumers currently in the kernel. Fixes: 319a441d1361 ("IB/core: Add receive flow steering support") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/core: correctly handle rdma_rw_init_mrs() failureSteve Wise1-1/+1
Function ib_create_qp() was failing to return an error when rdma_rw_init_mrs() fails, causing a crash further down in ib_create_qp() when trying to dereferece the qp pointer which was actually a negative errno. The crash: crash> log|grep BUG [ 136.458121] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 crash> bt PID: 3736 TASK: ffff8808543215c0 CPU: 2 COMMAND: "kworker/u64:2" #0 [ffff88084d323340] machine_kexec at ffffffff8105fbb0 #1 [ffff88084d3233b0] __crash_kexec at ffffffff81116758 #2 [ffff88084d323480] crash_kexec at ffffffff8111682d #3 [ffff88084d3234b0] oops_end at ffffffff81032bd6 #4 [ffff88084d3234e0] no_context at ffffffff8106e431 #5 [ffff88084d323530] __bad_area_nosemaphore at ffffffff8106e610 #6 [ffff88084d323590] bad_area_nosemaphore at ffffffff8106e6f4 #7 [ffff88084d3235a0] __do_page_fault at ffffffff8106ebdc #8 [ffff88084d323620] do_page_fault at ffffffff8106f057 #9 [ffff88084d323660] page_fault at ffffffff816e3148 [exception RIP: ib_create_qp+427] RIP: ffffffffa02554fb RSP: ffff88084d323718 RFLAGS: 00010246 RAX: 0000000000000004 RBX: fffffffffffffff4 RCX: 000000018020001f RDX: ffff880830997fc0 RSI: 0000000000000001 RDI: ffff88085f407200 RBP: ffff88084d323778 R8: 0000000000000001 R9: ffffea0020bae210 R10: ffffea0020bae218 R11: 0000000000000001 R12: ffff88084d3237c8 R13: 00000000fffffff4 R14: ffff880859fa5000 R15: ffff88082eb89800 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff88084d323780] rdma_create_qp at ffffffffa0782681 [rdma_cm] #11 [ffff88084d3237b0] nvmet_rdma_create_queue_ib at ffffffffa07c43f3 [nvmet_rdma] #12 [ffff88084d323860] nvmet_rdma_alloc_queue at ffffffffa07c5ba9 [nvmet_rdma] #13 [ffff88084d323900] nvmet_rdma_queue_connect at ffffffffa07c5c96 [nvmet_rdma] #14 [ffff88084d323980] nvmet_rdma_cm_handler at ffffffffa07c6450 [nvmet_rdma] #15 [ffff88084d3239b0] iw_conn_req_handler at ffffffffa0787480 [rdma_cm] #16 [ffff88084d323a60] cm_conn_req_handler at ffffffffa0775f06 [iw_cm] #17 [ffff88084d323ab0] process_event at ffffffffa0776019 [iw_cm] #18 [ffff88084d323af0] cm_work_handler at ffffffffa0776170 [iw_cm] #19 [ffff88084d323cb0] process_one_work at ffffffff810a1483 #20 [ffff88084d323d90] worker_thread at ffffffff810a211d #21 [ffff88084d323ec0] kthread at ffffffff810a6c5c #22 [ffff88084d323f50] ret_from_fork at ffffffff816e1ebf Fixes: 632bc3f65081 ("IB/core, RDMA RW API: Do not exceed QP SGE send limit") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: stable@vger.kernel.org Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-07IB/core: Improve ib_map_mr_sg() documentationBart Van Assche1-4/+4
Document that ib_map_mr_sg() is able to map physically discontiguous sg-lists as a single MR. Change IB_MR_TYPE_SG_GAPS_REG into IB_MR_TYPE_SG_GAPS. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: remove ib_get_dma_mrChristoph Hellwig1-26/+8
We now only use it from ib_alloc_pd to create a local DMA lkey if the device doesn't provide one, or a global rkey if the ULP requests it. This patch removes ib_get_dma_mr and open codes the functionality in ib_alloc_pd so that we can simplify the code and prevent abuse of the functionality. As a side effect we can also simplify things by removing the valid access bit check, and the PD refcounting. In the future I hope to also remove the per-PD global MR entirely by shifting this work into the HW drivers, as one step towards avoiding the struct ib_mr overload for various different use cases. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig1-5/+22
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: rename pd->local_mr to pd->__internal_mrChristoph Hellwig1-6/+6
This has two reasons: a) to clearly mark that drivers don't have any business using it, and b) because we're going to use it for the (dangerous) global rkey soon, so that drivers don't create on themselves. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-0/+9
Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-2/+161
Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-02IB/core, RDMA RW API: Do not exceed QP SGE send limitBart Van Assche1-0/+9
Compute the SGE limit for RDMA READ and WRITE requests in ib_create_qp(). Use that limit in the RDMA RW API implementation. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Parav Pandit <pandit.parav@gmail.com> Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Laurence Oberman <loberman@redhat.com> Cc: <stable@vger.kernel.org> #v4.7+ Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Extend create QP to get indirection tableYishai Hadas1-2/+17
Extend create QP to get Receive Work Queue (WQ) indirection table. QP can be created with external Receive Work Queue indirection table, in that case it is ready to receive immediately. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Introduce Receive Work Queue indirection tableYishai Hadas1-0/+62
Introduce Receive Work Queue (WQ) indirection table. This object can be used to spread incoming traffic to different receive Work Queues. A Receive WQ indirection table points to variable size of WQs. This table is given to a QP in downstream patches. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimerg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Introduce Work Queue object and its verbsYishai Hadas1-0/+82
Introduce Work Queue object and its create/destroy/modify verbs. QP can be created without internal WQs "packaged" inside it, this QP can be configured to use "external" WQ object as its receive/send queue. WQ is a necessary component for RSS technology since RSS mechanism is supposed to distribute the traffic between multiple Receive Work Queues. WQ associated (many to one) with Completion Queue and it owns WQ properties (PD, WQ size, etc.). WQ has a type, this patch introduces the IB_WQT_RQ (i.e.receive queue), it may be extend to others such as IB_WQT_SQ. (send queue). WQ from type IB_WQT_RQ contains receive work requests. PD is an attribute of a work queue (i.e. send/receive queue), it's used by the hardware for security validation before scattering to a memory region which is pointed by the WQ. For that, an external WQ object needs a PD, letting the hardware makes that validation. When accessing a memory region that is pointed by the WQ its PD is used and not the QP's PD, this behavior is similar to a SRQ and a QP. WQ context is subject to a well-defined state transitions done by the modify_wq verb. When WQ is created its initial state becomes IB_WQS_RESET. >From IB_WQS_RESET it can be modified to itself or to IB_WQS_RDY. >From IB_WQS_RDY it can be modified to itself, to IB_WQS_RESET or to IB_WQS_ERR. >From IB_WQS_ERR it can be modified to IB_WQS_RESET. Note: transition to IB_WQS_ERR might occur implicitly in case there was some HW error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUIDEli Cohen1-6/+10
When virtualziation is supported, VFs may send SA MADs to a GID formed by the concatenation of the subnet prefix with the IB_SA_WELL_KNOWN_GUID. When a response is required, the current code will search the local HCA's port for the received GID to figure out the GID index of the entry containing this GID. However, since this is not a real GID it will not be found and error will be printed. We change the logic to check if the destination GID is this special GID and avoid lookup in this case and use GID index 0. Fixes: a0c1b2a35087 ('IB/core: Support accessing SA in virtualized environment') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>