aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/sw (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-08-22IB/rdmvat: Fix double vfree() in rvt_create_qp() error pathMike Marciniszyn1-1/+2
The unwind logic for creating a user QP has a double vfree of the non-shared receive queue when handling a "too many qps" failure. The code unwinds the mmmap info by decrementing a reference count which will call rvt_release_mmap_info() which in turn does the vfree() of the r_rq.wq. The unwind code then does the same free. Fix by guarding the vfree() with the same test that is done in close and only do the vfree() if qp->ip is NULL. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds5-68/+315
Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...
2016-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds35-1/+12884
Pull base rdma updates from Doug Ledford: "Round one of 4.8 code: while this is mostly normal, there is a new driver in here (the driver was hosted outside the kernel for several years and is actually a fairly mature and well coded driver). It amounts to 13,000 of the 16,000 lines of added code in here. Summary: - Updates/fixes for iw_cxgb4 driver - Updates/fixes for mlx5 driver - Add flow steering and RSS API - Add hardware stats to mlx4 and mlx5 drivers - Add firmware version API for RDMA driver use - Add the rxe driver (this is a software RoCE driver that makes any Ethernet device a RoCE device) - Fixes for i40iw driver - Support for send only multicast joins in the cma layer - Other minor fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (72 commits) Soft RoCE driver IB/core: Support for CMA multicast join flags IB/sa: Add cached attribute containing SM information to SA port IB/uverbs: Fix race between uverbs_close and remove_one IB/mthca: Clean up error unwind flow in mthca_reset() IB/mthca: NULL arg to pci_dev_put is OK IB/hfi1: NULL arg to sc_return_credits is OK IB/mlx4: Add diagnostic hardware counters net/mlx4: Query performance and diagnostics counters net/mlx4: Add diagnostic counters capability bit Use smaller 512 byte messages for portmapper messages IB/ipoib: Report SG feature regardless of HW UD CSUM capability IB/mlx4: Don't use GFP_ATOMIC for CQ resize struct IB/hfi1: Disable by default IB/rdmavt: Disable by default IB/mlx5: Fix port counter ID association to QP offset IB/mlx5: Fix iteration overrun in GSI qps i40iw: Add NULL check for puda buffer i40iw: Change dup_ack_thresh to u8 i40iw: Remove unnecessary check for moving CQ head ...
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford35-1/+12884
2016-08-04Soft RoCE driverMoni Shoua34-0/+12884
Soft RoCE (RXE) - The software RoCE driver ib_rxe implements the RDMA transport and registers to the RDMA core device as a kernel verbs provider. It also implements the packet IO layer. On the other hand ib_rxe registers to the Linux netdev stack as a udp encapsulating protocol, in that case RDMA, for sending and receiving packets over any Ethernet device. This yields a RDMA transport over the UDP/Ethernet network layer forming a RoCEv2 compatible device. The configuration procedure of the Soft RoCE drivers requires binding to any existing Ethernet network device. This is done with /sys interface. A userspace Soft RoCE library (librxe) provides user applications the ability to run with Soft RoCE devices. The use of rxe verbs ins user space requires the inclusion of librxe as a device specifics plug-in to libibverbs. librxe is packaged separately. Architecture: +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ +---------------------------------------------------------------+ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft RoCE resources: [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in Github [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE Wiki page [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-03IB/rdmavt: Disable by defaultBart Van Assche1-1/+0
There is a strict policy in the Linux kernel that new drivers must be disabled by default. Hence leave out the "default m" line from Kconfig. Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Jubin John <jubin.john@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> # v4.6+ Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Eliminate redundant opcode test in mr ref clearIra Weiny1-2/+1
The use of the specific opcode test is redundant since all ack entry users correctly manipulate the mr pointer to selectively trigger the reference clearing. The overly specific test hinders the use of implementation specific operations. The change needs to get rid of the union to insure that an atomic value is not seen as an MR pointer. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabledJianxin Xiong1-16/+32
Hanging has been observed while writing a file over NFSoRDMA. Dmesg on the server contains messages like these: [ 931.992501] svcrdma: Error -22 posting RDMA_READ [ 952.076879] svcrdma: Error -22 posting RDMA_READ [ 982.154127] svcrdma: Error -22 posting RDMA_READ [ 1012.235884] svcrdma: Error -22 posting RDMA_READ [ 1042.319194] svcrdma: Error -22 posting RDMA_READ Here is why: With the base memory management extension enabled, FRMR is used instead of FMR. The xprtrdma server issues each RDMA read request as the following bundle: (1)IB_WR_REG_MR, signaled; (2)IB_WR_RDMA_READ, signaled; (3)IB_WR_LOCAL_INV, signaled & fencing. These requests are signaled. In order to generate completion, the fast register work request is processed by the hfi1 send engine after being posted to the work queue, and the corresponding lkey is not valid until the request is processed. However, the rdmavt driver validates lkey when the RDMA read request is posted and thus it fails immediately with error -EINVAL (-22). This patch changes the work flow of local operations (fast register and local invalidate) so that fast register work requests are always processed immediately to ensure that the corresponding lkey is valid when subsequent work requests are posted. Local invalidate requests are processed immediately if fencing is not required and no previous local invalidate request is pending. To allow completion generation for signaled local operations that have been processed before posting to the work queue, an internal send flag RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag and only generates completion for such requests. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/hfi1: Add the capability for reserved operationsMike Marciniszyn1-23/+62
This fix allows for support of in-kernel reserved operations without impacting the ULP user. The low level driver can register a non-zero value which will be transparently added to the send queue size and hidden from the ULP in every respect. ULP post sends will never see a full queue due to a reserved post send and reserved operations will never exceed that registered value. The s_avail will continue to track the ULP swqe availability and the difference between the reserved value and the reserved in use will track reserved availabity. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add missing spin_lock_init call for rdi->n_cqs_lockJianxin Xiong1-0/+1
This fixes the following warning with PROV_LOCKING enabled kernel: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 15 PID: 12286 Comm: modprobe Not tainted 4.7.0-rc5.prove_rcu+ #1 Hardware name: Intel Corporation S2600WT2R/S2600WT2R, ...... Call Trace: [<ffffffff8139ec0d>] dump_stack+0x85/0xc8 [<ffffffff810eb765>] register_lock_class+0x415/0x4b0 [<ffffffff810ede1c>] ? __lock_acquire+0x40c/0x1960 [<ffffffff810edaa9>] __lock_acquire+0x99/0x1960 [<ffffffff8120ab62>] ? find_vmap_area+0x42/0x60 [<ffffffff8120ab39>] ? find_vmap_area+0x19/0x60 [<ffffffff810ef9d3>] lock_acquire+0xd3/0x200 [<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffff81763391>] _raw_spin_lock+0x31/0x40 [<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffffa049d598>] rvt_create_cq+0xc8/0x250 [rdmavt] [<ffffffff810ead46>] ? static_obj+0x36/0x50 [<ffffffffa0469e39>] ib_alloc_cq+0x49/0x180 [ib_core] [<ffffffffa047bed4>] ib_mad_init_device+0x204/0x6d0 [ib_core] [<ffffffff810e968f>] ? up_write+0x1f/0x40 [<ffffffffa046e2c0>] ib_register_device+0x3d0/0x510 [ib_core] [<ffffffffa0752410>] ? read_cc_setting_bin+0x200/0x200 [hfi1] [<ffffffff810ead46>] ? static_obj+0x36/0x50 [<ffffffff810eb888>] ? lockdep_init_map+0x88/0x200 [<ffffffffa049cbff>] rvt_register_device+0x17f/0x320 [rdmavt] [<ffffffffa0766caa>] hfi1_register_ib_device+0x6ca/0x7c0 [hfi1] [<ffffffffa0733de4>] init_one+0x2b4/0x430 [hfi1] [<ffffffff813e40a5>] local_pci_probe+0x45/0xa0 [<ffffffff813e5110>] ? pci_match_device+0xe0/0x110 [<ffffffff813e550c>] pci_device_probe+0xfc/0x140 [<ffffffff814daee9>] driver_probe_device+0x239/0x460 [<ffffffff814db1dd>] __driver_attach+0xcd/0xf0 [<ffffffff814db110>] ? driver_probe_device+0x460/0x460 [<ffffffff814d89b3>] bus_for_each_dev+0x73/0xc0 [<ffffffff814da74e>] driver_attach+0x1e/0x20 [<ffffffff814da1b3>] bus_add_driver+0x1d3/0x290 [<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1] [<ffffffff814dbf60>] driver_register+0x60/0xe0 [<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1] [<ffffffff813e39d0>] __pci_register_driver+0x60/0x70 [<ffffffffa04cc2aa>] hfi1_mod_init+0x196/0x1fe [hfi1] [<ffffffff81002190>] do_one_initcall+0x50/0x190 [<ffffffff8110be72>] ? rcu_read_lock_sched_held+0x62/0x70 [<ffffffff8122d4aa>] ? kmem_cache_alloc_trace+0x23a/0x2a0 [<ffffffff811c1881>] ? do_init_module+0x27/0x1dc [<ffffffff811c18ba>] do_init_module+0x60/0x1dc [<ffffffff811360cc>] load_module+0x132c/0x1ac0 [<ffffffff81132c40>] ? __symbol_put+0x60/0x60 [<ffffffff8133e50d>] ? ima_post_read_file+0x3d/0x80 Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Handle local operations in post sendJianxin Xiong1-5/+40
Some work requests are local operations, such as IB_WR_REG_MR and IB_WR_LOCAL_INV. They differ from non-local operations in that: (1) Local operations can be processed immediately without being posted to the send queue if neither fencing nor completion generation is needed. However, to ensure correct ordering, once a local operation is posted to the work queue due to fencing or completion requiement, all subsequent local operations must also be posted to the work queue until all the local operations on the work queue have completed. (2) Local operations don't send packets over the wire and thus don't need (and shouldn't update) the packet sequence numbers. Define a new a flag bit for the post send table to identify local operations. Add a new field to the QP structure to track the number of local operations on the send queue to determine if direct processing of new local operations should be enabled/disabled. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add mechanism to invalidate MR keysJianxin Xiong1-2/+71
In order to support extended memory management, add the mechanism to invalidate MR keys. This includes a flag "lkey_invalid" in the MR data structure that is to be checked when validating access to the MR via the associated key, and two utility functions to perform fast memory registration and memory key invalidate operations. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add support for ib_map_mr_sgJianxin Xiong3-0/+60
This implements the device specific function needed by the verbs API function ib_map_mr_sg(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Use new driver specific post send tableMike Marciniszyn2-39/+10
Change rvt_post_one_wr to use the new table mechanism for post send. Validate that each low level driver specifies the table. Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02IB/rdmavt: Add data structures and routines for table driven post sendMike Marciniszyn1-5/+62
Add flexibility for driver dependent operations in post send because different drivers will have differing post send operation support. This includes data structure definitions to support a table driven scheme along with the necessary validation routine using the new table. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/rdmavt: Correct qp_priv_alloc() return value testMike Marciniszyn1-1/+3
The current drivers return errors from this calldown wrapped in an ERR_PTR(). The rdmavt code incorrectly tests for NULL. The code is fixed to use IS_ERR() and change ret according to the driver return value. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qpAshutosh Dixit1-6/+0
Since rvt_reset_qp already zero's out qp->s_ack_queue head and tail pointers, there is no need to zero out qp->s_ack_queue itself. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/rdmavt: Correct warning during QPN allocationBrian Welty1-2/+2
Correct calculation of the low order bits which should be unset based on use of qos_shift parameter when assigning QPN. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-17IB/rdmavt: Correct required callback functions for MODIFY_QPBrian Welty1-3/+1
Functions required for MODIFY_PORT were incorrectly being required for MODIFY_QP. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-06IB/rdmavt: Annotate rvt_reset_qp()Bart Van Assche1-0/+6
This patch avoids that sparse reports the following warning: rdmavt/qp.c:507:17: warning: context imbalance in 'rvt_reset_qp' - unexpected unlock Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdamvt: Fix rdmavt s_ack_queue sizingMike Marciniszyn1-2/+20
rdmavt allows the driver to specify the size of the ack queue, but only uses it for the modify QP limit testing for setting the atomic limit value. The driver dependent size is now used to size the s_ack_queue ring dynamicially. Since the driver knows its size, the driver will use its define for any ring size dependent code. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdmavt: Use kzalloc_nodeJubin John1-1/+3
Use kzalloc_node instead of kzalloc for rdmavt memory region segment allocation to optimize for performance on NUMA platforms. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26IB/rdmavt: Insure QP vmalloc variants zero memoryMike Marciniszyn1-4/+4
The usage of the various vmalloc APIs do not consistently zero memory when allocating the swqe. Insure zeroing variants are used. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/rdmavt: Increase CQ callback thread priorityMike Marciniszyn1-0/+1
The priority of the send engines is higher than the CQ completion thread potentially causing completions to be starved for very fast interfaces. Change the CQ kthread to match the send engine threads to minimize this delay for ULP completion processing. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/hfi1: Use global defines for upper bits in opcodeMike Marciniszyn1-3/+3
The awkward coding for setting the allowed_ops field was tripping an smatch warning. This patch uses the more appropriate defines from include/rdma to avoid the issue. As part of the patch remove a mask that was duplicated in rdmavt include files and use that mask as appropriate. Fixes: 8bea6b1cfe6f ("IB/rdmavt: Add create queue pair functionality") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/rdmavt,hfi1,qib: Fix memory leakJubin John1-0/+13
rdi->ports has memory allocated in rvt_alloc_device(), but does not get freed because the hfi1 and qib drivers drivers call ib_dealloc_device() directly instead of going through rdmavt. Add a rvt_dealloc_device() that frees rdi->ports and then calls ib_dealloc_device(). Switch hfi1 and qib drivers to calling rvt_dealloc_device() instead of ib_dealloc_device() directly. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/rdmavt: Fix send schedulingJubin John1-2/+2
call_send is used to determine whether to send immediately or schedule a send for later. The current logic in rdmavt is inverted and has a negative impact on the latency of the hfi1 and qib drivers. Fix this regression by correctly calling send immediately when call_send is set. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/rdmavt: Post receive for QP in ERR stateAlex Estrin1-9/+24
Accordingly IB Spec post WR to receive queue must complete with error if QP is in Error state. Please refer to C10-42, C10-97.2.1 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17IB/hfi1: Report pid in qp_stats to aid debugMike Marciniszyn1-0/+1
Tracking user/QP ownership is needed to debug issues with user ULPs like OpenMPI. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Check lkey_table_size value before useJubin John1-2/+1
The lkey_table_size driver specific parameter value is used before its value is sanity checked and restricted to RVT_MAX_LKEY_TABLE_BITS. This causes a vmalloc allocation failure for large values. Fix this by moving the value check before the first usage of the value. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdamvt: fix cross build with rdmavtMike Marciniszyn1-1/+1
The new check routine causes a larger than supported frame size on s390. Changing the check routine to noinline fixes the issue. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/qib, staging/rdma/hfi1, IB/rdmavt: progress selection changesMike Marciniszyn1-3/+7
The non-rdamvt versions of qib and hfi1 allow for a differing heuristic to override a schedule progress in favor of a direct call the the progress routine. This patch adds that for both drivers and rdmavt. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Remove unnecessary exported functionsDennis Dalessandro5-138/+128
Remove exported functions which are no longer required as the functionality has moved into rdmavt. This also requires re-ordering some of the functions since their prototype no longer appears in a header file. Rather than add forward declarations it is just cleaner to re-order some of the functions. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Remove RVT_FLAGsDennis Dalessandro3-18/+0
While hfi1 and qib were still supporting bits and pieces of core verbs components there needed to be a way to convey if rdmavt should handle allocation and initialize of resources like the queue pair table. Now that all of this is moved into rdmavt there is no need for these flags. They are no longer used in the drivers. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add per verb driver callback checkingDennis Dalessandro1-61/+380
For each verb validate that all requirements for driver callbacks are met. If a function is called without checking for a valid pointer, it is a required function. Also document what each callback function does. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Clean up comments and add more documentationDennis Dalessandro10-63/+280
Add, remove, and otherwise clean up existing comments that are leftover from the initial code postings of rdmavt. Many of the comments were added to provide an idea on the direction we were thinking of going. Now that the design is solidified make a pass over and clean everything up. Also add details where lacking. Ensure all non static functions have nano comments. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add trace and error print statements in post_one_wrHarish Chegondi2-1/+77
These trace and error print statements would help in debugging issues which are caused due to messed up QP ring buffer pointers. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/qib, staging/rdma/hfi1: add s_hlock for use in post sendMike Marciniszyn1-21/+76
This patch adds an additional lock to reduce contention on the s_lock. This lock is used in post_send() so that the post_send is not serialized with the send engine and other send related processing. To do this the s_next_psn is now maintained on post_send() while post_send() related fields are moved to a new cache line. There is an s_avail maintained for the post_send() to mitigate trading cache lines with the send engine. The lock is released/acquired around releasing the just built packet to the egress mechanism. Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt, staging/rdma/hfi1: use qps to dynamically scale timeout valueVennila Megavannan1-0/+17
A busy_jiffies variable is maintained and updated when rc qps are created and deleted. busy_jiffies is a scaled value of the number of rc qps in the device. busy_jiffies is incremented every rc qp scaling interval. busy_jiffies is added to the rc timeout in add_retry_timer and mod_retry_timer. The rc qp scaling interval is selected based on extensive performance evaluation of targeted workloads. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10staging/rdma/hfi1: use new RNR timerMike Marciniszyn1-1/+0
Use the new RNR timer for hfi1. For qib, this timer doesn't exist, so exploit driver callbacks to use the new timer as appropriate. Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10staging/rdma/hfi1: Remove modify queue pair from hfi1Dennis Dalessandro1-7/+0
In addition to removing the modify queue pair verb from hfi1 we also remove ancillary functions which existed only for modify queue pair and are also already present in hfi1. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10staging/rdma/hfi1: Clean up return handlingDennis Dalessandro1-3/+1
Return directly from rvt_resize_cq rather than use a goto/label. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Properly pass gfp to hw driver functionIra Weiny1-2/+1
alloc_qpn must use GFP and the hardware drivers should use it as well. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add support for query_port, modify_port and get_port_immutableHarish Chegondi1-22/+55
rvt_query_port calls into the driver through a call back function query_port_state to populate the rest of ib_port_attr elements. rvt_modify_port calls into the driver if needed through a call back function shut_down_port() Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add query gid support.Dennis Dalessandro1-3/+14
Addin query gid support. Rdmavt still relies on the driver to maintain the gid table. Rdmavt simply calls into the driver to retrive the guid for a particular port. Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Clean up distinction between port number and indexDennis Dalessandro5-23/+52
IB core uses 1 relative indexing for ports. All of our data structures use 0 based indexing. Add an inline function that we can use whenever we need to validate a legal value and try to convert a port number to a port index at the entrance into rdmavt. Try to follow the policy that when we are talking about a port from IB core point of view we refer to it as a port number. When port is an index into our arrays refer to it as a port index. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add Mem affinity supportMitko Haralanov3-11/+14
Change verbs memory allocations to the device numa node. This keeps memory close to the device for optimal performance. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add hardware driver send work request checkIra Weiny1-0/+4
Some hardware drivers requires additional checks on send WRs. Create an optional call back to allow hardware drivers to reject a send WR. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add srq functionality to rdmavtJubin John4-6/+290
Fill in srq function stubs with code derived from hfi1 and qib. Move necessary functions and data structure members as well. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-10IB/rdmavt: Add support for rvt_query_qpHarish Chegondi1-1/+46
Drivers using rdmavt can rely on rvt_query_qp instead of defining their own query_qp functions. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>