aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-22bnxt_re: fix the regression due to changes in alloc_pblDevesh Sharma3-22/+14
While adding the use of for_each_sg_dma_page iterator for Brodcom's rdma driver, there was a regression added in the __alloc_pbl path. The change left bnxt_re in DOA state in for-next branch. Fixing the regression to avoid the host crash when a user space object is created. Restricting the unconditional access to hwq.pg_arr when hwq is initialized for user space objects. Fixes: 161ebe2498d4 ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL") Reported-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22IB/mlx4: Increase the timeout for CM cacheHåkon Bugge1-1/+1
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/mlx5: Validate correct PD before prefetch MRMoni Shoua1-16/+30
When prefetching odp mr it is required to verify that pd of the mr is identical to the pd for which the advise_mr request arrived with. This check was missing from synchronous flow and is added now. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/mlx5: Protect against prefetch of invalid MRMoni Shoua4-23/+102
When deferring a prefetch request we need to protect against MR or PD being destroyed while the request is still enqueued. The first step is to validate that PD owns the lkey that describes the MR and that the MR that the lkey refers to is owned by that PD. The second step is to dequeue all requests when MR is destroyed. Since PD can't be destroyed while it owns MRs it is guaranteed that when a worker wakes up the request it refers to is still valid. Now, it is possible to refrain from taking a reference on the device since it is assured to be present as pd. While that, replace the dedicated ordered workqueue with the system unbound workqueue to reuse an existing resource and improve performance. This will also fix a bug of queueing to the wrong workqueue. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/hfi1: Add missing break in switch statementGustavo A. R. Silva1-0/+1
Fix the following warning by adding a missing break: drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’: drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=] switch (prev->wr.opcode) { ^~~~~~ drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here case IB_WR_RDMA_READ: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Fixes: c6c231175ccd ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kaike Wan <Kaike.wan@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21Merge branch 'mlx5-next' into rdma.git for-nextJason Gunthorpe3-77/+101
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux To resolve conflicts with net-next and pick up the first patch. * branch 'mlx5-next': net/mlx5: Factor out HCA capabilities functions IB/mlx5: Add support for 50Gbps per lane link modes net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register net/mlx5: Add new fields to Port Type and Speed register net/mlx5: Refactor queries to speed fields in Port Type and Speed register net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode net/mlx5: Relocate vport macros to the vport header file net/mlx5: E-Switch, Normalize the name of uplink vport number net/mlx5: Provide an alternative VF upper bound for ECPF net/mlx5: Add host params change event net/mlx5: Add query host params command net/mlx5: Update enable HCA dependency net/mlx5: Introduce Mellanox SmartNIC and modify page management logic IB/mlx5: Use unified register/load function for uplink and VF vports net/mlx5: Use consistent vport num argument type net/mlx5: Use void pointer as the type in address_of macro net/mlx5: Align ODP capability function with netdev coding style mlx5: use RCU lock in mlx5_eq_cq_get() Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-20drivers/IB,qib: Fix pinned/locked limit check in qib_get_user_pages()Davidlohr Bueso1-1/+1
The current check does not take into account the previous value of pinned_vm; thus it is quite bogus as is. Fix this by checking the new value after the (optimistic) atomic inc. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19iw_cxgb4: Make function read_tcb() staticWei Yongjun1-1/+1
Fixes the following sparse warning: drivers/infiniband/hw/cxgb4/cm.c:658:6: warning: symbol 'read_tcb' was not declared. Should it be static? Fixes: 11a27e2121a5 ("iw_cxgb4: complete the cached SRQ buffers") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/hns: Bugfix for set hem of SCCYangyang Li1-0/+3
The method of set hem for scc context is different from other contexts. It should notify the hardware with the detailed idx in bt0 for scc, while for other contexts, it only need to notify the bt step and the hardware will calculate the idx. Here fixes the following error when unloading the hip08 driver: [ 123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 123.579023] {1}[Hardware Error]: event severity: recoverable [ 123.584670] {1}[Hardware Error]: Error 0, type: recoverable [ 123.590317] {1}[Hardware Error]: section_type: PCIe error [ 123.595877] {1}[Hardware Error]: version: 4.0 [ 123.600395] {1}[Hardware Error]: command: 0x0006, status: 0x0010 [ 123.606562] {1}[Hardware Error]: device_id: 0000:7d:00.0 [ 123.612034] {1}[Hardware Error]: slot: 0 [ 123.616120] {1}[Hardware Error]: secondary_bus: 0x00 [ 123.621245] {1}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa222 [ 123.627847] {1}[Hardware Error]: class_code: 000002 [ 123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000 [ 123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID [ 123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000 [ 123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!! [ 123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified [ 123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error [ 123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error [ 123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5 [ 123.683147] hns3 0000:7d:00.0: AER: Device recovery successful [ 123.688978] hns3 0000:7d:00.0: PF Reset requested [ 123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF [ 123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5! Fixes: 6a157f7d1b14 ("RDMA/hns: Add SCC context allocation support for hip08") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Reviewed-by: Yixian Liu <liuyixian@huawei.com> Reviewed-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/hns: Modify qp&cq&pd specification according to UMLijun Ou1-5/+5
Accroding to hip08's limitation, qp&cq specification is 1M, mtpt specification 1M in kernel space. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19IB/usnic: Fix deadlockParvi Kaustubhi3-28/+19
There is a dead lock in usnic ib_register and netdev_notify path. usnic_ib_discover_pf() | mutex_lock(&usnic_ib_ibdev_list_lock); | usnic_ib_device_add(); | ib_register_device() | usnic_ib_query_port() | mutex_lock(&us_ibdev->usdev_lock); | ib_get_eth_speed() | rtnl_lock() order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock rtnl_lock() | usnic_ib_netdevice_event() | mutex_lock(&usnic_ib_ibdev_list_lock); order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock Solution is to use the core's lock-free ib_device_get_by_netdev() scheme to lookup ib_dev while handling netdev & inet events. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/cxgb4: Remove kref accounting for sync operationLeon Romanovsky3-29/+3
Ucontext allocation and release aren't async events and don't need kref accounting. The common layer of RDMA subsystem ensures that dealloc ucontext will be called after all other objects are released. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/nes: Remove useless usecnt variable and redundant memsetLeon Romanovsky2-9/+1
The internal design of RDMA/core ensures that there dealloc ucontext will be called only if alloc_ucontext succeeded, hence there is no need to manage internal variable to mark validity of ucontext. As part of this change, remove redundant memeset too. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/hfi1: Fix a build warning for TID RDMA READKaike Wan1-0/+1
The following build warning was produced for the TID RDMA READ patch ("IB/hfi1: Enable TID RDMA READ protocol"): drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe': drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=] hfi1_setup_tid_rdma_wqe(qp, wqe); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/qp.c:329:2: note: here case IB_QPT_UC: ^~~~ This patch will fix the issue by adding the "fall through" comment. Fixes: f1ab4efa6d32 ("IB/hfi1: Enable TID RDMA READ protocol") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch17-144/+172
Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14RDMA/hns: Configure capacity of hns deviceLijun Ou1-0/+5
This patch adds new device capability for IB_DEVICE_MEM_MGT_EXTENSIONS to indicate device support for the following features: 1. Fast register memory region. 2. send with remote invalidate by frmr 3. local invalidate memory regsion As well as adds the max depth of frmr page list len. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14RDMA/hns: Delete useful prints for aeq subtype eventYixian Liu1-51/+6
Current all messages printed for aeq subtype event are wrong. Thus, delete them and only the value of subtype event is printed. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14RDMA/hns: Set allocated memory to zero for wridYixian Liu1-4/+4
The memory allocated for wrid should be initialized to zero. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14RDMA/hns: Fix the state of rereg mrYixian Liu1-0/+3
The state of mr after reregister operation should be set to valid state. Otherwise, it will keep the same as the state before reregistered. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14RDMA/hns: Limit minimum ROCE CQ depth to 64chenglang1-0/+1
This patch modifies the minimum CQ depth specification of hip08 and is consistent with the processing of hip06. Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-14IB/mlx5: Add support for 50Gbps per lane link modesAya Levin1-5/+61
Driver now supports new link modes: 50Gbps per lane support for 50G/100G/200G. This patch reads the correct field (legacy vs. extended) based on a FW indication bit, and adds a translation function (link modes to IB width and speed) to the new link modes. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14net/mlx5: Add support to ext_* fields introduced in Port Type and Speed registerAya Levin1-1/+2
This patch exposes new link modes (including 50Gbps per lane), and ext_* fields which describes the new link modes in Port Type and Speed register (PTYS). Access functions, translation functions (speed <-> HW bits) and link max speed function were modified. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14net/mlx5: Refactor queries to speed fields in Port Type and Speed registerAya Levin1-2/+4
This patch fascicles queries to speed related fields in Port Type and Speed register (PTYS) into a single API. I addition, this patch refactors functions which serves only Ethernet driver: remove the protocol type as an input parameter, move code from 'core' directory into 'en' directory and add 'eth' prefix to the function's name. The patch also encapsulates functions that are not used outside the Ethernet driver removes redundant include files. Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14net/mlx5: E-Switch, Normalize the name of uplink vport numberBodong Wang1-2/+2
Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to comply with the same naming convention along with the introduction of other vports. Use MLX5_VPORT as the prefix for such vports and relocate the uplink vport definition to public header file for the benefits of both net and IB drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14IB/mlx5: Use unified register/load function for uplink and VF vportsBodong Wang3-72/+37
IB driver maintains different registration and load function calls for uplink and VF vports. This is not necessary as they only differ with each other on their profiles. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13RDMA/nes: Use for_each_sg_dma_page iterator for umem SGLShiraz, Saleem1-131/+89
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-13RDMA: Fix allocation failure on pointer pdColin Ian King1-1/+1
The null check on an allocation failure on pd is currently checking if pd is non-null rather than null. Fix this by adding the missing ! operator. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-13Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma into for-nextDoug Ledford38-612/+588
I had merged the hfi1-tid code into my local copy of for-next, but was waiting on 0day testing before pushing it (I pushed it to my wip branch). Having waited several days for 0day testing to show up, I'm finally just going to push it out. In the meantime, though, Jason pushed other stuff to for-next, so I needed to merge up the branches before pushing. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-11RDMA/bnxt_re: fix or'ing of data into an uninitialized struct memberColin Ian King1-1/+1
The struct member comp_mask has not been initialized however a bit pattern is being bitwise or'd into the member and hence other bit fields in comp_mask may contain any garbage from the stack. Fix this by making the bitwise or into an assignment. Fixes: 95b86d1c91ad ("RDMA/bnxt_re: Update kernel user abi to pass chip context") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/mlx5: Fix memory leak in case we fail to add an IB deviceMark Bloch1-1/+3
Make sure the IB device is freed on failure. Fixes: b5ca15ad7e61 ("IB/mlx5: Add proper representors support") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11IB/mlx5: Fix bad flow upon DEVX mkey creationYishai Hadas1-2/+3
Fix bad flow upon DEVX mkey creation to prevent deleting the indirect mkey from the radix tree in case there was a previous failure to insert it. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/ocrdma: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-32/+23
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/qedr: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-37/+30
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/vmw_pvrdma: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-13/+8
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/cxgb3: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-17/+12
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/cxgb4: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-19/+13
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/hns: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem4-74/+56
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/i40iw: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-19/+16
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/mthca: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem1-20/+16
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGLShiraz, Saleem2-18/+14
Use the for_each_sg_dma_page iterator variant to walk the umem DMA-mapped SGL and get the page DMA address. This avoids the extra loop to iterate pages in the SGE when for_each_sg iterator is used. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-09Merge branch 'wip/dl-for-next' into for-nextDoug Ledford36-399/+9595
Due to concurrent work by myself and Jason, a normal fast forward merge was not possible. This brings in a number of hfi1 changes, mainly the hfi1 TID RDMA support (roughly 10,000 LOC change), which was reviewed and integrated over a period of days. Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-02-08iw_cxgb4: fix srqidx leak during connection abortRaju Rangoju1-2/+3
When an application aborts the connection by moving QP from RTS to ERROR, then iw_cxgb4's modify_rc_qp() RTS->ERROR logic sets the *srqidxp to 0 via t4_set_wq_in_error(&qhp->wq, 0), and aborts the connection by calling c4iw_ep_disconnect(). c4iw_ep_disconnect() does the following: 1. sends up a close_complete_upcall(ep, -ECONNRESET) to libcxgb4. 2. sends abort request CPL to hw. But, since the close_complete_upcall() is sent before sending the ABORT_REQ to hw, libcxgb4 would fail to release the srqidx if the connection holds one. Because, the srqidx is passed up to libcxgb4 only after corresponding ABORT_RPL is processed by kernel in abort_rpl(). This patch handle the corner-case by moving the call to close_complete_upcall() from c4iw_ep_disconnect() to abort_rpl(). So that libcxgb4 is notified about the -ECONNRESET only after abort_rpl(), and libcxgb4 can relinquish the srqidx properly. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08iw_cxgb4: complete the cached SRQ buffersRaju Rangoju3-8/+157
If TP fetches an SRQ buffer but ends up not using it before the connection is aborted, then it passes the index of that SRQ buffer to the host in ABORT_REQ_RSS or ABORT_RPL CPL message. But, if the srqidx field is zero in the received ABORT_RPL or ABORT_REQ_RSS CPL, then we need to read the tcb.rq_start field to see if it really did have an RQE cached. This works around a case where HW does not include the srqidx in the ABORT_RPL/ABORT_REQ_RSS CPL. The final value of rq_start is the one present in TCB with the TF_RX_PDU_OUT bit cleared. So, we need to read the TCB, examine the TF_RX_PDU_OUT (bit 49 of t_flags) in order to determine if there's a rx PDU feedback event pending. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08RDMA: Handle PD allocations by IB/coreLeon Romanovsky27-342/+218
The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08IB/usnic: Fix locking when unregisteringParvi Kaustubhi1-2/+4
Move the call to usnic_ib_device_remove after usnic_ib_ibdev_list_lock has been released. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08iw_cxgb4: use tos when finding ipv6 routesSteve Wise1-2/+3
When IPv6 support was added, the correct tos was not passed to cxgb_find_route6(). This potentially results in the wrong route entry. Fixes: 830662f6f032 ("RDMA/cxgb4: Add support for active and passive open connection with IPv6 address") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08iw_cxgb4: use tos when importing the endpointSteve Wise1-1/+1
import_ep() is passed the correct tos, but doesn't use it correctly. Fixes: ac8e4c69a021 ("cxgb4/iw_cxgb4: TOS support") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08iw_cxgb4: use listening ep tos when accepting new connectionsSteve Wise1-2/+7
If the parent listening endpoint has a service type set, then use that when setting up the connection. This allows server-side applications to mandate the tos for passive side connections via rdma_set_service_type() on the listening endpoints. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07RDMA/bnxt_re: Update kernel user abi to pass chip contextDevesh Sharma1-3/+14
User space verbs provider library would need chip context. Changing the ABI to add chip version details in structure. Furthermore, changing the kernel driver ucontext allocation code to initialize the abi structure with appropriate values. As suggested by community, appended the new fields at the bottom of the ABI structure and retaining to older fields as those were in the older versions. Keeping the ABI version at 1 and adding a new field in the ucontext response structure to hold the component mask. The user space library should check pre-defined flags to figure out if a certain feature is supported on not. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-07RDMA/bnxt_re: Add extended psn structure for 57500 adaptersDevesh Sharma5-18/+70
The new 57500 series of adapter has bigger psn search structure. The size of new structure is 16B. Changing the control path memory allocation and fast path code to accommodate the new psn structure while maintaining the backward compatibility. There are few additional changes listed below: - For 57500 chip max-sge are limited to 6 for now. - For 57500 chip max-receive-sge should be set to 6 for now. - Add driver/hardware interface structure for new chip. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>