aboutsummaryrefslogtreecommitdiffstats
path: root/include/rdma (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-11-12RDMA/core: Make FD destroy callback voidLeon Romanovsky1-2/+2
All FD object destroy implementations return 0, so declare this callback void. Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12RDMA/core: Postpone uobject cleanup on failure till FD closeLeon Romanovsky1-42/+2
Remove the ib_is_destroyable_retryable() concept. The idea here was to allow the drivers to forcibly clean the HW object even if they otherwise didn't want to (eg because of usecnt). This was an attempt to clean up in a world where drivers were not allowed to fail HW object destruction. Now that we are going back to allowing HW objects to fail destroy this doesn't make sense. Instead if a uobject's HW object can't be destroyed it is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to clean it. Multiple passes over the uobject list allow hidden dependencies to be resolved. If that fails the HW driver is broken, throw a WARN_ON and leak the HW object memory. All the other tricky failure paths (eg on creation error unwind) have already been updated to this new model. Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28IB/verbs: avoid nested container_of()Arnd Bergmann1-6/+9
Nested container_of() calls work correctly but cause a warning when building with W=2. Invoking it from an inline function like in drivers/infiniband/hw/mlx5/mlx5_ib.h means we get hundreds of warnings like: include/linux/kernel.h:852:8: warning: declaration of '__mptr' shadows a previous local [-Wshadow] 852 | void *__mptr = (void *)(ptr); \ | ^~~~~~ include/rdma/uverbs_ioctl.h:651:11: note: in expansion of macro 'container_of' 651 | (udata ? container_of(container_of(udata, struct uverbs_attr_bundle, \ | ^~~~~~~~~~~~ include/rdma/uverbs_ioctl.h:651:24: note: in expansion of macro 'container_of' 651 | (udata ? container_of(container_of(udata, struct uverbs_attr_bundle, \ | ^~~~~~~~~~~~ drivers/infiniband/hw/mthca/mthca_qp.c:564:35: note: in expansion of macro 'rdma_udata_to_drv_context' 564 | struct mthca_ucontext *context = rdma_udata_to_drv_context( | ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/kernel.h:852:8: note: shadowed declaration is here 852 | void *__mptr = (void *)(ptr); \ | ^~~~~~ include/rdma/uverbs_ioctl.h:651:11: note: in expansion of macro 'container_of' 651 | (udata ? container_of(container_of(udata, struct uverbs_attr_bundle, \ | ^~~~~~~~~~~~ drivers/infiniband/hw/mthca/mthca_qp.c:564:35: note: in expansion of macro 'rdma_udata_to_drv_context' 564 | struct mthca_ucontext *context = rdma_udata_to_drv_context( | ^~~~~~~~~~~~~~~~~~~~~~~~~ In file included from <command-line>: include/linux/kernel.h:852:8: warning: declaration of '__mptr' shadows a previous local [-Wshadow] 852 | void *__mptr = (void *)(ptr); \ | ^~~~~~ Rewrite the macro to use an inline function internally, which makes it more readable and reduces the amount of useless output from make W=2. Fixes: 730623f4a56f ("IB/verbs: Add helper function rdma_udata_to_drv_context") Link: https://lore.kernel.org/r/20201026161549.3709175-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28RDMA: Add rdma_connect_locked()Jason Gunthorpe1-12/+2
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the handler triggers a completion and another thread does rdma_connect() or the handler directly calls rdma_connect(). In all cases rdma_connect() needs to hold the handler_mutex, but when handler's are invoked this is already held by the core code. This causes ULPs using the 2nd method to deadlock. Provide a rdma_connect_locked() and have all ULPs call it from their handlers. Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Fixes: 2a7cec538169 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state") Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Remove AH from uverbs_cmd_maskJason Gunthorpe1-0/+2
Drivers that need a uverbs AH should instead set the create_user_ah() op similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never implemented so are just deleted. Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA/core Remove uverbs_ex_cmd_maskJason Gunthorpe1-1/+0
No driver sets it, and the core code sets a maximum mask, simply remove it. Disabled operations are now handled either by having a NULL ops pointer, or by having the common driver callbacks check for unsupported extended attributes. Link: https://lore.kernel.org/r/9-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Check attr_mask during modify_qpJason Gunthorpe1-0/+2
Each driver should check that it can support the provided attr_mask during modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block modify_qp_ex because the driver didn't check RATE_LIMIT. Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16RDMA: Explicitly pass in the dma_device to ib_register_deviceJason Gunthorpe1-1/+2
The code in setup_dma_device has become rather convoluted, move all of this to the drivers. Drives now pass in a DMA capable struct device which will be used to setup DMA, or drivers must fully configure the ibdev for DMA and pass in NULL. Other than setting the masks in rvt all drivers were doing this already anyhow. mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for DMA based on their hardweare limits in: __mthca_init_one() dma_set_max_seg_size (1G) __mlx4_init_one() dma_set_max_seg_size (1G) mlx5_pci_init() set_dma_caps() dma_set_max_seg_size (2G) Other non software drivers (except usnic) were extended to UINT_MAX [1, 2] instead of 2G as was before. [1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ [2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/ Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/uverbs: Expose the new GID query API to user spaceAvihai Horon1-3/+3
Expose the query GID table and entry API to user space by adding two new methods and method handlers to the device object. This API provides a faster way to query a GID table using single call and will be used in libibverbs to improve current approach that requires multiple calls to open, close and read multiple sysfs files for a single GID table entry. Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/core: Introduce new GID table query APIAvihai Horon1-0/+3
Introduce rdma_query_gid_table which enables querying all the GID tables of a given device and copying the attributes of all valid GID entries to a provided buffer. This API provides a faster way to query a GID table using single call and will be used in libibverbs to improve current approach that requires multiple calls to open, close and read multiple sysfs files for a single GID table entry. Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01RDMA/core: Modify enum ib_gid_type and enum rdma_network_typeAvihai Horon1-7/+10
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so enum ib_gid_type will match the gid types of the new query GID table API which will be introduced in the following patches. This change in enum ib_gid_type requires to change also enum rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1 values. Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01IB/core: Enable ODP sync without faultingYishai Hadas1-1/+1
Enable ODP sync without faulting, this improves performance by reducing the number of page faults in the system. The gain from this option is that the device page table can be aligned with the presented pages in the CPU page table without causing page faults. As of that, the overhead on data path from hardware point of view to trigger a fault which end-up by calling the driver to bring the pages will be dropped. Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01IB/core: Improve ODP to use hmm_range_fault()Yishai Hadas1-13/+8
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve performance in a few aspects: This includes: - Dropping the need to allocate and free memory to hold its output - No need any more to use put_page() to unpin the pages - The logic to detect contiguous pages is done based on the returned order, no need to run per page and evaluate. In addition, moving to use hmm_range_fault() enables to reduce page faults in the system with it's snapshot mode, this will be introduced in next patches from this series. As part of this, cleanup some flows and use the required data structures to work with hmm_range_fault(). Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30RDMA/core: Remove ucontext->closingJason Gunthorpe1-6/+0
Nothing reads this any more, and the reason for its existence has passed due to the deferred fput() scheme. Fixes: 8ea1f989aa07 ("drivers/IB,usnic: reduce scope of mmap_sem") Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22RDMA/restrack: Improve readability in task name managementLeon Romanovsky2-45/+15
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd(). This uniformly makes all restracks add'd using rdma_restrack_add(). Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22RDMA/restrack: Simplify restrack tracking in kernel flowsLeon Romanovsky1-1/+0
Have a single rdma_restrack_add() that adds an entry, there is no reason to split the user/kernel here, the rdma_restrack_set_task() is responsible for this difference. This patch prepares the code to the future requirement of making restrack is mandatory for managing ib objects. Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22RDMA/restrack: Count references to the verbs objectsLeon Romanovsky1-7/+0
Refactor the restrack code to make sure the kref inside the restrack entry properly kref's the object in which it is embedded. This slight change is needed for future conversions of MR and QP which are refcounted before the release and kfree. The ideal flow from ib_core perspective as follows: * Allocate ib_* structure with rdma_zalloc_*. * Set everything that is known to ib_core to that newly created object. * Initialize kref with restrack help * Call to driver specific allocation functions. * Insert into restrack DB .... * Return and release restrack with restrack_put. Largely this means a rdma_restrack_new() should be called near allocating the containing structure. Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18Merge branch 'mlx5_active_speed' into rdma.git for-nextJason Gunthorpe1-3/+4
Leon Romanovsky says: ==================== IBTA declares speed as 16 bits, but kernel stores it in u8. This series fixes in-kernel declaration while keeping external interface intact. ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. * branch 'mlx5_active_speed': RDMA: Fix link active_speed size RDMA/mlx5: Delete duplicated mlx5_ptys_width enum net/mlx5: Refactor query port speed functions
2020-09-18RDMA: Fix link active_speed sizeAharon Landau1-3/+4
According to the IB spec active_speed size should be u16 and not u8 as before. Changing it to allow further extensions in offered speeds. Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org Signed-off-by: Aharon Landau <aharonl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17RDMA: Convert RWQ table logic to ib_core allocation schemeLeon Romanovsky1-5/+4
Move struct ib_rwq_ind_table allocation to ib_core. Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17RDMA: Clean MW allocation and free flowsLeon Romanovsky1-2/+2
Move allocation and destruction of memory windows under ib_core responsibility and clean drivers to ensure that no updates to MW ib_core structures are done in driver layer. Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11RDMA/core: Added missing WR and WC opcodesBob Pearson1-7/+9
Add work completion opcodes to a new ib_uverbs_wc_opcode enum in ib_user_verbs.h. This plays the same role as ib_uverbs_wr_opcode documenting the opcodes in the user space API. Assigned the IB_WC_XXX opcodes in ib_verbs.h to the IB_UVERBS_WC_XXX where they are defined. This follows the same pattern as the IB_WR_XXX opcodes. This fixes an incorrect value for LSO that had crept in but is not currently being used. Added a missing IB_WR_BIND_MW opcode in ib_verbs.h. Link: https://lore.kernel.org/r/20200903224039.437391-2-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11RDMA/mlx4: Use ib_umem_num_dma_blocks()Jason Gunthorpe1-2/+0
For the calls linked to mlx4_ib_umem_calc_optimal_mtt_size() use ib_umem_num_dma_blocks() inside the function, it is just some weird static default. All other places are just using it with PAGE_SIZE, switch to ib_umem_num_dma_blocks(). As this is the last call site, remove ib_umem_num_count(). Link: https://lore.kernel.org/r/15-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-11RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()Jason Gunthorpe1-3/+12
ib_umem_num_pages() should only be used by things working with the SGL in CPU pages directly. Drivers building DMA lists should use the new ib_num_dma_blocks() which returns the number of blocks rdma_umem_for_each_block() will return. To make this general for DMA drivers requires a different implementation. Computing DMA block count based on umem->address only works if the requested page size is < PAGE_SIZE and/or the IOVA == umem->address. Instead the number of DMA pages should be computed in the IOVA address space, not umem->address. Thus the IOVA has to be stored inside the umem so it can be used for these calculations. For now set it to umem->address by default and fix it up if ib_umem_find_best_pgsz() was called. This allows drivers to be converted to ib_umem_num_dma_blocks() safely. Link: https://lore.kernel.org/r/6-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA/umem: Add rdma_umem_for_each_dma_block()Jason Gunthorpe1-0/+20
This helper does the same as rdma_for_each_block(), except it works on a umem. This simplifies most of the call sites. Link: https://lore.kernel.org/r/4-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA/umem: Use simpler logic for ib_umem_find_best_pgsz()Jason Gunthorpe1-24/+0
The calculation in rdma_find_pg_bit() is fairly complicated, and the function is never called anywhere else. Inline a simpler version into ib_umem_find_best_pgsz() Link: https://lore.kernel.org/r/3-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Make counters destroy symmetricalLeon Romanovsky1-1/+1
Change counters to return failure like any other verbs destroy, however this flow shouldn't return error at all. Link: https://lore.kernel.org/r/20200907120921.476363-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to return error for destroy WQLeon Romanovsky1-2/+2
Make this interface symmetrical to other destroy paths. Fixes: a49b1dc7ae44 ("RDMA: Convert destroy_wq to be void") Link: https://lore.kernel.org/r/20200907120921.476363-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Change XRCD destroy return valueLeon Romanovsky1-1/+1
Update XRCD destroy flow to allow command failure. Fixes: 28ad5f65c314 ("RDMA: Move XRCD to be under ib_core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Allow fail of destroy CQLeon Romanovsky1-2/+4
Like any other verbs objects, CQ shouldn't fail during destroy, but mlx5_ib didn't follow this contract with mixed IB verbs objects with DEVX. Such mix causes to the situation where FW and kernel are fully interdependent on the reference counting of each side. Kernel verbs and drivers that don't have DEVX flows shouldn't fail. Fixes: e39afe3d6dbd ("RDMA: Convert CQ allocations to be under core responsibility") Link: https://lore.kernel.org/r/20200907120921.476363-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA/core: Delete function indirection for alloc/free kernel CQLeon Romanovsky1-56/+6
The ib_alloc_cq*() and ib_free_cq*() are solely kernel verbs to manage CQs and doesn't need extra indirection just to call same functions with constant parameter NULL as udata. Link: https://lore.kernel.org/r/20200907120921.476363-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to fail on SRQ destroyLeon Romanovsky1-3/+5
In similar way to other IB objects, restore the ability to return error on SRQ destroy. Strictly speaking, this change is not necessary, and provided here to ensure a symmetrical interface like other destroy functions. Fixes: 68e326dea1db ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to fail on AH destroyLeon Romanovsky1-3/+5
Like any other IB verbs objects, AH are refcounted by ib_core. The release of those objects are controlled by ib_core with promise that AH destroy can't fail. Being SW object for now, this change makes dealloc_ah() to behave like any other destroy IB flows. Fixes: d345691471b4 ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA: Restore ability to fail on PD deallocateLeon Romanovsky1-8/+5
The IB verbs objects are counted by the kernel and ib_core ensures that deallocate PD will success so it will be called once all other objects that depends on PD will be released. This is achieved by managing various reference counters on such objects. The mlx5 driver didn't follow this standard flow when allowed DEVX objects that are not managed by ib_core to be interleaved with the ones under ib_core responsibility. In such interleaved scenarios deallocate command can fail and ib_core will leave uobject in internal DB and attempt to clean it later to free resources anyway. This change partially restores returned value from dealloc_pd() for all drivers, but keeping in mind that non-DEVX devices and kernel verbs paths shouldn't fail. Fixes: 21a428a019c9 ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA/core: Change how failing destroy is handled during uobj abortJason Gunthorpe1-5/+0
Currently it triggers a WARN_ON and then goes ahead and destroys the uobject anyhow, leaking any driver memory. The only place that leaks driver memory should be during FD close() in uverbs_destroy_ufile_hw(). Drivers are only allowed to fail destroy uobjects if they guarantee destroy will eventually succeed. uverbs_destroy_ufile_hw() provides the loop to give the driver that chance. Link: https://lore.kernel.org/r/20200902081708.746631-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31RDMA/umem: Fix signature of stub ib_umem_find_best_pgsz()Jason Gunthorpe1-4/+5
The original function returns unsigned long and 0 on failure. Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/0-v1-982a13cc5c6d+501ae-fix_best_pgsz_stub_jgg@nvidia.com Reviewed-by: Gal Pressman <galpress@amazon.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31RDMA/hns: Get udp sport num dynamically instead of using a fixed valueWeihang Li1-0/+1
The UDP source port number in RoCE v2 is used to create entropy for network routers (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE IB headers. Considering that the IB core has achieved a new interface to get a hashed value of it, the fixed value of it in QPC and UD WQE in hns driver could be fixed and the port number is to be set dynamically now. For QPC of RC, the value could be hashed from flow_lable if the user pass it in or from remote qpn and local qpn. For WQE of UD, it is set according to fl or as a random value. Link: https://lore.kernel.org/r/1598002289-8611-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/cma: Add missing locking to rdma_accept()Jason Gunthorpe1-0/+5
In almost all cases rdma_accept() is called under the handler_mutex by ULPs from their handler callbacks. The one exception was ucma which did not get the handler_mutex. To improve the understand-ability of the locking scheme obtain the mutex for ucma as well. This improves how ucma works by allowing it to directly use handler_mutex for some of its internal locking against the handler callbacks intead of the global file->mut lock. There does not seem to be a serious bug here, other than a DISCONNECT event can be delivered concurrently with accept succeeding. Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18RDMA/cm: Remove unused cm_classJason Gunthorpe1-3/+0
Previous commits removed all references to the /sys/class/infiniband_cm/ directory represented by the cm_class symbol. Remove the directory and cm_class. Fixes: a1a8e4a85cf7 ("rdma: Delete the ib_ucm module") Link: https://lore.kernel.org/r/0-v1-90096a98c476+205-remove_cm_leftovers_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18RDMA: Remove constant domain argument from flow creation callLeon Romanovsky1-12/+1
The "domain" argument is constant and modern device (mlx5) doesn't support anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and simplify code. Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds31-986/+103
Pull rdma updates from Jason Gunthorpe: "A quiet cycle after the larger 5.8 effort. Substantially cleanup and driver work with a few smaller features this time. - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re - Removal of dead or redundant code across the drivers - RAW resource tracker dumps to include a device specific data blob for device objects to aide device debugging - Further advance the IOCTL interface, remove the ability to turn it off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands - Remove stubs related to devices with no pkey table - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a device to give higher performance - Several more static checker, syzkaller and rare crashers fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits) RDMA/mlx5: Fix flow destination setting for RDMA TX flow table RDMA/rxe: Remove pkey table RDMA/umem: Add a schedule point in ib_umem_get() RDMA/hns: Fix the unneeded process when getting a general type of CQE error RDMA/hns: Fix error during modify qp RTS2RTS RDMA/hns: Delete unnecessary memset when allocating VF resource RDMA/hns: Remove redundant parameters in set_rc_wqe() RDMA/hns: Remove support for HIP08_A RDMA/hns: Refactor hns_roce_v2_set_hem() RDMA/hns: Remove redundant hardware opcode definitions RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP RDMA/include: Replace license text with SPDX tags RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting RDMA/cma: Execute rdma_cm destruction from a handler properly RDMA/cma: Remove unneeded locking for req paths RDMA/cma: Using the standard locking pattern when delivering the removal event RDMA/cma: Simplify DEVICE_REMOVAL for internal_id RDMA/efa: Add EFA 0xefa1 PCI ID RDMA/efa: User/kernel compatibility handshake mechanism ...
2020-07-29IB/rdmavt: Fix RQ counting issues causing use of an invalid RWQEMike Marciniszyn1-0/+19
The lookaside count is improperly initialized to the size of the Receive Queue with the additional +1. In the traces below, the RQ size is 384, so the count was set to 385. The lookaside count is then rarely refreshed. Note the high and incorrect count in the trace below: rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385 rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1 The head,tail indicate there is only one RWQE posted although the count says 385 and we correctly return the element 0. The next call to rvt_get_rwqe with the decremented count: rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9058 wr_id 0 qpn c qpt 2 pid 3018 num_sge 0 head 1 tail 1, count 384 rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1 Note that the RQ is empty (head == tail) yet we return the RWQE at tail 1, which is not valid because of the bogus high count. Best case, the RWQE has never been posted and the rc logic sees an RWQE that is too small (all zeros) and puts the QP into an error state. In the worst case, a server slow at posting receive buffers might fool rvt_get_rwqe() into fetching an old RWQE and corrupt memory. Fix by deleting the faulty initialization code and creating an inline to fetch the posted count and convert all callers to use new inline. Fixes: f592ae3c999f ("IB/rdmavt: Fracture single lock used for posting and processing RWQEs") Link: https://lore.kernel.org/r/20200728183848.22226.29132.stgit@awfm-01.aw.intel.com Reported-by: Zhaojuan Guo <zguo@redhat.com> Cc: <stable@vger.kernel.org> # 5.4.x Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/include: Replace license text with SPDX tagsLeon Romanovsky31-946/+59
The header files in RDMA subsystem are dual licensed and can be described by simple SPDX tag, so replace all of them at once together with making them use the same coding style for header guard defines. Link: https://lore.kernel.org/r/20200719072521.135260-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/core: Align abort/commit object scheme for write() and ioctl() pathsLeon Romanovsky2-0/+15
Create the same logic flow for the write() interface as we have for the ioctl() path by making sure that the object is committed or aborted automatically after HW object creation. Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA: Move XRCD to be under ib_core responsibilityLeon Romanovsky1-3/+3
Update the code to allocate and free ib_xrcd structure in the ib_core instead of inside drivers. Link: https://lore.kernel.org/r/20200630101855.368895-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Create and destroy counters in the ib_coreLeon Romanovsky1-3/+4
Move allocation and destruction of counters under ib_core responsibility Link: https://lore.kernel.org/r/20200630101855.368895-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06IB/uverbs: Expose UAPI to query ucontextYishai Hadas1-0/+4
Expose UAPI to query ucontext, this will let user space application that didn't allocate the ucontext but has access to by owning the matching command FD to retrieve the ucontext information. Link: https://lore.kernel.org/r/20200630093916.332097-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Optimize XRC target lookupMaor Gottlieb1-3/+2
Replace the mutex with read write semaphore and use xarray instead of linked list for XRC target QPs. This will give faster XRC target lookup. In addition, when QP is closed, don't insert it back to the xarray if the destroy command failed. Link: https://lore.kernel.org/r/20200706122716.647338-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA/core: Clean ib_alloc_xrcd() and reuse it to allocate XRC domainMaor Gottlieb1-15/+3
ib_alloc_xrcd() already does the required initialization, so move the uverbs to call it and save code duplication, while cleaning the function argument lists of that function. Link: https://lore.kernel.org/r/20200706122716.647338-3-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-06RDMA: Remove the udata parameter from alloc_mr callbackGal Pressman1-1/+1
Allocating an MR flow can only be initiated by kernel users, and not from userspace so a udata parameter is redundant. Link: https://lore.kernel.org/r/20200706120343.10816-4-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>