2022-10-28RDMA/qedr: clean up work queue on failure in qedr_alloc_resources()Dan Carpenter1-1/+8
Check whether create_singlethread_workqueue() fails, and destroy the work queue on the failure paths. Fixes: e411e0587e0d ("RDMA/qedr: Add iWARP connection management functions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Y1gBkDucQhhWj5YM@kili Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
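The pattern this commit describes (check the creation result, then undo it on every later failure path) can be sketched in userspace C. This is a minimal illustration, not the qedr code: `demo_dev`, `demo_alloc_resources()` and the `fail_step` switch are hypothetical names, and `malloc()`/`free()` stand in for creating and destroying the work queue.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; not the qedr structures. */
struct demo_dev {
    void *wq;   /* plays the role of the work queue */
    void *tbl;  /* a later allocation that may also fail */
};

/*
 * Returns 0 on success, -1 on failure.
 * fail_step forces a failure: 0 = none, 1 = work queue, 2 = table.
 */
int demo_alloc_resources(struct demo_dev *dev, int fail_step)
{
    dev->wq = NULL;
    dev->tbl = NULL;

    dev->wq = (fail_step == 1) ? NULL : malloc(16);
    if (!dev->wq)
        return -1;              /* creation itself can fail */

    dev->tbl = (fail_step == 2) ? NULL : malloc(64);
    if (!dev->tbl)
        goto err_destroy_wq;    /* later failure: undo the earlier step */

    return 0;

err_destroy_wq:
    free(dev->wq);
    dev->wq = NULL;
    return -1;
}

/* Releases everything a successful demo_alloc_resources() acquired. */
void demo_free_resources(struct demo_dev *dev)
{
    free(dev->tbl);
    free(dev->wq);
    dev->tbl = NULL;
    dev->wq = NULL;
}
```

After any failed call, the structure holds no live pointers, so the caller has nothing left to release.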
2022-10-28RDMA/core: Fix null-ptr-deref in ib_core_cleanup()Chen Zhongjin2-2/+10
KASAN reported a null-ptr-deref error:

KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f]
CPU: 1 PID: 379
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:destroy_workqueue+0x2f/0x740
RSP: 0018:ffff888016137df8 EFLAGS: 00000202
...
Call Trace:
 ib_core_cleanup+0xa/0xa1 [ib_core]
 __do_sys_delete_module.constprop.0+0x34f/0x5b0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fa1a0d221b7
...

This happens because the failure of roce_gid_mgmt_init() is ignored:

ib_core_init()
  roce_gid_mgmt_init()
    gid_cache_wq = alloc_ordered_workqueue # fail
...
ib_core_cleanup()
  roce_gid_mgmt_cleanup()
    destroy_workqueue(gid_cache_wq) # destroy an unallocated wq

Fix this by catching the failure of roce_gid_mgmt_init() in ib_core_init().

Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Link: https://lore.kernel.org/r/20221025024146.109137-1-chenzhongjin@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
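The call chain above boils down to an init function whose error return was dropped. A minimal userspace sketch of the corrected shape follows; all `demo_*` names are hypothetical, and the `simulate_failure` flag stands in for a failed workqueue allocation.

```c
#include <stddef.h>

static int demo_wq_storage;
int *demo_gid_cache_wq;          /* stays NULL until init succeeds */

/* Stand-in for the sub-init that allocates the work queue. */
int demo_gid_mgmt_init(int simulate_failure)
{
    if (simulate_failure)
        return -12;              /* value of -ENOMEM, written out to stay self-contained */
    demo_gid_cache_wq = &demo_wq_storage;
    return 0;
}

/*
 * Stand-in for the module init: the sub-init result is propagated, so a
 * failed load is reported and the cleanup hook never runs against an
 * unallocated work queue.
 */
int demo_core_init(int simulate_failure)
{
    int ret = demo_gid_mgmt_init(simulate_failure);

    if (ret)
        return ret;
    return 0;
}
```

Returning the error from init is what keeps the exit path from ever running: a module whose init fails is not loaded, so its cleanup function is not called.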
2022-10-25RDMA/rxe: Fix mr leak in RESPST_ERR_RNRLi Zhijian1-1/+3
rxe_recheck_mr() will increase mr's ref_cnt, so we should call rxe_put(mr) to drop mr's ref_cnt in RESPST_ERR_RNR to avoid below warning: WARNING: CPU: 0 PID: 4156 at drivers/infiniband/sw/rxe/rxe_pool.c:259 __rxe_cleanup+0x1df/0x240 [rdma_rxe] ... Call Trace: rxe_dereg_mr+0x4c/0x60 [rdma_rxe] ib_dereg_mr_user+0xa8/0x200 [ib_core] ib_mr_pool_destroy+0x77/0xb0 [ib_core] nvme_rdma_destroy_queue_ib+0x89/0x240 [nvme_rdma] nvme_rdma_free_queue+0x40/0x50 [nvme_rdma] nvme_rdma_teardown_io_queues.part.0+0xc3/0x120 [nvme_rdma] nvme_rdma_error_recovery_work+0x4d/0xf0 [nvme_rdma] process_one_work+0x582/0xa40 ? pwq_dec_nr_in_flight+0x100/0x100 ? rwlock_bug.part.0+0x60/0x60 worker_thread+0x2a9/0x700 ? process_one_work+0xa40/0xa40 kthread+0x168/0x1a0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Link: https://lore.kernel.org/r/20221024052049.20577-1-lizhijian@fujitsu.com Fixes: 8a1a0be894da ("RDMA/rxe: Replace mr by rkey in responder resources") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
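The leak described here is the usual unbalanced get/put: a lookup takes a reference, and one early-exit path forgets to drop it. The sketch below shows the balanced form under stated assumptions; `demo_mr`, `demo_recheck_mr()`, `demo_mr_put()` and `demo_respond()` are hypothetical names and only mimic the counting, not the rxe pool code.

```c
/* Hypothetical object with a plain (non-atomic) reference count. */
struct demo_mr {
    int ref_cnt;
};

enum demo_state { DEMO_OK, DEMO_ERR_RNR };

/* Lookup that takes a reference, as the commit says rxe_recheck_mr() does. */
struct demo_mr *demo_recheck_mr(struct demo_mr *mr)
{
    mr->ref_cnt++;
    return mr;
}

/* Drops one reference. */
void demo_mr_put(struct demo_mr *mr)
{
    mr->ref_cnt--;
}

/* Every exit path drops the reference that the lookup took. */
enum demo_state demo_respond(struct demo_mr *mr, int receiver_not_ready)
{
    struct demo_mr *held = demo_recheck_mr(mr);

    if (receiver_not_ready) {
        demo_mr_put(held);   /* the put that was missing on the RNR path */
        return DEMO_ERR_RNR;
    }

    /* ... use the MR ... */
    demo_mr_put(held);
    return DEMO_OK;
}
```

With the put in place, the count after either path equals the count before the call, which is the condition the cleanup warning in the trace checks for.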
2022-10-24RDMA/hns: Fix NULL pointer problem in free_mr_init()Yixing Liu1-0/+4
The lock can be taken in a concurrent scenario before it has been initialized, which results in a NULL pointer dereference. The mutex must be initialized with mutex_init() before the lock is used.

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Call trace:
 __mutex_lock.constprop.0+0xd0/0x5c0
 __mutex_lock_slowpath+0x1c/0x2c
 mutex_lock+0x44/0x50
 free_mr_send_cmd_to_hw+0x7c/0x1c0 [hns_roce_hw_v2]
 hns_roce_v2_dereg_mr+0x30/0x40 [hns_roce_hw_v2]
 hns_roce_dereg_mr+0x4c/0x130 [hns_roce_hw_v2]
 ib_dereg_mr_user+0x54/0x124
 uverbs_free_mr+0x24/0x30
 destroy_hw_idr_uobject+0x38/0x74
 uverbs_destroy_uobject+0x48/0x1c4
 uobj_destroy+0x74/0xcc
 ib_uverbs_cmd_verbs+0x368/0xbb0
 ib_uverbs_ioctl+0xec/0x1a4
 __arm64_sys_ioctl+0xb4/0x100
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0x58/0x190
 do_el0_svc+0x30/0x90
 el0_svc+0x2c/0xb4
 el0t_64_sync_handler+0x1a4/0x1b0
 el0t_64_sync+0x19c/0x1a0

Fixes: 70f92521584f ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT") Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Link: https://lore.kernel.org/r/20221024083814.1089722-3-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/hns: Disable local invalidate operationYangyang Li2-13/+0
When function reset and local invalidate are mixed, HNS RoCEE may hang. Before introducing the cause of the problem, two hardware internal concepts need to be introduced:

1. Execution queue: The queue of hardware execution instructions, function reset and local invalidate are queued for execution in this queue.
2. Local queue: A queue that stores local operation instructions. The instructions in the local queue will be sent to the execution queue for execution. The instructions in the local queue will not be removed until the execution is completed.

The reason for the problem is as follows:

1. There is a function reset instruction in the execution queue, which is currently being executed. A necessary condition for the successful execution of function reset is: the hardware pipeline needs to empty the instructions that were not completed before;
2. A local invalidate instruction at the head of the local queue is sent to the execution queue. Now there are two instructions in the execution queue, the first is the function reset instruction, and the second is the local invalidate instruction, which will be executed in sequence;
3. The user has issued many local invalidate operations, causing the local queue to be filled up.
4. The user still has a new local operation command and is queuing to enter the local queue. But the local queue is full and cannot receive new instructions, this instruction is temporarily stored at the hardware pipeline.
5. The function reset has been waiting for the instruction before the hardware pipeline stage is drained. The hardware pipeline stage also caches a local invalidate instruction, so the function reset cannot be completed, and the instructions after it cannot be executed.

These factors together cause the execution logic deadlock of the hardware, and the consequence is that RoCEE will not have any response. Considering that the local operation command may potentially cause RoCEE to hang, this feature is no longer supported.
Fixes: e93df0108579 ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Link: https://lore.kernel.org/r/20221024083814.1089722-2-xuhaoyue1@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24RDMA/efa: Add EFA 0xefa2 PCI IDMichael Margolin1-1/+3
Add support for 0xefa2 devices. Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Link: https://lore.kernel.org/r/20221020151949.1768-1-mrgolin@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-19IB/hfi1: Correctly move list in sc_disable()Dean Luick1-2/+1
Commit 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") incorrectly tries to move a list from one list head to another. The result is a kernel crash. The crash is triggered when a link goes down and there are waiters for a send to complete. The following signature is seen: BUG: kernel NULL pointer dereference, address: 0000000000000030 [...] Call Trace: sc_disable+0x1ba/0x240 [hfi1] pio_freeze+0x3d/0x60 [hfi1] handle_freeze+0x27/0x1b0 [hfi1] process_one_work+0x1b0/0x380 ? process_one_work+0x380/0x380 worker_thread+0x30/0x360 ? process_one_work+0x380/0x380 kthread+0xd7/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 The fix is to use the correct call to move the list. Fixes: 13bac861952a ("IB/hfi1: Fix abba locking issue with sc_disable()") Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Link: https://lore.kernel.org/r/166610327042.674422.6146908799669288976.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
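The commit message does not say which list call it switches to, so the following is only an illustration of what moving a whole list from one head to another involves. It is a self-contained userspace sketch of a circular doubly linked list with a splice-and-reinitialize operation, modeled on the semantics of the kernel's list_splice_init(); every `demo_*` name is hypothetical.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct demo_list {
    struct demo_list *next, *prev;
};

void demo_list_init(struct demo_list *h)
{
    h->next = h;
    h->prev = h;
}

int demo_list_empty(const struct demo_list *h)
{
    return h->next == h;
}

void demo_list_add_tail(struct demo_list *n, struct demo_list *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Move every entry of src to the front of dst and leave src empty. */
void demo_list_splice_init(struct demo_list *src, struct demo_list *dst)
{
    struct demo_list *first, *last, *at;

    if (demo_list_empty(src))
        return;

    first = src->next;
    last = src->prev;
    at = dst->next;

    first->prev = dst;
    dst->next = first;
    last->next = at;
    at->prev = last;

    demo_list_init(src);    /* src no longer points at the moved entries */
}

/* Number of entries hanging off head h. */
size_t demo_list_count(const struct demo_list *h)
{
    size_t n = 0;
    const struct demo_list *p;

    for (p = h->next; p != h; p = p->next)
        n++;
    return n;
}
```

The point of the sketch is that the entries, not the head, are relinked: the first and last entries must be repointed at the destination head, and the source head must be reset, otherwise later walks follow stale pointers.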
2022-10-19RDMA/cma: Use output interface for net_dev checkHåkon Bugge1-1/+1
Commit 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation") swapped the src and dst addresses in the call to validate_net_dev(). As a consequence, the test in validate_ipv4_net_dev() to see if the net_dev is the right one, is incorrect for port 1 <-> 2 communication when the ports are on the same sub-net. This is fixed by denoting the flowi4_oif as the device instead of the incoming one. The bug has not been observed using IPv6 addresses. Fixes: 27cfde795a96 ("RDMA/cma: Fix arguments order in net device validation") Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Link: https://lore.kernel.org/r/20221012141542.16925-1-haakon.bugge@oracle.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld4-5/+5
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-11treewide: use prandom_u32_max() when possible, part 1Jason A. Donenfeld4-8/+6
Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. 
@plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
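Two pieces of arithmetic carry the script above: the multiply-and-shift form `((u64)(E) * get_random_u32() >> 32)` that it treats as equivalent to prandom_u32_max(E), and the check that a literal mask is a power of two minus one before one is added to it. Both can be sketched in plain C. The function names are hypothetical, and the random word is passed in as an argument so the behaviour is deterministic.

```c
#include <stdint.h>

/*
 * Map a uniformly random 32-bit word into [0, ceil) without a division:
 * the 64-bit product ceil * rnd is below ceil * 2^32, so its upper 32 bits
 * are always below ceil.
 */
uint32_t demo_bounded(uint32_t rnd, uint32_t ceil)
{
    return (uint32_t)(((uint64_t)ceil * rnd) >> 32);
}

/*
 * Returns 1 when mask has the form 2^n - 1, i.e. when "x & mask" selects
 * a range whose size, mask + 1, is a power of two.
 */
int demo_is_pow2_minus_one(uint32_t mask)
{
    return (mask & (mask + 1)) == 0;
}
```

For example, with `ceil = 10` the smallest input word maps to 0, the largest maps to 9, and `0x80000000` (half the range) maps to 5. Note the parentheses in the mask test: in C, `==` binds tighter than `&`, unlike in the Python snippet embedded in the script.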
Pull rdma updates from Jason Gunthorpe:

"Not a big list of changes this cycle, mostly small things. The new MANA rdma driver should come next cycle along with a bunch of work on rxe. Summary:

- Small bug fixes in mlx5, efa, rxe, hns, irdma, erdma, siw
- rts tracing improvements
- Code improvements: strlscpy conversion, unused parameter, spelling mistakes, unused variables, flex arrays
- restrack device details report for hns
- Simplify struct device initialization in SRP
- Eliminate the never-used service_mask support in IB CM
- Make rxe not print to the console for some kinds of network packets
- Asymmetric paths and router support in the CM through netlink messages
- DMABUF importer support for mlx5devx umem's"
Merge tag 'v6.0' into rdma.git for-next
Trivial merge conflicts against rdma.git for-rc resolved matching linux-next: drivers/infiniband/hw/hns/hns_roce_hw_v2.c drivers/infiniband/hw/hns/hns_roce_main.c https://lore.kernel.org/r/20220929124005.105149-1-broonie@kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-29RDMA/rxe: Remove error/warning messages from packet receiver pathDaisuke Matsuda3-91/+28
Incoming packets to rxe are passed from UDP layer using an encapsulation socket. If there are any clients reachable to a node, they can invoke the encapsulation handler arbitrarily by sending malicious or irrelevant packets. This can potentially cause a message overflow and a subsequent slowdown on the node. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20220929080023.304242-1-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-09-29RDMA/usnic: fix set-but-not-unused variable 'flags' warningZeng Heng1-3/+0
Remove unused local variable 'flags' without any logic changes. Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Link: https://lore.kernel.org/r/20220929031200.4060891-1-zengheng4@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Saeed Mahameed says:

==================
updates from mlx5-next 2022-09-24

Updates from mlx5-next including [1]:

1) HW definitions and support for NPPS clock settings.
2) various cleanups
3) Enable hash mode by default for all NICs
4) page tracker and advanced virtualization HW definitions for vfio

[1] https://lore.kernel.org/netdev/20220907233636.388475-1-saeed@kernel.org/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Remove from FPGA IFC file not-needed definitions
  net/mlx5: Remove unused structs
  net/mlx5: Remove unused functions
  net/mlx5: detect and enable bypass port select flow table
  net/mlx5: Lag, enable hash mode by default for all NICs
  net/mlx5: Lag, set active ports if support bypass port select flow table
  RDMA/mlx5: Don't set tx affinity when lag is in hash mode
  net/mlx5: add IFC bits for bypassing port select flow table
  net/mlx5: Add support for NPPS with real time mode
  net/mlx5: Expose NPPS related registers
  net/mlx5: Query ADV_VIRTUALIZATION capabilities
  net/mlx5: Introduce ifc bits for page tracker
  RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib
==================
2022-09-27RDMA/mlx5: Don't set tx affinity when lag is in hash modeLiu, Changcheng1-0/+12
In hash mode, without setting tx affinity explicitly, the port select flow table decides which port is used for the traffic. If port_select_flow_table_bypass capability is supported and tx affinity is set explicitly for QP/TIS, they will be added into the explicit affinity table in FW to check which port is used for the traffic. 1. The overloaded explicit affinity table may affect performance. To avoid this, do not set tx affinity explicitly by default. 2. The packets of the same flow need to be transmitted on the same port. Because the packets of the same flow use different QPs in slow & fast path, it shouldn't set tx affinity explicitly for these QPs. Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-09-27IB/hfi1: Use skb_put_data() instead of skb_put/memcpy pairShang XiaoJing1-4/+1
Use skb_put_data() instead of skb_put() and memcpy(), which is shorter and clearer. Drop the tmp variable that is not needed any more. Link: https://lore.kernel.org/r/20220927022919.16902-1-shangxiaojing@huawei.com Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
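The simplification is easiest to see on a userspace analogue of a tail-append buffer. In the sketch below, `demo_buf`, `demo_buf_put()` and `demo_buf_put_data()` are hypothetical names; they mimic the reserve-then-copy pair and its one-call replacement, but unlike the kernel helpers they return NULL when the buffer is full.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer that grows at the tail. */
struct demo_buf {
    unsigned char data[64];
    size_t len;
};

/* Analogue of skb_put(): reserve len bytes at the tail and return them. */
void *demo_buf_put(struct demo_buf *b, size_t len)
{
    void *tail;

    if (len > sizeof(b->data) - b->len)
        return NULL;            /* sketch only: report a full buffer to the caller */
    tail = b->data + b->len;
    b->len += len;
    return tail;
}

/* Analogue of skb_put_data(): reserve and copy in a single call. */
void *demo_buf_put_data(struct demo_buf *b, const void *src, size_t len)
{
    void *tail = demo_buf_put(b, len);

    if (tail)
        memcpy(tail, src, len);
    return tail;
}
```

A caller that previously kept a temporary pointer for the memcpy() destination no longer needs it, which is why the commit can drop the tmp variable.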
2022-09-27RDMA/hns: Unified Log Printing StyleGuofeng Yue6-55/+55
The first letter of the log information is changed to lowercase to keep the same style. Link: https://lore.kernel.org/r/20220922123315.3732205-13-xuhaoyue1@hisilicon.com Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Replacing magic number with macros in apply_func_caps()Yixing Liu1-2/+4
Replacing magic number with macros in function apply_func_caps(). Link: https://lore.kernel.org/r/20220922123315.3732205-12-xuhaoyue1@hisilicon.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Repacing 'dseg_len' by macros in fill_ext_sge_inl_data()Luoyouming1-4/+3
The sge size is known to be constant, so it's unnecessary to use sizeof to calculate. Link: https://lore.kernel.org/r/20220922123315.3732205-11-xuhaoyue1@hisilicon.com Signed-off-by: Luoyouming <luoyouming@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove redundant 'max_srq_desc_sz' in capsYangyang Li3-5/+2
The max_srq_desc_sz is defined in the code, but never used, so delete this redundant variable. Link: https://lore.kernel.org/r/20220922123315.3732205-10-xuhaoyue1@hisilicon.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove redundant 'num_mtt_segs' and 'max_extend_sg'Yangyang Li3-8/+3
The num_mtt_segs and max_extend_sg used to be used for HIP06, remove them since the HIP06 code has been removed. Link: https://lore.kernel.org/r/20220922123315.3732205-9-xuhaoyue1@hisilicon.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove redundant 'phy_addr' in hns_roce_hem_list_find_mtt()Chengchang Tang3-9/+4
This parameter has never been used. Remove it to simplify the function. Link: https://lore.kernel.org/r/20220922123315.3732205-8-xuhaoyue1@hisilicon.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove redundant 'use_lowmem' argument from hns_roce_init_hem_table()Yunsheng Lin4-22/+14
As hns_roce_init_hem_table() is always called with use_lowmem being '1', and table->lowmem is set according to that argument, remove table->lowmem too. table->lowmem was used to indicate whether a DMA buffer is allocated with GFP_HIGHUSER or GFP_KERNEL, and calling dma_alloc_coherent() with GFP_KERNEL seems like a common pattern. Link: https://lore.kernel.org/r/20220922123315.3732205-7-xuhaoyue1@hisilicon.com Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove redundant 'bt_level' for hem_list_alloc_item()Yunsheng Lin1-4/+4
The 'bt_level' parameter is not used in hem_list_alloc_item(), so remove it. Link: https://lore.kernel.org/r/20220922123315.3732205-6-xuhaoyue1@hisilicon.com Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove redundant 'attr_mask' in modify_qp_init_to_init()Yixing Liu1-6/+3
The attr_mask variable is not used in the function, so remove it. Link: https://lore.kernel.org/r/20220922123315.3732205-5-xuhaoyue1@hisilicon.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove unnecessary brackets when getting pointGuofeng Yue2-3/+4
Delete () when using & to obtain an address. Link: https://lore.kernel.org/r/20220922123315.3732205-4-xuhaoyue1@hisilicon.com Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Remove unnecessary braces for single statement blocksGuofeng Yue1-2/+1
Braces {} are not necessary for single statement blocks. Link: https://lore.kernel.org/r/20220922123315.3732205-3-xuhaoyue1@hisilicon.com Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/hns: Cleanup for a spelling error of AsynchronousGuofeng Yue1-1/+1
Fixed a spelling error for Asynchronous. Link: https://lore.kernel.org/r/20220922123315.3732205-2-xuhaoyue1@hisilicon.com Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27IB/rdmavt: Add __init/__exit annotations to module init/exit funcsXiu Jianfeng1-2/+2
Add missing __init/__exit annotations to module init/exit funcs. Fixes: 0194621b2253 ("IB/rdmavt: Create module framework and handle driver registration") Link: https://lore.kernel.org/r/20220924091457.52446-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/rxe: Remove redundant num_sge fieldsBob Pearson1-2/+0
In include/uapi/rdma/rdma_user_rxe.h there are redundant copies of num_sge in the rxe_send_wr, rxe_recv_wqe, and rxe_dma_info. Only the ones in rxe_dma_info are actually used by the rxe kernel driver. The userspace would set these values, but the kernel never read them. This change has no effect on the current ABI and new or old versions of rdma-core operate correctly with new or old versions of the kernel rxe driver. Link: https://lore.kernel.org/r/20220913222716.18335-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/mlx5: Enable ATS support for MRs and umemsJason Gunthorpe3-17/+61
For mlx5 if ATS is enabled in the PCI config then the device will use ATS requests for only certain DMA operations. This has to be opted in by the SW side based on the mkey or umem settings. ATS slows down the PCI performance, so it should only be set in cases when it is needed. All of these cases revolve around optimizing PCI P2P transfers and avoiding bad cases where the bus just doesn't work. Link: https://lore.kernel.org/r/4-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/mlx5: Add support for dmabuf to devx umemJason Gunthorpe1-3/+21
This is modeled after the similar EFA enablement in commit 66f4817b5712 ("RDMA/efa: Add support for dmabuf memory regions"). Like EFA there is no support for revocation so we simply call the ib_umem_dmabuf_get_pinned() to obtain a umem instead of the normal ib_umem_get(). Everything else stays the same. Link: https://lore.kernel.org/r/3-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/core: Add UVERBS_ATTR_RAW_FDJason Gunthorpe1-0/+8
This uses the same passing protocol as UVERBS_ATTR_FD (e.g. len = 0 data_s64 = fd), except that the FD is not required to be a uverbs object and the core code does not convert the FD to an object handle automatically. Access to the int fd is provided by uverbs_get_raw_fd(). Link: https://lore.kernel.org/r/2-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27RDMA/rxe: Fix resize_finish() in rxe_queue.cBob Pearson1-5/+7
Currently in resize_finish() in rxe_queue.c there is a loop which copies the entries in the original queue into a newly allocated queue. The termination logic for this loop is incorrect. The call to queue_next_index() updates cons but has no effect on whether the queue is empty. So if the queue starts out empty nothing is copied but if it is not then the loop will run forever. This patch changes the loop to compare the value of cons to the original producer index. Fixes: ae6e843fe08d0 ("RDMA/rxe: Add memory barriers to kernel queues") Link: https://lore.kernel.org/r/20220825221446.6512-1-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
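The corrected termination condition can be sketched on a small power-of-two ring. This is a userspace illustration only: `demo_queue`, `demo_next_index()` and `demo_resize_copy()` are hypothetical names, the element type is a plain integer, and the sketch assumes the new queue is large enough to hold every pending entry.

```c
#include <stdint.h>

/* Hypothetical ring with a power-of-two size; index_mask is size - 1. */
struct demo_queue {
    uint32_t buf[8];
    uint32_t index_mask;
    uint32_t prod;   /* next slot to write */
    uint32_t cons;   /* next slot to read */
};

uint32_t demo_next_index(const struct demo_queue *q, uint32_t i)
{
    return (i + 1) & q->index_mask;
}

/*
 * Copy all pending entries from q into new_q and return how many were
 * copied. The loop walks a local copy of cons and stops when it reaches
 * the producer index sampled before the loop, so it ends after exactly
 * the pending entries, whether the queue was empty or not.
 */
uint32_t demo_resize_copy(const struct demo_queue *q, struct demo_queue *new_q)
{
    uint32_t cons = q->cons;
    uint32_t prod = q->prod;
    uint32_t new_prod = new_q->prod;
    uint32_t copied = 0;

    while (cons != prod) {
        new_q->buf[new_prod] = q->buf[cons];
        new_prod = demo_next_index(new_q, new_prod);
        cons = demo_next_index(q, cons);
        copied++;
    }

    new_q->prod = new_prod;
    return copied;
}
```

An emptiness test that reads the queue's stored indices never changes while only the local `cons` advances, which is the non-terminating case the commit message describes.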
2022-09-27RDMA/rxe: Set pd early in mr alloc routinesBob Pearson3-15/+14
Move setting of pd in mr objects ahead of any possible errors so that it will always be set in rxe_mr_cleanup() to avoid seg faults when rxe_put(mr_pd(mr)) is called. Fixes: cf40367961d8 ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()") Link: https://lore.kernel.org/r/20220805183153.32007-2-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-26RDMA/rxe: Add send_common_ack() helperLi Zhijian1-26/+17
Most of the code in send_ack() and send_atomic_ack() is duplicated; move it to a new helper, send_common_ack(). In the newer IBA spec, some opcodes require an acknowledgement with a zero-length read response; with this new helper, that can easily be implemented later. Link: https://lore.kernel.org/r/1659335010-2-1-git-send-email-lizhijian@fujitsu.com Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-22RDMA/core: Clean up a variable name in ib_create_srq_user()Dan Carpenter1-1/+1
"&srq->pd->usecnt" and "&pd->usecnt" are different names for the same reference count. Use "&pd->usecnt" consistently for both the increment and decrement. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YyxFe3Pm0uzRuBkQ@kili Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22RDMA/srp: Support more than 255 rdma portsMikhael Goikhman2-7/+7
Currently the ib_srp module does not support devices with more than 255 ports. Switch from u8 to u32 to fix the problem. Fixes: 1fb7f8973f51 ("RDMA: Support more than 255 rdma ports") Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Link: https://lore.kernel.org/r/7d80d8844f1abb3a54170b7259f0a02be38080a6.1663747327.git.leonro@nvidia.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
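The limit comes from plain integer truncation: an 8-bit field holds 0 through 255, so a larger port number silently wraps. A short sketch with hypothetical helper names:

```c
#include <stdint.h>

/* What storing a port number in a u8 field does to it. */
uint8_t demo_port_as_u8(uint32_t port)
{
    return (uint8_t)port;    /* keeps only the low 8 bits */
}

/* Returns 1 when the port number survives the narrowing unchanged. */
int demo_port_fits_u8(uint32_t port)
{
    return port <= UINT8_MAX;
}
```

Port 255 is preserved, but port 256 becomes 0 and port 300 becomes 44, so two different ports can end up indistinguishable once narrowed.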
2022-09-22RDMA/rxe: Use members of generic struct in rxe_mrDaisuke Matsuda4-14/+8
rxe_mr and ib_mr have interchangeable members. Remove device specific members and use ones in the generic struct. Both 'iova' and 'length' are filled in ib_uverbs or ib_core layer after MR registration. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22IB: Set IOVA/LENGTH on IB_MR in core/uverbs layersDaisuke Matsuda4-3/+6
Set 'iova' and 'length' on ib_mr in ib_uverbs and ib_core layers to let all drivers have the members filled. Also, this commit removes redundancy in the respective drivers. Previously, commit 04c0a5fcfcf65 ("IB/uverbs: Set IOVA on IB MR in uverbs layer") changed to set 'iova', but seems to have missed 'length' and the ib_core layer at that time. Fixes: 04c0a5fcfcf65 ("IB/uverbs: Set IOVA on IB MR in uverbs layer") Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://lore.kernel.org/r/20220921080844.1616883-1-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22RDMA/cm: Use DLID from inbound/outbound PathRecords as the datapath DLIDMark Zhang2-2/+25
In inter-subnet cases, when inbound/outbound PRs are available, outbound_PR.dlid is used as the requestor's datapath DLID and inbound_PR.dlid is used as the responder's DLID. The inbound_PR.dlid is passed to responder side with the "ConnectReq.Primary_Local_Port_LID" field. With this solution the PERMISSIVE_LID is no longer used in Primary Local LID field. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/b3f6cac685bce9dde37c610be82e2c19d9e51d9e.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22RDMA/cm: Use SLID in the work completion as the DLID in responder sideMark Zhang1-7/+7
The responder should always use WC's SLID as the dlid, to follow the IB SPEC section " COMMON RESPONSE ACTIONS": A responder always takes the following actions in constructing a response packet: - The SLID of the received packet is used as the DLID in the response packet. Fixes: ac3a949fb2ff ("IB/CM: Set appropriate slid and dlid when handling CM request") Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/cd17c240231e059d2fc07c17dfe555d548b917eb.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22RDMA/cma: Multiple path records support with netlink channelMark Zhang4-86/+223
Support receiving inbound and outbound IB path records (along with GMP PathRecord) from user-space service through the RDMA netlink channel. The LIDs in these 3 PRs can be used in this way: 1. GMP PR: used as the standard local/remote LIDs; 2. DLID of outbound PR: Used as the "dlid" field for outbound traffic; 3. DLID of inbound PR: Used as the "dlid" field for outbound traffic in responder side. This is aimed to support adaptive routing. With current IB routing solution when a packet goes out it's assigned with a fixed DLID per target, meaning a fixed router will be used. The LIDs in inbound/outbound path records can be used to identify group of routers that allow communication with another subnet's entity. With them packets from an inter-subnet connection may travel through any router in the set to reach the target. As confirmed with Jason, when sending a netlink request, kernel uses LS_RESOLVE_PATH_USE_ALL so that the service knows kernel supports multiple PRs. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/2fa2b6c93c4c16c8915bac3cfc4f27be1d60519d.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22RDMA/core: Rename rdma_route.num_paths field to num_pri_alt_pathsMark Zhang2-14/+14
This field means the total number of primary and alternate paths, i.e.: 0 - Neither a primary nor an alternate path is available; 1 - Only the primary path is available; 2 - Both primary and alternate paths are available. Rename it to avoid confusion, as with the following patches the primary path will support multiple path records. Signed-off-by: Mark Zhang <markzhang@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/cbe424de63a56207870d70c5edce7c68e45f429e.1662631201.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-21RDMA/erdma: Support dynamic mtuCheng Xu5-1/+26
Hardware now supports jumbo frames for RDMA, so introduce a new CMDQ message to support MTU change notification. Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Link: https://lore.kernel.org/r/20220909093822.33868-5-chengyou@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20RDMA/siw: Fix QP destroy to wait for all references dropped.Bernard Metzler3-1/+5
Delay QP destroy completion until all siw references to the QP are dropped. The calling RDMA core will free the QP structure after a successful return from the siw_qp_destroy() call, so siw must not hold any remaining reference to the QP upon return. A use-after-free was encountered in xfstest generic/460, while testing NFSoRDMA. Here, after a TCP connection drop by the peer, the triggered siw_cm_work_handler got delayed until after the QP destroy call, referencing a QP which had already been freed. Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Reported-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20220920082503.224189-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-20RDMA/siw: Always consume all skbuf data in sk_data_ready() upcall.Bernard Metzler1-12/+15
For header and trailer/padding processing, siw did not consume new skb data until the minimum amount was present to fill the current header or trailer structure, including potential payload padding. Not consuming any data during the upcall may cause a receive stall, since tcp_read_sock() does not upcall again if no new data arrives. An NFSoRDMA client got stuck at RDMA Write reception of an unaligned payload when the current skb contained only the expected 3 padding bytes, but not the 4-byte CRC trailer. Expecting 4 more bytes already arrived in another skb, and not consuming those 3 bytes in the current upcall left the Write incomplete, waiting for the CRC forever. Fixes: 8b6a361b8c48 ("rdma/siw: receive path") Reported-by: Olga Kornievskaia <kolga@netapp.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20220920081202.223629-1-bmt@zurich.ibm.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
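The behaviour the fix asks for, taking whatever bytes are available on every upcall and finishing the structure once enough have accumulated, can be sketched as a small accumulator. Everything here is a hypothetical userspace illustration: `demo_rx`, `demo_rx_trailer()` and the 7-byte length (the 3 padding bytes plus the 4-byte CRC from the scenario above) are assumptions, not the siw receive code.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_TRAILER_LEN 7   /* assumed: 3 padding bytes + 4 CRC bytes */

/* Receive state that survives between upcalls. */
struct demo_rx {
    unsigned char trailer[DEMO_TRAILER_LEN];
    size_t have;             /* bytes of the trailer collected so far */
};

/*
 * Consume as many of the avail bytes as the trailer still needs, even when
 * that is fewer than the whole trailer. Returns the number of bytes taken;
 * the trailer is complete once rx->have == DEMO_TRAILER_LEN.
 */
size_t demo_rx_trailer(struct demo_rx *rx, const unsigned char *data, size_t avail)
{
    size_t want = DEMO_TRAILER_LEN - rx->have;
    size_t take = avail < want ? avail : want;

    memcpy(rx->trailer + rx->have, data, take);
    rx->have += take;
    return take;
}
```

In the reported case, a first call with 3 bytes consumes all 3 and a second call with 4 bytes completes the trailer. Refusing the first 3 bytes is what left the receiver waiting for an upcall that never came.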
2022-09-20IB/hfi1: remove rc_only_opcode and uc_only_opcode declarationsGaosheng Cui1-3/+0
rc_only_opcode and uc_only_opcode have been removed since commit b374e060cc2a ("IB/hfi1: Consolidate pio control masks into single definition"), so remove them. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Link: https://lore.kernel.org/r/20220911092325.3216513-1-cuigaosheng1@huawei.com Signed-off-by: Leon Romanovsky <leon@kernel.org>