aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/rdma_core.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-06-24RDMA: Correct duplicated words in commentsJiang Jian1-1/+1
There is a duplicated word 'is' and 'for' in a comment that needs to be dropped. Link: https://lore.kernel.org/r/20220622170853.3644-1-jiangjian@cdjrlc.com Link: https://lore.kernel.org/r/20220623103708.43104-1-jiangjian@cdjrlc.com Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26RDMA/core: Correct misspellings of two words in commentsYangyang Li1-2/+2
Correct the following spelling errors: 1. shold -> should 2. uncontext -> ucontext Link: https://lore.kernel.org/r/1616147749-49106-1-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07RDMA/uverbs: Allow drivers to create a new HW object during rereg_mrJason Gunthorpe1-0/+51
mlx5 has an ugly flow where it tries to allocate a new MR and replace the existing MR in the same memory during rereg. This is very complicated and buggy. Instead of trying to replace in-place inside the driver, provide support from uverbs to change the entire HW object assigned to a handle during rereg_mr. Since destroying a MR is allowed to fail (ie if a MW is pointing at it) and can't be detected in advance, the algorithm creates a completely new uobject to hold the new MR and swaps the IDR entries of the two objects. The old MR in the temporary IDR entry is destroyed, and if it fails rereg_mr succeeds and destruction is deferred to FD release. This complexity is why this cannot live in a driver safely. Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12RDMA/core: Make FD destroy callback voidLeon Romanovsky1-1/+2
All FD object destroy implementations return 0, so declare this callback void. Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12RDMA/core: Postpone uobject cleanup on failure till FD closeLeon Romanovsky1-34/+17
Remove the ib_is_destroyable_retryable() concept. The idea here was to allow the drivers to forcibly clean the HW object even if they otherwise didn't want to (eg because of usecnt). This was an attempt to clean up in a world where drivers were not allowed to fail HW object destruction. Now that we are going back to allowing HW objects to fail destroy this doesn't make sense. Instead if a uobject's HW object can't be destroyed it is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to clean it. Multiple passes over the uobject list allow hidden dependencies to be resolved. If that fails the HW driver is broken, throw a WARN_ON and leak the HW object memory. All the other tricky failure paths (eg on creation error unwind) have already been updated to this new model. Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30RDMA/core: Remove ucontext->closingJason Gunthorpe1-1/+0
Nothing reads this any more, and the reason for its existence has passed due to the deferred fput() scheme. Fixes: 8ea1f989aa07 ("drivers/IB,usnic: reduce scope of mmap_sem") Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-09RDMA/core: Change how failing destroy is handled during uobj abortJason Gunthorpe1-15/+15
Currently it triggers a WARN_ON and then goes ahead and destroys the uobject anyhow, leaking any driver memory. The only place that leaks driver memory should be during FD close() in uverbs_destroy_ufile_hw(). Drivers are only allowed to fail destroy uobjects if they guarantee destroy will eventually succeed. uverbs_destroy_ufile_hw() provides the loop to give the driver that chance. Link: https://lore.kernel.org/r/20200902081708.746631-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31RDMA/core: Trigger a WARN_ON if the driver causes uobjects to become leakedJason Gunthorpe1-1/+2
Drivers that fail destroy can cause uverbs to leak uobjects. Drivers are required to always eventually destroy their ubojects, so trigger a WARN_ON to detect this driver bug. Link: https://lore.kernel.org/r/0-v1-b1e0ed400ba9+f7-warn_destroy_ufile_hw_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16RDMA/core: Fix race in rdma_alloc_commit_uobject()Leon Romanovsky1-3/+3
The FD should not be installed until all of the setup is completed as the fd_install() transfers ownership of the kref to the FD table. A thread can race a close() and trigger concurrent rdma_alloc_commit_uobject() and uverbs_uobject_fd_release() which, at least, triggers a safety WARN_ON: WARNING: CPU: 4 PID: 6913 at drivers/infiniband/core/rdma_core.c:768 uverbs_uobject_fd_release+0x202/0x230 Kernel panic - not syncing: panic_on_warn set ... CPU: 4 PID: 6913 Comm: syz-executor.3 Not tainted 5.7.0-rc2 #22 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [..] RIP: 0010:uverbs_uobject_fd_release+0x202/0x230 Code: fe 4c 89 e7 e8 af 23 fe ff e9 2a ff ff ff e8 c5 fa 61 fe be 03 00 00 00 4c 89 e7 e8 68 eb f5 fe e9 13 ff ff ff e8 ae fa 61 fe <0f> 0b eb ac e8 e5 aa 3c fe e8 50 2b 86 fe e9 6a fe ff ff e8 46 2b RSP: 0018:ffffc90008117d88 EFLAGS: 00010293 RAX: ffff88810e146580 RBX: 1ffff92001022fb1 RCX: ffffffff82d5b902 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811951b040 RBP: ffff88811951b000 R08: ffffed10232a3609 R09: ffffed10232a3609 R10: ffff88811951b043 R11: 0000000000000001 R12: ffff888100a7c600 R13: ffff888100a7c650 R14: ffffc90008117da8 R15: ffffffff82d5b700 ? __uverbs_cleanup_ufile+0x270/0x270 ? uverbs_uobject_fd_release+0x202/0x230 ? uverbs_uobject_fd_release+0x202/0x230 ? __uverbs_cleanup_ufile+0x270/0x270 ? locks_remove_file+0x282/0x3d0 ? security_file_free+0xaa/0xd0 __fput+0x2be/0x770 task_work_run+0x10e/0x1b0 exit_to_usermode_loop+0x145/0x170 do_syscall_64+0x2d0/0x390 ? prepare_exit_to_usermode+0x17a/0x230 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x414da7 Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 f4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 36 fc ff ff 8b 44 24 RSP: 002b:00007fff39d379d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000414da7 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 00007fff39d37a3c R08: 0000000400000000 R09: 0000000400000000 R10: 00007fff39d37910 R11: 0000000000000293 R12: 0000000000000001 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003 Reorder so that fd_install() is the last thing done in rdma_alloc_commit_uobject(). Fixes: aba94548c9e4 ("IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit") Link: https://lore.kernel.org/r/20200716102059.1420681-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-06-18RDMA/core: Check that type_attrs is not NULL prior accessLeon Romanovsky1-15/+21
In disassociate flow, the type_attrs is set to be NULL, which is in an implicit way is checked in alloc_uobj() by "if (!attrs->context)". Change the logic to rely on that check, to be consistent with other alloc_uobj() places that will fix the following kernel splat. BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 2743 Comm: python3 Not tainted 5.7.0-rc6-for-upstream-perf-2020-05-23_19-04-38-5 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:alloc_begin_fd_uobject+0x18/0xf0 [ib_uverbs] Code: 89 43 48 eb 97 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 f5 41 54 55 48 89 fd 53 48 83 ec 08 48 8b 1f <48> 8b 43 18 48 8b 80 80 00 00 00 48 3d 20 10 33 a0 74 1c 48 3d 30 RSP: 0018:ffffc90001127b70 EFLAGS: 00010282 RAX: ffffffffa0339fe0 RBX: 0000000000000000 RCX: 8000000000000007 RDX: fffffffffffffffb RSI: ffffc90001127d28 RDI: ffff88843fe1f600 RBP: ffff88843fe1f600 R08: ffff888461eb06d8 R09: ffff888461eb06f8 R10: ffff888461eb0700 R11: 0000000000000000 R12: ffff88846a5f6450 R13: ffffc90001127d28 R14: ffff88845d7d6ea0 R15: ffffc90001127cb8 FS: 00007f469bff1540(0000) GS:ffff88846f980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 0000000450018003 CR4: 0000000000760ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? xa_store+0x28/0x40 rdma_alloc_begin_uobject+0x4f/0x90 [ib_uverbs] ib_uverbs_create_comp_channel+0x87/0xf0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs] ib_uverbs_cmd_verbs.isra.8+0x96d/0xae0 [ib_uverbs] ? get_page_from_freelist+0x3bb/0xf70 ? _copy_to_user+0x22/0x30 ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs] ? __wake_up_common_lock+0x87/0xc0 ib_uverbs_ioctl+0xbc/0x130 [ib_uverbs] ksys_ioctl+0x83/0xc0 ? ksys_write+0x55/0xd0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x48/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f469ac43267 Fixes: 849e149063bd ("RDMA/core: Do not allow alloc_commit to fail") Link: https://lore.kernel.org/r/20200617061826.2625359-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-5/+20
Pull rdma updates from Jason Gunthorpe: "A more active cycle than most of the recent past, with a few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and flowing through RDMA due to it also introducing a new ULP. The removal of FMR has been a recurring discussion theme for a long time. And the usual smattering of features and bug fixes. Summary: - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits) RDMA/cm: Spurious WARNING triggered in cm_destroy_id() RDMA/mlx5: Return ECE DC support RDMA/mlx5: Don't rely on FW to set zeros in ECE response RDMA/mlx5: Return an error if copy_to_user fails IB/hfi1: Use free_netdev() in hfi1_netdev_free() RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr() RDMA/core: Move and rename trace_cm_id_create() IB/hfi1: Fix hfi1_netdev_rx_init() error handling RDMA: Remove 'max_map_per_fmr' RDMA: Remove 'max_fmr' RDMA/core: Remove FMR device ops RDMA/rdmavt: Remove FMR memory registration RDMA/mthca: Remove FMR support for memory registration RDMA/mlx4: Remove FMR support for memory registration RDMA/i40iw: Remove FMR leftovers RDMA/bnxt_re: Remove FMR leftovers RDMA/mlx5: Remove FMR leftovers RDMA/core: Remove FMR pool API RDMA/rds: Remove FMR support for memory registration RDMA/srp: Remove support for FMR memory registration ...
2020-05-27RDMA/core: Fix double destruction of uobjectJason Gunthorpe1-7/+13
Fix use after free when user user space request uobject concurrently for the same object, within the RCU grace period. In that case, remove_handle_idr_uobject() is called twice and we will have an extra put on the uobject which cause use after free. Fix it by leaving the uobject write locked after it was removed from the idr. Call to rdma_lookup_put_uobject with UVERBS_LOOKUP_DESTROY instead of UVERBS_LOOKUP_WRITE will do the work. refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 1381 at lib/refcount.c:28 refcount_warn_saturate+0xfe/0x1a0 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 1381 Comm: syz-executor.0 Not tainted 5.5.0-rc3 #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x94/0xce panic+0x234/0x56f __warn+0x1cc/0x1e1 report_bug+0x200/0x310 fixup_bug.part.11+0x32/0x80 do_error_trap+0xd3/0x100 do_invalid_op+0x31/0x40 invalid_op+0x1e/0x30 RIP: 0010:refcount_warn_saturate+0xfe/0x1a0 Code: 0f 0b eb 9b e8 23 f6 6d ff 80 3d 6c d4 19 03 00 75 8d e8 15 f6 6d ff 48 c7 c7 c0 02 55 bd c6 05 57 d4 19 03 01 e8 a2 58 49 ff <0f> 0b e9 6e ff ff ff e8 f6 f5 6d ff 80 3d 42 d4 19 03 00 0f 85 5c RSP: 0018:ffffc90002df7b98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88810f6a193c RCX: ffffffffba649009 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88811b0283cc RBP: 0000000000000003 R08: ffffed10236060e3 R09: ffffed10236060e3 R10: 0000000000000001 R11: ffffed10236060e2 R12: ffff88810f6a193c R13: ffffc90002df7d60 R14: 0000000000000000 R15: ffff888116ae6a08 uverbs_uobject_put+0xfd/0x140 __uobj_perform_destroy+0x3d/0x60 ib_uverbs_close_xrcd+0x148/0x170 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f759d122c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bfa8 RCX: 0000000000465b49 RDX: 000000000000000c RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f759d1236bc R13: 00000000004ca27c R14: 000000000070de40 R15: 00000000ffffffff Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: 0x39400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 7452a3c745a2 ("IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociate") Link: https://lore.kernel.org/r/20200527135534.482279-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-21RDMA/core: Allow the ioctl layer to abort a fully created uobjectJason Gunthorpe1-5/+20
While creating a uobject every create reaches a point where the uobject is fully initialized. For ioctls that go on to copy_to_user this means they need to open code the destruction of a fully created uobject - ie the RDMA_REMOVE_DESTROY sort of flow. Open coding this creates bugs, eg the CQ does not properly flush the events list when it does its error unwind. Provide a uverbs_finalize_uobj_create() function which indicates that the uobject is fully initialized and that abort should call to destroy_hw to destroy the uobj->object and related. Methods can call this function if they go on to have error cases after setting uobj->object. Once done those error cases can simply do return, without an error unwind. Link: https://lore.kernel.org/r/20200519072711.257271-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL eventJason Gunthorpe1-1/+2
The commit below moved all of the destruction to the disassociate step and cleaned up the event channel during destroy_uobj. However, when ib_uverbs_free_hw_resources() pushes IB_EVENT_DEVICE_FATAL and then immediately goes to destroy all uobjects this causes ib_uverbs_free_event_queue() to discard the queued event if userspace hasn't already read() it. Unlike all other event queues async FD needs to defer the ib_uverbs_free_event_queue() until FD release. This still unregisters the handler from the IB device during disassociation. Fixes: 3e032c0e92aa ("RDMA/core: Make ib_uverbs_async_event_file into a uobject") Link: https://lore.kernel.org/r/20200507063348.98713-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-24RDMA/core: Fix race between destroy and release FD objectLeon Romanovsky1-1/+1
The call to ->lookup_put() was too early and it caused an unlock of the read/write protection of the uobject after the FD was put. This allows a race: CPU1 CPU2 rdma_lookup_put_uobject() lookup_put_fd_uobject() fput() fput() uverbs_uobject_fd_release() WARN_ON(uverbs_try_lock_object(uobj, UVERBS_LOOKUP_WRITE)); atomic_dec(usecnt) Fix the code by changing the order, first unlock and call to ->lookup_put() after that. Fixes: 3832125624b7 ("IB/core: Add support for idr types") Link: https://lore.kernel.org/r/20200423060122.6182-1-leon@kernel.org Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22RDMA/core: Fix overwriting of uobj in case of errorLeon Romanovsky1-3/+2
In case of failure to get file, the uobj is overwritten and causes to supply bad pointer as an input to uverbs_uobject_put(). BUG: KASAN: null-ptr-deref in atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline] BUG: KASAN: null-ptr-deref in refcount_sub_and_test include/linux/refcount.h:253 [inline] BUG: KASAN: null-ptr-deref in refcount_dec_and_test include/linux/refcount.h:281 [inline] BUG: KASAN: null-ptr-deref in kref_put include/linux/kref.h:64 [inline] BUG: KASAN: null-ptr-deref in uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57 Write of size 4 at addr 0000000000000030 by task syz-executor.4/1691 CPU: 1 PID: 1691 Comm: syz-executor.4 Not tainted 5.6.0 #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x94/0xce lib/dump_stack.c:118 __kasan_report+0x10c/0x190 mm/kasan/report.c:515 kasan_report+0x32/0x50 mm/kasan/common.c:625 check_memory_region_inline mm/kasan/generic.c:187 [inline] check_memory_region+0x16d/0x1c0 mm/kasan/generic.c:193 atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline] refcount_sub_and_test include/linux/refcount.h:253 [inline] refcount_dec_and_test include/linux/refcount.h:281 [inline] kref_put include/linux/kref.h:64 [inline] uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57 alloc_begin_fd_uobject+0x1d0/0x250 drivers/infiniband/core/rdma_core.c:486 rdma_alloc_begin_uobject+0xa8/0xf0 drivers/infiniband/core/rdma_core.c:509 __uobj_alloc include/rdma/uverbs_std_types.h:117 [inline] ib_uverbs_create_comp_channel+0x16d/0x230 drivers/infiniband/core/uverbs_cmd.c:982 ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:665 __vfs_write+0x7c/0x100 fs/read_write.c:494 vfs_write+0x168/0x4a0 fs/read_write.c:558 ksys_write+0xc8/0x200 fs/read_write.c:611 do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x466479 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007efe9f6a7c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479 RDX: 0000000000000018 RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007efe9f6a86bc R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005 R13: 0000000000000bf2 R14: 00000000004cb80a R15: 00000000006fefc0 Fixes: 849e149063bd ("RDMA/core: Do not allow alloc_commit to fail") Link: https://lore.kernel.org/r/20200421082929.311931-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22RDMA/core: Prevent mixed use of FDs between shared ufilesLeon Romanovsky1-1/+1
FDs can only be used on the ufile that created them, they cannot be mixed to other ufiles. We are lacking a check to prevent it. BUG: KASAN: null-ptr-deref in atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline] BUG: KASAN: null-ptr-deref in atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline] BUG: KASAN: null-ptr-deref in fput_many+0x1a/0x140 fs/file_table.c:336 Write of size 8 at addr 0000000000000038 by task syz-executor179/284 CPU: 0 PID: 284 Comm: syz-executor179 Not tainted 5.5.0-rc5+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x94/0xce lib/dump_stack.c:118 __kasan_report+0x18f/0x1b7 mm/kasan/report.c:510 kasan_report+0xe/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x15d/0x1b0 mm/kasan/generic.c:192 atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline] atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline] fput_many+0x1a/0x140 fs/file_table.c:336 rdma_lookup_put_uobject+0x85/0x130 drivers/infiniband/core/rdma_core.c:692 uobj_put_read include/rdma/uverbs_std_types.h:96 [inline] _ib_uverbs_lookup_comp_file drivers/infiniband/core/uverbs_cmd.c:198 [inline] create_cq+0x375/0xba0 drivers/infiniband/core/uverbs_cmd.c:1006 ib_uverbs_create_cq+0x114/0x140 drivers/infiniband/core/uverbs_cmd.c:1089 ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:769 __vfs_write+0x7c/0x100 fs/read_write.c:494 vfs_write+0x168/0x4a0 fs/read_write.c:558 ksys_write+0xc8/0x200 fs/read_write.c:611 do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x44ef99 Code: 00 b8 00 01 00 00 eb e1 e8 74 1c 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c4 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc0b74c028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007ffc0b74c030 RCX: 000000000044ef99 RDX: 0000000000000040 RSI: 0000000020000040 RDI: 0000000000000005 RBP: 00007ffc0b74c038 R08: 0000000000401830 R09: 0000000000401830 R10: 00007ffc0b74c038 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00000000006be018 R15: 0000000000000000 Fixes: cf8966b3477d ("IB/core: Add support for fd objects") Link: https://lore.kernel.org/r/20200421082929.311931-2-leon@kernel.org Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Remove ucontext_lock from the uverbs_destry_ufile_hw() pathJason Gunthorpe1-20/+1
This lock only serializes ucontext creation. Instead of checking the ucontext_lock during destruction hold the existing hw_destroy_rwsem during creation, which is the standard pattern for object creation. The simplification of locking is needed for the next patch. Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Remove the ufile arg from rdma_alloc_begin_uobjectJason Gunthorpe1-17/+19
Now that all callers provide a non-NULL attrs the ufile is redundant. Adjust things so that the context handling is done inside alloc_uobj, and the ib_uverbs_get_ucontext_file() is avoided if we already have the context. Link: https://lore.kernel.org/r/1578504126-9400-13-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not allow alloc_commit to failJason Gunthorpe1-62/+56
This is a left over from an earlier version that creates a lot of complexity for error unwind, particularly for FD uobjects. The only reason this was done is so that anon_inode_get_file() could be called with the final fops and a fully setup uobject. Both need to be setup since unwinding anon_inode_get_file() via fput will call the driver's release(). Now that the driver does not provide release, we no longer need to worry about this complicated sequence, simply create the struct file at the start and allow the core code's release function to deal with the abort case. This allows all the confusing error paths around commit to be removed. Link: https://lore.kernel.org/r/1578504126-9400-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify destruction of FD uobjectsJason Gunthorpe1-13/+21
FD uobjects have a weird split between the struct file and uobject world. Simplify this to make them pure uobjects and use a generic release method for all struct file operations. This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always called before erasing the linked list contents to make the concurrancy simpler to understand. For this to work the uobject destruction must fence anything that it is cleaning up - the design must not rely on struct file lifetime. Only deliver_event() relies on the struct file to when adding new events to the queue, add a is_destroyed check under lock to block it. Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/mlx5: Use RCU and direct refcounts to keep memory aliveJason Gunthorpe1-5/+6
dispatch_event_fd() runs from a notifier with minimal locking, and relies on RCU and a file refcount to keep the uobject and eventfd alive. As the next patch wants to remove the file_operations release function from the drivers, re-organize things so that the devx_event_notifier() path uses the existing RCU to manage the lifetime of the uobject and eventfd. Move the refcount puts to a call_rcu so that the objects are guaranteed to exist and remove the indirect file refcount. Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/uverbs: Remove needs_kfree_rcu from uverbs_obj_type_classJason Gunthorpe1-22/+1
After device disassociation the uapi_objects are destroyed and freed, however it is still possible that core code can be holding a kref on the uobject. When it finally goes to uverbs_uobject_free() via the kref_put() it can trigger a use-after-free on the uapi_object. Since needs_kfree_rcu is a micro optimization that only benefits file uobjects, just get rid of it. There is no harm in using kfree_rcu even if it isn't required, and the number of involved objects is small. Link: https://lore.kernel.org/r/20200113143306.GA28717@ziepe.ca Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/core: Create mmap database and cookie helper functionsMichal Kalderon1-0/+1
Create some common API's for adding entries to a xa_mmap. Searching for an entry and freeing one. The general approach is copied from the EFA driver and improved to be more general and do more to help the drivers. Integration with the core allows a reference counted scheme with a free function so that the driver can know when its mmaps are all gone. This significant new functionality will be helpful for drivers to have the correct lifetime model for mmap objects. Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25uverbs: Convert idr to XArrayMatthew Wilcox1-52/+24
The word 'idr' is scattered throughout the API, so I haven't changed it, but the 'idr' variable is now an XArray. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08IB: When attrs.udata/ufile is available use that instead of uobjectJason Gunthorpe1-3/+3
The ucontext and ufile should not be accessed via the uobject, all these cases have an attrs so use that instead. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down uobject destroy pathShamir Rabinovitch1-18/+30
Pass uverbs_attr_bundle down the uobject destroy path. The next patch will use this to eliminate the dependecy of the drivers in ib_x->uobject pointers. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: ucontext should be set properly for all cmd & ioctl pathsShamir Rabinovitch1-49/+21
the Attempt to use the below commit to initialize the ucontext for the uobject destroy path has shown that the below commit is incomplete. Parts were reverted and the ucontext set up in the uverbs_attr_bundle was moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX macros and rdma_alloc_begin_uobject which is called when uobject is created. Fixes: 3d9dfd060391 ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows") Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky1-7/+2
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flowsShamir Rabinovitch1-0/+32
Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows as soon as the flow has ib_uobject. In addition, remove rdma_get_ucontext helper function that is only used by ib_umem_get. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FDYishai Hadas1-0/+1
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation. This object is from type class FD and will be used to read DEVX async commands completion. The core layer should allow the driver to set object from type FD in a safe mode, this option was added with a matching comment in place. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-12RDMA: Start use ib_device_opsKamal Heib1-3/+3
Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-04IB/core: Introduce UVERBS_IDR_ANY_OBJECTYishai Hadas1-10/+17
Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object. Once used, the infrastructure skips checking for the IDR type, it becomes the driver handler responsibility. This enables drivers to get in a given method an object from various of types. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Annotate alloc/deallloc paths with context trackingLeon Romanovsky1-0/+2
Add restrack annotations to track allocations of ucontexts. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-11-26RDMA/uverbs: Make write() handlers return 0 on successJason Gunthorpe1-6/+4
Currently they return the command length, while all other handlers return 0. This makes the write path closer to the write_ex and ioctl path. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for writeJason Gunthorpe1-5/+8
Now that we can add meta-data to the description of write() methods we need to pass the uverbs_attr_bundle into all write based handlers so future patches can use it as a container for any new data transferred out of the core. This is the first step to bringing the write() and ioctl() methods to a common interface signature. This is a simple search/replace, and we push the attr down into the uobj and other APIs to keep changes minimal. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-09-21RDMA/uverbs: Get rid of ucontext->tgidJason Gunthorpe1-1/+0
Nothing uses this now, just delete it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20RDMA/ucontext: Get rid of the old disassociate flowJason Gunthorpe1-41/+10
The disassociate_ucontext function in every driver is now empty, so we don't need this ugly and wrong code that was messing with tgids. rdma_user_mmap_io does this same work in a better way. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-20RDMA/ucontext: Add a core API for mmaping driver IO memoryJason Gunthorpe1-1/+3
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions: - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link) - Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers. - Don't use crazy get_task stuff. - Simplify the approach for to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had. Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD. Link: https://www.spinics.net/lists/stable/msg248747.html Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-04IB/core: Release object lock if destroy failedArtemy Kovalyov1-0/+2
The object lock was supposed to always be released during destroy, but when the destruction retry series was integrated with the destroy series it created a failure path that missed the unlock. Keep with convention, if destroy fails the caller must undo all locking. Fixes: 87ad80abc70d ("IB/uverbs: Consolidate uobject destruction") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe1-1/+1
rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13IB/uverbs: Remove struct uverbs_root_spec and all supporting codeJason Gunthorpe1-45/+0
Everything now uses the uverbs_uapi data structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-10IB/uverbs: Use uverbs_api to manage the object type inside the uobjectJason Gunthorpe1-45/+55
Currently the struct uverbs_obj_type stored in the ib_uobject is part of the .rodata segment of the module that defines the object. This is a problem if drivers define new uapi objects as we will be left with a dangling pointer after device disassociation. Switch the uverbs_obj_type for struct uverbs_api_object, which is allocated memory that is part of the uverbs_api and is guaranteed to always exist. Further this moves the 'type_class' into this memory which means access to the IDR/FD function pointers is also guaranteed. Drivers cannot define new types. This makes it safe to continue to use all uobjects, including driver defined ones, after disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Allow all DESTROY commands to succeed after disassociateJason Gunthorpe1-11/+55
The disassociate function was broken by design because it failed all commands. This prevents userspace from calling destroy on a uobject after it has detected a device fatal error and thus reclaiming the resources in userspace is prevented. This fix is now straightforward, when anything destroys a uobject that is not the user the object remains on the IDR with a NULL context and object pointer. All lookup locking modes other than DESTROY will fail. When the user ultimately calls the destroy function it is simply dropped from the IDR while any related information is returned. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Lower the test for ongoing disassociationJason Gunthorpe1-0/+11
Commands that are reading/writing to objects can test for an ongoing disassociation during their initial call to rdma_lookup_get_uobject. This directly prevents all of these commands from conflicting with an ongoing disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Allow uobject allocation to work concurrently with disassociateJason Gunthorpe1-11/+26
After all the recent structural changes this is now straightforward, hold the hw_destroy_rwsem across the entire uobject creation. We already take this semaphore on the success path, so holding it a bit longer is not going to change the performance. After this change none of the create callbacks require the disassociate_srcu lock to be correct. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociateJason Gunthorpe1-22/+49
After all the recent structural changes this is now straightfoward, hoist the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around the uobject write lock as well as the destroy. This is necessary as obtaining a write lock concurrently with uverbs_destroy_ufile_hw() will cause malfunction. After this change none of the destroy callbacks require the disassociate_srcu lock to be correct. This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the IOCTL interface needs to hold an unlocked kref until all command verification is completed. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Convert 'bool exclusive' into an enumJason Gunthorpe1-37/+57
This is more readable, and future patches will need a 3rd lookup type. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Consolidate uobject destructionJason Gunthorpe1-129/+122
There are several flows that can destroy a uobject and each one is minimized and sprinkled throughout the code base, making it difficult to understand and very hard to modify the destroy path. Consolidate all of these into uverbs_destroy_uobject() and call it in all cases where a uobject has to be destroyed. This makes one change to the lifecycle, during any abort (eg when alloc_commit is not called) we always call out to alloc_abort, even if remove_commit needs to be called to delete a HW object. This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to clarify its actual usage and revises some of the comments to reflect what the life cycle is for the type implementation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Make the write path destroy methods use the same flow as ioctlJason Gunthorpe1-24/+29
The ridiculous dance with uobj_remove_commit() is not needed, the write path can follow the same flow as ioctl - lock and destroy the HW object then use the data left over in the uobject to form the response to userspace. Two helpers are introduced to make this flow straightforward for the caller. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>