aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-03-04RDMA/iwcm: Fix iwcm work deallocationBernard Metzler1-1/+3
The dealloc_work_entries() function must update the work_free_list pointer while freeing its entries, since potentially called again on same list. A second iteration of the work list caused system crash. This happens, if work allocation fails during cma_iw_listen() and free_cm_id() tries to free the list again during cleanup. Fixes: 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.") Link: https://lore.kernel.org/r/20200302181614.17042-1-bmt@zurich.ibm.com Reported-by: syzbot+cb0c054eabfba4342146@syzkaller.appspotmail.com Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04RDMA/nldev: Fix crash when set a QP to a new counter but QPN is missingMark Zhang1-0/+2
This fixes the kernel crash when a RDMA_NLDEV_CMD_STAT_SET command is received, but the QP number parameter is not available. iwpm_register_pid: Unable to send a nlmsg (client = 2) infiniband syz1: RDMA CMA: cma_listen_on_dev, error -98 general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 9754 Comm: syz-executor069 Not tainted 5.6.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nla_get_u32 include/net/netlink.h:1474 [inline] RIP: 0010:nldev_stat_set_doit+0x63c/0xb70 drivers/infiniband/core/nldev.c:1760 Code: fc 01 0f 84 58 03 00 00 e8 41 83 bf fb 4c 8b a3 58 fd ff ff 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 6d RSP: 0018:ffffc900068bf350 EFLAGS: 00010247 RAX: dffffc0000000000 RBX: ffffc900068bf728 RCX: ffffffff85b60470 RDX: 0000000000000000 RSI: ffffffff85b6047f RDI: 0000000000000004 RBP: ffffc900068bf750 R08: ffff88808c3ee140 R09: ffff8880a25e6010 R10: ffffed10144bcddc R11: ffff8880a25e6ee3 R12: 0000000000000000 R13: ffff88809acb0000 R14: ffff888092a42c80 R15: 000000009ef2e29a FS: 0000000001ff0880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4733e34000 CR3: 00000000a9b27000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rdma_nl_rcv_msg drivers/infiniband/core/netlink.c:195 [inline] rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline] rdma_nl_rcv+0x5d9/0x980 drivers/infiniband/core/netlink.c:259 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:672 ____sys_sendmsg+0x753/0x880 net/socket.c:2343 ___sys_sendmsg+0x100/0x170 net/socket.c:2397 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430 __do_sys_sendmsg net/socket.c:2439 [inline] __se_sys_sendmsg net/socket.c:2437 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4403d9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc0efbc5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004403d9 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004 RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8 R10: 000000000000004a R11: 0000000000000246 R12: 0000000000401c60 R13: 0000000000401cf0 R14: 0000000000000000 R15: 0000000000000000 Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/20200227125111.99142-1-leon@kernel.org Reported-by: syzbot+bd4af81bc51ee0283445@syzkaller.appspotmail.com Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04RDMA/odp: Ensure the mm is still alive before creating an implicit childJason Gunthorpe1-5/+19
Registration of a mmu_notifier requires the caller to hold a mmget() on the mm as registration is not permitted to race with exit_mmap(). There is a BUG_ON inside the mmu_notifier to guard against this. Normally creating a umem is done against current which implicitly holds the mmget(), however an implicit ODP child is created from a pagefault work queue and is not guaranteed to have a mmget(). Call mmget() around this registration and abort faulting if the MM has gone to exit_mmap(). Before the patch below the notifier was registered when the implicit ODP parent was created, so there was no chance to register a notifier outside of current. Fixes: c571feca2dc9 ("RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'") Link: https://lore.kernel.org/r/20200227114118.94736-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-04RDMA/core: Fix protection fault in ib_mr_pool_destroyMaor Gottlieb3-19/+14
Fix NULL pointer dereference in the error flow of ib_create_qp_user when accessing to uninitialized list pointers - rdma_mrs and sig_mrs. The following crash from syzkaller revealed it. kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 23167 Comm: syz-executor.1 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:ib_mr_pool_destroy+0x81/0x1f0 Code: 00 00 fc ff df 49 c1 ec 03 4d 01 fc e8 a8 ea 72 fe 41 80 3c 24 00 0f 85 62 01 00 00 48 8b 13 48 89 d6 4c 8d 6a c8 48 c1 ee 03 <42> 80 3c 3e 00 0f 85 34 01 00 00 48 8d 7a 08 4c 8b 02 48 89 fe 48 RSP: 0018:ffffc9000951f8b0 EFLAGS: 00010046 RAX: 0000000000040000 RBX: ffff88810f268038 RCX: ffffffff82c41628 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffc9000951f850 RBP: ffff88810f268020 R08: 0000000000000004 R09: fffff520012a3f0a R10: 0000000000000001 R11: fffff520012a3f0a R12: ffffed1021e4d007 R13: ffffffffffffffc8 R14: 0000000000000246 R15: dffffc0000000000 FS: 00007f54bc788700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000116920002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rdma_rw_cleanup_mrs+0x15/0x30 ib_destroy_qp_user+0x674/0x7d0 ib_create_qp_user+0xb01/0x11c0 create_qp+0x1517/0x2130 ib_uverbs_create_qp+0x13e/0x190 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f54bc787c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49 RDX: 0000000000000040 RSI: 0000000020000540 RDI: 0000000000000003 RBP: 00007f54bc787c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f54bc7886bc R13: 00000000004ca2ec R14: 000000000070ded0 R15: 0000000000000005 Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API") Link: https://lore.kernel.org/r/20200227112708.93023-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-28RDMA/core: Fix pkey and port assignment in get_new_ppsMaor Gottlieb1-4/+8
When port is part of the modify mask, then we should take it from the qp_attr and not from the old pps. Same for PKEY. Otherwise there are panics in some configurations: RIP: 0010:get_pkey_idx_qp_list+0x50/0x80 [ib_core] Code: c7 18 e8 13 04 30 ef 0f b6 43 06 48 69 c0 b8 00 00 00 48 03 85 a0 04 00 00 48 8b 50 20 48 8d 48 20 48 39 ca 74 1a 0f b7 73 04 <66> 39 72 10 75 08 eb 10 66 39 72 10 74 0a 48 8b 12 48 39 ca 75 f2 RSP: 0018:ffffafb3480932f0 EFLAGS: 00010203 RAX: ffff98059ababa10 RBX: ffff980d926e8cc0 RCX: ffff98059ababa30 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff98059ababa28 RBP: ffff98059b940000 R08: 00000000000310c0 R09: ffff97fe47c07480 R10: 0000000000000036 R11: 0000000000000200 R12: 0000000000000071 R13: ffff98059b940000 R14: ffff980d87f948a0 R15: 0000000000000000 FS: 00007f88deb31740(0000) GS:ffff98059f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000853e26001 CR4: 00000000001606e0 Call Trace: port_pkey_list_insert+0x3d/0x1b0 [ib_core] ? kmem_cache_alloc_trace+0x215/0x220 ib_security_modify_qp+0x226/0x3a0 [ib_core] _ib_modify_qp+0xcf/0x390 [ib_core] ipoib_init_qp+0x7f/0x200 [ib_ipoib] ? rvt_modify_port+0xd0/0xd0 [rdmavt] ? ib_find_pkey+0x99/0xf0 [ib_core] ipoib_ib_dev_open_default+0x1a/0x200 [ib_ipoib] ipoib_ib_dev_open+0x96/0x130 [ib_ipoib] ipoib_open+0x44/0x130 [ib_ipoib] __dev_open+0xd1/0x160 __dev_change_flags+0x1ab/0x1f0 dev_change_flags+0x23/0x60 do_setlink+0x328/0xe30 ? __nla_validate_parse+0x54/0x900 __rtnl_newlink+0x54e/0x810 ? __alloc_pages_nodemask+0x17d/0x320 ? page_fault+0x30/0x50 ? _cond_resched+0x15/0x30 ? kmem_cache_alloc_trace+0x1c8/0x220 rtnl_newlink+0x43/0x60 rtnetlink_rcv_msg+0x28f/0x350 ? kmem_cache_alloc+0x1fb/0x200 ? _cond_resched+0x15/0x30 ? __kmalloc_node_track_caller+0x24d/0x2d0 ? rtnl_calcit.isra.31+0x120/0x120 netlink_rcv_skb+0xcb/0x100 netlink_unicast+0x1e0/0x340 netlink_sendmsg+0x317/0x480 ? __check_object_size+0x48/0x1d0 sock_sendmsg+0x65/0x80 ____sys_sendmsg+0x223/0x260 ? copy_msghdr_from_user+0xdc/0x140 ___sys_sendmsg+0x7c/0xc0 ? skb_dequeue+0x57/0x70 ? __inode_wait_for_writeback+0x75/0xe0 ? fsnotify_grab_connector+0x45/0x80 ? __dentry_kill+0x12c/0x180 __sys_sendmsg+0x58/0xa0 do_syscall_64+0x5b/0x200 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f88de467f10 Link: https://lore.kernel.org/r/20200227125728.100551-1-leon@kernel.org Cc: <stable@vger.kernel.org> Fixes: 1dd017882e01 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27RMDA/cm: Fix missing ib_cm_destroy_id() in ib_cm_insert_listen()Jason Gunthorpe1-0/+1
The algorithm pre-allocates a cm_id since allocation cannot be done while holding the cm.lock spinlock, however it doesn't free it on one error path, leading to a memory leak. Fixes: 067b171b8679 ("IB/cm: Share listening CM IDs") Link: https://lore.kernel.org/r/20200221152023.GA8680@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-20RDMA/rw: Fix error flow during RDMA context initializationMax Gurtovoy1-11/+20
In case the SGL was mapped for P2P DMA operation, we must unmap it using pci_p2pdma_unmap_sg during the error unwind of rdma_rw_ctx_init() Fixes: 7f73eac3a713 ("PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()") Link: https://lore.kernel.org/r/20200220100819.41860-1-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19RDMA/core: Fix use of logical OR in get_new_ppsNathan Chancellor1-1/+1
Clang warns: ../drivers/infiniband/core/security.c:351:41: warning: converting the enum constant to a boolean [-Wint-in-bool-context] if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) { ^ 1 warning generated. A bitwise OR should have been used instead. Fixes: 1dd017882e01 ("RDMA/core: Fix protection fault in get_pkey_idx_qp_list") Link: https://lore.kernel.org/r/20200217204318.13609-1-natechancellor@gmail.com Link: https://github.com/ClangBuiltLinux/linux/issues/889 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"Parav Pandit1-4/+11
This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638. The call chain below requires the cm_id_priv's destination address to be setup before performing rdma_bind_addr(). Otherwise source port allocation fails as cma_port_is_unique() no longer sees the correct tuple to allow duplicate users of the source port. rdma_resolve_addr() cma_bind_addr() rdma_bind_addr() cma_get_port() cma_alloc_any_port() cma_port_is_unique() <- compared with zero daddr This can result in false failures to connect, particularly if the source port range is restricted. Fixes: 219d2e9dfda9 ("RDMA/cma: Simplify rdma_resolve_addr() error flow") Link: https://lore.kernel.org/r/20200212072635.682689-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/core: Fix protection fault in get_pkey_idx_qp_listLeon Romanovsky1-15/+9
We don't need to set pkey as valid in case that user set only one of pkey index or port number, otherwise it will be resulted in NULL pointer dereference while accessing to uninitialized pkey list. The following crash from Syzkaller revealed it. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0 Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48 8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8 RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010 RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430 R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000 FS: 00007f20777de700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: port_pkey_list_insert+0xd7/0x7c0 ib_security_modify_qp+0x6fa/0xfc0 _ib_modify_qp+0x8c4/0xbf0 modify_qp+0x10da/0x16d0 ib_uverbs_modify_qp+0x9a/0x100 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Message-Id: <20200212080651.GB679970@unreal>
2020-02-13IB/umad: Fix kernel crash while unloading ib_umadYonatan Cohen1-2/+3
When disassociating a device from umad we must ensure that the sysfs access is prevented before blocking the fops, otherwise assumptions in syfs don't hold: CPU0 CPU1 ib_umad_kill_port() ibdev_show() port->ib_dev = NULL dev_name(port->ib_dev) The prior patch made an error in moving the device_destroy(), it should have been split into device_del() (above) and put_device() (below). At this point we already have the split, so move the device_del() back to its original place. kernel stack PF: error_code(0x0000) - not-present page Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI RIP: 0010:ibdev_show+0x18/0x50 [ib_umad] RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000 RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870 RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000 R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40 R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58 FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0 Call Trace: dev_attr_show+0x15/0x50 sysfs_kf_seq_show+0xb8/0x1a0 seq_read+0x12d/0x350 vfs_read+0x89/0x140 ksys_read+0x55/0xd0 do_syscall_64+0x55/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9: Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed") Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/core: Add missing list deletion on freeing event queueMichael Guralnik1-0/+1
When the uobject file scheme was revised to allow device disassociation from the file it became possible for read() to still happen the driver destroys the uobject. The old clode code was not tolerant to concurrent read, and when it was moved to the driver destroy it creates a bug. Ensure the event_list is empty after driver destroy by adding the missing list_del(). Otherwise read() can trigger a use after free and double kfree. Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/core: Fix invalid memory access in spec_filter_sizeAvihai Horon1-8/+7
Add a check that the size specified in the flow spec header doesn't cause an overflow when calculating the filter size, and thus prevent access to invalid memory. The following crash from syzkaller revealed it. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:memchr_inv+0xd3/0x330 Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85 ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01 00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01 RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04 RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820 RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009 R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004 R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040 FS: 00007f9ee0e51700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: spec_filter_size.part.16+0x34/0x50 ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770 ib_uverbs_ex_create_flow+0x9ea/0x1b40 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49 RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag") Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds29-2037/+1712
Pull rdma updates from Jason Gunthorpe: "A very quiet cycle with few notable changes. Mostly the usual list of one or two patches to drivers changing something that isn't quite rc worthy. The subsystem seems to be seeing a larger number of rework and cleanup style patches right now, I feel that several vendors are prepping their drivers for new silicon. Summary: - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe, i40iw - Larger series doing cleanup and rework for hns and hfi1. - Some general reworking of the CM code to make it a little more understandable - Unify the different code paths connected to the uverbs FD scheme - New UAPI ioctls conversions for get context and get async fd - Trace points for CQ and CM portions of the RDMA stack - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic connected to a MR - A couple of bug fixes that came too late to make rc7" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits) RDMA/core: Make the entire API tree static RDMA/efa: Mask access flags with the correct optional range RDMA/cma: Fix unbalanced cm_id reference count during address resolve RDMA/umem: Fix ib_umem_find_best_pgsz() IB/mlx4: Fix leak in id_map_find_del IB/opa_vnic: Spelling correction of 'erorr' to 'error' IB/hfi1: Fix logical condition in msix_request_irq RDMA/cm: Remove CM message structs RDMA/cm: Use IBA functions for complex structure members RDMA/cm: Use IBA functions for simple structure members RDMA/cm: Use IBA functions for swapping get/set acessors RDMA/cm: Use IBA functions for simple get/set acessors RDMA/cm: Add SET/GET implementations to hide IBA wire format RDMA/cm: Add accessors for CM_REQ transport_type IB/mlx5: Return the administrative GUID if exists RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence IB/mlx4: Fix memory leak in add_gid error flow IB/mlx5: Expose RoCE accelerator counters RDMA/mlx5: Set relaxed ordering when requested RDMA/core: Add the core support field to METHOD_GET_CONTEXT ...
2020-01-31mm, tree-wide: rename put_user_page*() to unpin_user_page*()John Hubbard1-1/+1
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODPJohn Hubbard2-8/+7
Convert infiniband to use the new pin_user_pages*() calls. Also, revert earlier changes to Infiniband ODP that had it using put_user_page(). ODP is "Case 3" in Documentation/core-api/pin_user_pages.rst, which is to say, normal get_user_pages() and put_page() is the API to use there. The new pin_user_pages*() calls replace corresponding get_user_pages*() calls, and set the FOLL_PIN flag. The FOLL_PIN flag requires that the caller must return the pages via put_user_page*() calls, but infiniband was already doing that as part of an earlier commit. Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31IB/umem: use get_user_pages_fast() to pin DMA pagesJohn Hubbard1-11/+6
And get rid of the mmap_sem calls, as part of that. Note that get_user_pages_fast() will, if necessary, fall back to __gup_longterm_unlocked(), which takes the mmap_sem as needed. Link: http://lkml.kernel.org/r/20200107224558.2362728-10-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-30RDMA/core: Make the entire API tree staticJason Gunthorpe1-17/+0
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates the following error. on x86_64: ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC': main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler' This is happening because some parts of the UAPI description are not static. This is a hold over from earlier code that relied on struct pointers to refer to object types, now object types are referenced by number. Remove the unused globals and add statics to the remaining UAPI description elements. Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete mlx5_ib_get_devx_tree(). The compiler now trims alot more unused code, including the above problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28RDMA/cma: Fix unbalanced cm_id reference count during address resolveParav Pandit1-0/+2
Below commit missed the AF_IB and loopback code flow in rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in cma_work_handler() which puts the refcount which was not incremented prior to queuing the work. A call trace is observed with such code flow: BUG: unable to handle kernel NULL pointer dereference at (null) [<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0 [<ffffffff96b6715f>] mutex_lock+0x1f/0x2f [<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0 [<ffffffff964b9ebf>] process_one_work+0x17f/0x440 [<ffffffff964baf56>] worker_thread+0x126/0x3c0 Hence, hold the cm_id reference when scheduling the resolve work item. Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()") Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28RDMA/umem: Fix ib_umem_find_best_pgsz()Artemy Kovalyov1-3/+6
Except for the last entry, the ending iova alignment sets the maximum possible page size as the low bits of the iova must be zero when starting the next chunk. Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Remove CM message structsJason Gunthorpe2-288/+0
All accesses now use the new IBA acessor scheme, so delete the structs entirely and generate the structures from the schema file. Link: https://lore.kernel.org/r/20200116170037.30109-8-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for complex structure membersJason Gunthorpe1-64/+107
Use a Coccinelle spatch to replace CM structure members used as structures, arrays, or pointers with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression src; expression len; {struct} *msg; @@ - memcpy(msg->{old_name}, src, len) + IBA_SET_MEM({new_name}, msg, src, len) @@ {struct} *msg; identifier x; @@ - msg->{old_name}.x + IBA_GET_MEM_PTR({new_name}, msg)->x @@ {struct} *msg; @@ - &msg->{old_name} + IBA_GET_MEM_PTR({new_name}, msg) For GIDs: @@ {struct} *msg; @@ - msg->{old_name} + *IBA_GET_MEM_PTR({new_name}, msg) For non-GIDs: @@ {struct} *msg; @@ - msg->{old_name} + IBA_GET_MEM_PTR({new_name}, msg) Iterated for every remaining IBA_CHECK_OFF()/IBA_CHECK_GET() pairing. Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-7-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for simple structure membersJason Gunthorpe1-139/+245
Use a Coccinelle spatch script to replace use of simple CM structure members with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression val; {struct} *msg; @@ - msg->{old_name} = val + IBA_SET({new_name}, msg, be{bits}_to_cpu(val)) @@ {struct} *msg; @@ - msg->{old_name} + cpu_to_be{bits}(IBA_GET({new_name}, msg)) Iterated for every IBA_CHECK_OFF that isn't a CM_FIELD_MLOC. And the below iterated over all byte sizes to remove doubled byte swaps: @@ expression val; @@ -be{bits}_to_cpu(cpu_to_be{bits}(val)) +val (and __be_to_cpu and ntoh varients) Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-6-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for swapping get/set acessorsJason Gunthorpe2-269/+36
Use a Coccinelle spatch script to replace CM helper functions that return/accept BE values with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression val; {struct} *msg; @@ - {old_setter}(msg, val) + IBA_SET({new_name}, msg, be{bits}_to_cpu(val)) @@ {struct} *msg; @@ - {old_getter}(msg) + cpu_to_be{bits}(IBA_GET({new_name}, msg)) Iterated for every IBA_CHECK_GET_BE()/IBA_CHECK_SET_BE() pairing. And the below iterated over all byte sizes to remove doubled byte swaps: @@ expression val; @@ -be{bits}_to_cpu(cpu_to_be{bits}(val)) +val (and __be_to_cpu and ntoh varients) Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-5-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for simple get/set acessorsJason Gunthorpe2-509/+104
Use a Coccinelle spatch to replace CM helper functions with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression val; {struct} *msg; @@ - {old_setter} + IBA_SET({new_name}, msg, val) @@ {struct} *msg; @@ - {old_getter} + IBA_GET({new_name}, msg) Iterated for every IBA_CHECK_GET()/IBA_CHECK_GET() pairing. Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-4-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Add SET/GET implementations to hide IBA wire formatLeon Romanovsky2-0/+268
There is no separation between RDMA-CM wire format as it is declared in IBTA and kernel logic which implements needed support. Such situation causes to many mistakes in conversion between big-endian (wire format) and CPU format used by kernel. It also mixes RDMA core code with combination of uXX and beXX variables. The idea that all accesses to IBA definitions will go through special GET/SET macros to ensure that no conversion mistakes are made. The shifting and masking required to read the value is automatically deduced using the field offset description from the tables in the IBA specification. This starts with the CM MADs described in IBTA release 1.3 volume 1. To confirm that the new macros behave the same as the old accessors a self-test is included in this patch. Each macro replacing a straightforward struct field compile-time tests that the new field has the same offsetof() and width as the old field. For the fields with accessor functions a runtime test, the 'all ones' value is placed in a dummy message and read back in several ways to confirm that both approaches give identical results. Later patches in this series delete the self test. This creates a tested table of new field name, old field name(s) and some meta information like BE coding for the functions which will be used in the next patches. Link: https://lore.kernel.org/r/20200116170037.30109-3-jgg@ziepe.ca Link: https://lore.kernel.org/r/20191212093830.316934-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Add accessors for CM_REQ transport_typeJason Gunthorpe1-12/+29
Access the two fields through wrappers, like all other fields, to make it clearer what is happening. Link: https://lore.kernel.org/r/20200116170037.30109-2-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fenceJason Gunthorpe1-0/+2
The set of entry->driver_removed is missing locking, protect it with xa_lock() which is held by the only reader. Otherwise readers may continue to see driver_removed = false after rdma_user_mmap_entry_remove() returns and may continue to try and establish new mmaps. Fixes: 3411f9f01b76 ("RDMA/core: Create mmap database and cookie helper functions") Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca Reviewed-by: Gal Pressman <galpress@amazon.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe3-40/+57
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Add the core support field to METHOD_GET_CONTEXTMichael Guralnik1-0/+8
Add the core support field to METHOD_GET_CONTEXT, this field should represent capabilities that are not device-specific. Return support for optional access flags for memory regions. User-space will use this capability to mask the optional access flags for unsupporting kernels. Link: https://lore.kernel.org/r/1578506740-22188-10-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/uverbs: Add ioctl command to get a device contextJason Gunthorpe5-62/+114
Allow future extensions of the get context command through the uverbs ioctl kabi. Unlike the uverbs version this does not return an async_fd as well, that has to be done with another command. Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Remove ucontext_lock from the uverbs_destry_ufile_hw() pathJason Gunthorpe2-21/+5
This lock only serializes ucontext creation. Instead of checking the ucontext_lock during destruction hold the existing hw_destroy_rwsem during creation, which is the standard pattern for object creation. The simplification of locking is needed for the next patch. Link: https://lore.kernel.org/r/1578506740-22188-4-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOCJason Gunthorpe2-2/+25
Allow the async FD to be allocated separately from the context. This is necessary to introduce the ioctl to create a context, as an ioctl should only ever create a single uobject at a time. If multiple async FDs are created then the first one is used to deliver affiliated events from any ib_uevent_object, with all subsequent ones will receive only unaffiliated events. Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB/core: Add interface to advise_mr for kernel usersMoni Shoua1-0/+11
Allow ULPs to call advise_mr, so they can control ODP regions in the same way as user space applications. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB/core: Introduce ib_reg_user_mrMoni Shoua1-0/+30
Add ib_reg_user_mr() for kernel ULPs to register user MRs. The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP that acts as an agent for the userspace application. This function is intended to be used without a user context so vendor drivers need to be aware of calling reg_user_mr() device operation with udata equal to NULL. Among all drivers, i40iw is the only driver which relies on presence of udata, so check udata existence for that driver. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB: Allow calls to ib_umem_get from kernel ULPsMoni Shoua2-40/+16
So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-13RDMA/core: Use READ_ONCE for ib_ufile.async_fileJason Gunthorpe5-26/+18
The writer for async_file holds the ucontext_lock, while the readers are left unlocked. Most readers rely on an implicit locking, either by having a uobject (which cannot be created before a context) or by holding the ib_ufile kref. However ib_uverbs_free_hw_resources() has no implicit lock and has a possible race. Make this all clear and sane by using READ_ONCE consistently. Link: https://lore.kernel.org/r/1578504126-9400-15-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Make ib_uverbs_async_event_file into a uobjectJason Gunthorpe8-143/+95
This makes async events aligned with completion events as both are full uobjects of FD type and use the same uobject lifecycle. A bunch of duplicate code is consolidated and the general flow between the two FDs is now very similar. Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Remove the ufile arg from rdma_alloc_begin_uobjectJason Gunthorpe1-17/+19
Now that all callers provide a non-NULL attrs the ufile is redundant. Adjust things so that the context handling is done inside alloc_uobj, and the ib_uverbs_get_ucontext_file() is avoided if we already have the context. Link: https://lore.kernel.org/r/1578504126-9400-13-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify type usage for ib_uverbs_async_handler()Jason Gunthorpe3-56/+34
This function works on an ib_uverbs_async_file. Accept that as a parameter instead of the struct ib_uverbs_file. Consoldiate all the callers working from an ib_uevent_object to a single function and locate the async_file directly from the struct ib_uobject instead of using context_ptr. Link: https://lore.kernel.org/r/1578504126-9400-11-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_wq.uobjectJason Gunthorpe2-7/+9
This is a struct ib_uwq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-10-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_srq.uobjectJason Gunthorpe2-8/+12
This is a struct ib_usrq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-9-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_qp.uobjectJason Gunthorpe3-16/+23
This is a struct ib_uqp_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-8-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not erase the type of ib_cq.uobjectJason Gunthorpe4-21/+31
This is a struct ib_ucq_object pointer, instead of using container_of() all over the place just store it with its actual type. Link: https://lore.kernel.org/r/1578504126-9400-7-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Make ib_ucq_object use ib_uevent_objectJason Gunthorpe4-35/+25
Any uobject that sends events into the async_event_file should be using ib_uevent_object so it can use the standard uevent based helper functions. CQ pushes events into both the async_event and the comp_channel in an open coded way. Move the async events related stuff to ib_uevent_object. Link: https://lore.kernel.org/r/1578504126-9400-6-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Do not allow alloc_commit to failJason Gunthorpe4-128/+96
This is a left over from an earlier version that creates a lot of complexity for error unwind, particularly for FD uobjects. The only reason this was done is so that anon_inode_get_file() could be called with the final fops and a fully setup uobject. Both need to be setup since unwinding anon_inode_get_file() via fput will call the driver's release(). Now that the driver does not provide release, we no longer need to worry about this complicated sequence, simply create the struct file at the start and allow the core code's release function to deal with the abort case. This allows all the confusing error paths around commit to be removed. Link: https://lore.kernel.org/r/1578504126-9400-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify destruction of FD uobjectsJason Gunthorpe5-54/+40
FD uobjects have a weird split between the struct file and uobject world. Simplify this to make them pure uobjects and use a generic release method for all struct file operations. This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always called before erasing the linked list contents to make the concurrancy simpler to understand. For this to work the uobject destruction must fence anything that it is cleaning up - the design must not rely on struct file lifetime. Only deliver_event() relies on the struct file to when adding new events to the queue, add a is_destroyed check under lock to block it. Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/mlx5: Use RCU and direct refcounts to keep memory aliveJason Gunthorpe2-20/+6
dispatch_event_fd() runs from a notifier with minimal locking, and relies on RCU and a file refcount to keep the uobject and eventfd alive. As the next patch wants to remove the file_operations release function from the drivers, re-organize things so that the devx_event_notifier() path uses the existing RCU to manage the lifetime of the uobject and eventfd. Move the refcount puts to a call_rcu so that the objects are guaranteed to exist and remove the indirect file refcount. Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/uverbs: Remove needs_kfree_rcu from uverbs_obj_type_classJason Gunthorpe1-22/+1
After device disassociation the uapi_objects are destroyed and freed, however it is still possible that core code can be holding a kref on the uobject. When it finally goes to uverbs_uobject_free() via the kref_put() it can trigger a use-after-free on the uapi_object. Since needs_kfree_rcu is a micro optimization that only benefits file uobjects, just get rid of it. There is no harm in using kfree_rcu even if it isn't required, and the number of involved objects is small. Link: https://lore.kernel.org/r/20200113143306.GA28717@ziepe.ca Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10RDMA/core: Remove err in iw_query_portGuoqing Jiang1-6/+1
Since we can return device->ops.query_port directly, so no need to keep those lines. Link: https://lore.kernel.org/r/20200109134043.15568-1-guoqing.jiang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>