aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-01-30RDMA/nldev: Factor out the PID namespace checkLeon Romanovsky1-10/+12
The PID namespace is going to be used in the .doit callback, so generalize its implementation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30RDMA/nldev: Dynamically generate restrack dumpit callbacksLeon Romanovsky1-28/+11
There is no need to manually write same callbacks, automatically generate them using C-macro language. This macro is going to be extended to generate doit callbacks too, so use general name for this macro. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30RDMA: Add indication for in kernel API support to IB deviceGal Pressman2-4/+7
Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky1-3/+3
All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociateYishai Hadas1-5/+13
The vma->vm_mm can become impossible to get before rdma_umap_close() is called, in this case we must not try to get an mm that is already undergoing process exit. In this case there is no need to wait for anything as the VMA will be destroyed by another thread soon and is already effectively 'unreachable' by userspace. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__rb_erase_color+0xb9/0x280 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89 58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d 10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000 FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0 Call Trace: unlink_file_vma+0x3b/0x50 free_pgtables+0xa1/0x110 exit_mmap+0xca/0x1a0 ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib] mmput+0x54/0x140 uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs] uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: <stable@vger.kernel.org> # 4.19 Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29Merge branch 'devx-async' into k.o/for-nextJason Gunthorpe2-5/+11
Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-29IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FDYishai Hadas2-5/+11
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD and its initial implementation. This object is from type class FD and will be used to read DEVX async commands completion. The core layer should allow the driver to set object from type FD in a safe mode, this option was added with a matching comment in place. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25IB/uverbs: Fix ioctl query port to consider device disassociationYishai Hadas1-1/+7
Methods cannot peak into the ufile, the only way to get a ucontext and hence a device is via the ib_uverbs_get_ucontext() call or inspecing a locked uobject. Otherwise during/after disassociation the pointers may be null or free'd. BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 PGD 800000005ece6067 P4D 800000005ece6067 PUD 5ece7067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 10631 Comm: ibv_ud_pingpong Tainted: GW OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_PORT+0x53/0x191 [ib_uverbs] Code: 80 00 00 00 31 c0 48 8b 47 40 48 8d 5c 24 38 48 8d 6c 24 08 48 89 df 48 8b 40 08 4c 8b a0 18 03 00 00 31 c0 f3 48 ab 48 89 ef <49> 83 7c 24 78 00 b1 06 f3 48 ab 0f 84 89 00 00 00 45 31 c9 31 d2 RSP: 0018:ffffb54802ccfb10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffb54802ccfb48 RCX:0000000000000000 RDX: fffffffffffffffa RSI: ffffb54802ccfcf8 RDI:ffffb54802ccfb18 RBP: ffffb54802ccfb18 R08: ffffb54802ccfd18 R09:0000000000000000 R10: 0000000000000000 R11: 00000000000000d0 R12:0000000000000000 R13: ffffb54802ccfcb0 R14: ffffb54802ccfc48 R15:ffff9f736e0059a0 FS: 00007f55a6bd7740(0000) GS:ffff9f737ba00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000078 CR3: 0000000064214000 CR4:00000000000006f0 Call Trace: ib_uverbs_cmd_verbs.isra.5+0x94d/0xa60 [ib_uverbs] ? copy_port_attr_to_resp+0x120/0x120 [ib_uverbs] ? arch_tlb_finish_mmu+0x16/0xc0 ? tlb_finish_mmu+0x1f/0x30 ? unmap_region+0xd9/0x120 ib_uverbs_ioctl+0xbc/0x120 [ib_uverbs] do_vfs_ioctl+0xa9/0x620 ? __do_munmap+0x29f/0x3a0 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f55a62cb567 Fixes: 641d1207d2ed ("IB/core: Move query port to ioctl") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25IB/uverbs: Fix OOPs upon device disassociationYishai Hadas1-4/+3
The async_file might be freed before the disassociation has been ended, causing qp shutdown to use after free on it. Since uverbs_destroy_ufile_hw is not a fence, it returns if a disassociation is ongoing in another thread. It has to be written this way to avoid deadlock. However this means that the ufile FD close cannot destroy anything that may still be used by an active kref, such as the the async_file. To fix that move the kref_put() to be in ib_uverbs_release_file(). BUG: unable to handle kernel paging request at ffffffffba682787 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061 Oops: 0003 [#1] SMP PTI CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85 d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00 FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0 Call Trace: _raw_spin_lock_irq+0x27/0x30 ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs] uverbs_free_qp+0x7e/0x90 [ib_uverbs] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fac551aac60 Cc: <stable@vger.kernel.org> # 4.2 Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-25RDMA/umem: Add missing initialization of owning_mmArtemy Kovalyov1-0/+3
When allocating a umem leaf for implicit ODP MR during page fault the field owning_mm was not set. Initialize and take a reference on this field to avoid kernel panic when trying to access this field. BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib] RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core] Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202 RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80 RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77 R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00 R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c FS: 0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0 Call Trace: pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib] mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib] ? __switch_to+0xe1/0x470 process_one_work+0x174/0x390 worker_thread+0x4f/0x3e0 kthread+0x102/0x140 ? drain_workqueue+0x130/0x130 ? kthread_stop+0x110/0x110 ret_from_fork+0x1f/0x30 Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24IB/core: Destroy QP if XRC QP failsYuval Avnery1-17/+24
The open-coded variant missed destroy of SELinux created QP, reuse already existing ib_detroy_qp() call and use this opportunity to clean ib_create_qp() from double prints and unclear exit paths. Reported-by: Parav Pandit <parav@mellanox.com> Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Yuval Avnery <yuvalav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24IB/mlx5: Ranges in implicit ODP MR inherit its write accessMoni Shoua1-2/+3
A sub-range in ODP implicit MR should take its write permission from the MR and not be set always to allow. Fixes: d07d1d70ce1a ("IB/umem: Update on demand page (ODP) support") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24IB/core: Declare local functions 'static'Bart Van Assche1-1/+1
This patch avoids that sparse complains about missing function declarations. Fixes: f27a0d50a4bc ("RDMA/umem: Use umem->owning_mm inside ODP") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24IB/umad: Do not check status of nonseekable_open()Parav Pandit1-23/+13
As the comment block of nonseekable_open() describes, nonseekable_open() can never fail. Several places in kernel depend on this behavior. Therefore, simplify the umad module to depend on this basic kernel functionality. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-23IB/umad: Avoid additional device reference during open()/close()Parav Pandit1-5/+4
ib_umad_init_port_dev() holds the reference of a ib_umad_device instance. ib_umad_device contains standard core device and cdev. cdev holds the reference of its parent core device. file ops holds the reference to cdev using core kernel. Therefore, there is no need to hold additional reference while opening umad related char devices. While at it, add comments to bring clarity on releasing references to ib_umd_device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21RDMA/device: Expose ib_device_try_get(()Jason Gunthorpe2-4/+10
It turns out future patches need this capability quite widely now, not just for netlink, so provide two global functions to manage the registration lock refcount. This also moves the point the lock becomes 1 to within ib_register_device() so that the semantics of the public API are very sane and clear. Calling ib_device_try_get() will fail on devices that are only allocated but not yet registered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Parav Pandit <parav@mellanox.com>
2019-01-18IB/core: Simplify rdma cgroup registrationParav Pandit3-16/+14
RDMA cgroup registration routine always returns success, so simplify function to be void and run clang formatter over whole CONFIG_CGROUP_RDMA art of core_priv.h. This reduces unwinding error path for regular registration and future net namespace change functionality for rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-18RDMA/device: Use __ib_device_get_by_name() in ib_device_rename()Jason Gunthorpe1-6/+3
No reason to open code this loop. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2019-01-14RDMA/cma: Rollback source IP address if failing to acquire deviceMyungho Jung1-1/+12
If cma_acquire_dev_by_src_ip() returns error in addr_handler(), the device state changes back to RDMA_CM_ADDR_BOUND but the resolved source IP address is still left. After that, if rdma_destroy_id() is called after rdma_listen(), the device is freed without removed from listen_any_list in cma_cancel_operation(). Revert to the previous IP address if acquiring device fails. Reported-by: syzbot+f3ce716af730c8f96637@syzkaller.appspotmail.com Signed-off-by: Myungho Jung <mhjungk@gmail.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUTJason Gunthorpe4-13/+59
When the ioctl interface for the write commands was introduced it did not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes rdma-core in userspace to not mark the buffers as written for valgrind. Along the same lines it turns out we have always missed marking the driver data. Fixing both of these makes valgrind work properly with rdma-core and ioctl. Fixes: 4785860e04bc ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-01-14RDMA: Introduce and use rdma_device_to_ibdev()Parav Pandit2-8/+6
Introduce and use rdma_device_to_ibdev() API for those drivers which are registering one sysfs group and also use in ib_core. In subsequent patch, device->provider_ibdev one-to-one mapping is no longer holds true during accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA: Rename port_callback to init_portParav Pandit3-17/+10
Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udataJason Gunthorpe2-2/+31
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-10IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to ib_uverbsShamir Rabinovitch1-2/+2
The next patch will add dependency from ib_umem_get in to ib_uverbs so move the required ib_umem_xxx functionality to it's correct module - ib_uverbs - and avoid circular dependecy from the form of ib_core -> ib_uverbs -> ib_core in depmod. Since this now requires all drivers to be build modular if uverbs is modular, hoist the test a couple drivers had into the main kconfig and apply it to all drivers uniformly. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id typeSteve Wise1-1/+4
A recent regression causes a null ptr crash when dumping cm_id resources. The cma is incorrectly adding all cm_id restrack resources as kernel mode. Fixes: af8d70375d56 ("RDMA/restrack: Resource-tracker should not use uobject pointers") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08RDMA/mlx5: Embed into the code flow the ODP config optionLeon Romanovsky1-3/+0
Convert various places to more readable code, which embeds CONFIG_INFINIBAND_ON_DEMAND_PAGING into the code flow. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-08RDMA/core: Don't depend device ODP capabilities on kconfig optionLeon Romanovsky1-2/+0
Device capability bits are exposing what specific device supports from HW perspective. Those bits are not dependent on kernel configurations and RDMA/core should ensure that proper interfaces to users will be disabled if CONFIG_INFINIBAND_ON_DEMAND_PAGING is not set. Fixes: f4056bfd8ccf ("IB/core: Add on demand paging caps to ib_uverbs_ex_query_device") Fixes: 8cdd312cfed7 ("IB/mlx5: Implement the ODP capability query verb") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07RDMA/nldev: Don't expose unsafe global rkey to regular userLeon Romanovsky1-4/+0
Unsafe global rkey is considered dangerous because it exposes memory registered for all memory in the system. Only users with a QP on the same PD can use the rkey, and generally those QPs will already know the value. However, out of caution, do not expose the value to unprivleged users on the local system. Require CAP_NET_ADMIN instead. Cc: <stable@vger.kernel.org> # 4.16 Fixes: 29cf1351d450 ("RDMA/nldev: provide detailed PD information") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07IB/core: Use struct_size() in kzalloc()Gustavo A. R. Silva1-3/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07IB/cm: Use struct_size() in kmalloc()Gustavo A. R. Silva1-2/+1
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-07RDMA/uverbs: Fix post send success return value in case of errorGal Pressman1-1/+3
If get QP object fails 'ret' must be assigned with a proper error code. Fixes: 9a0738575f26 ("RDMA/uverbs: Use uverbs_response() for remaining response copying") Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-0/+1
Pull rdma fixes from Jason Gunthorpe: "Over the break a few defects were found, so this is a -rc style pull request of various small things that have been posted. - An attempt to shorten RCU grace period driven delays showed crashes during heavier testing, and has been entirely reverted - A missed merge/rebase error between the advise_mr and ib_device_ops series - Some small static analysis driven fixes from Julia and Aditya - Missed ability to create a XRC_INI in the devx verbs interop series" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: infiniband/qedr: Potential null ptr dereference of qp infiniband: bnxt_re: qplib: Check the return value of send_message IB/ipoib: drop useless LIST_HEAD IB/core: Add advise_mr to the list of known ops Revert "IB/mlx5: Fix long EEH recover time with NVMe offloads" IB/mlx5: Allow XRC INI usage via verbs in DEVX context
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds1-2/+1
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-02IB/core: Add advise_mr to the list of known opsMoni Shoua1-0/+1
We need to add advise_mr to the list of operation setters on the ib_device or otherwise callers to ib_set_device_ops() for advise_mr operation will not have their callback registered. When the advise_mr series was merged with the device ops series the SET_DEVICE_OPS() was missed. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-12/+8
Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ...
2018-12-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds38-1847/+2672
Pull rdma updates from Jason Gunthorpe: "This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits) RDMA/srpt: Use kmem_cache_free() instead of kfree() RDMA/mlx5: Signedness bug in UVERBS_HANDLER() IB/uverbs: Signedness bug in UVERBS_HANDLER() IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported IB/umad: Start using dev_groups of class IB/umad: Use class_groups and let core create class file IB/umad: Refactor code to use cdev_device_add() IB/umad: Avoid destroying device while it is accessed IB/umad: Simplify and avoid dynamic allocation of class IB/mlx5: Fix wrong error unwind IB/mlx4: Remove set but not used variable 'pd' RDMA/iwcm: Don't copy past the end of dev_name() string IB/mlx5: Fix long EEH recover time with NVMe offloads IB/mlx5: Simplify netdev unbinding IB/core: Move query port to ioctl RDMA/nldev: Expose port_cap_flags2 IB/core: uverbs copy to struct or zero helper IB/rxe: Reuse code which sets port state IB/rxe: Make counters thread safe IB/mlx5: Use the correct commands for UMEM and UCTX allocation ...
2018-12-28mm/mmu_notifier: use structure for invalidate_range_start/end callbackJérôme Glisse1-12/+8
Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband] Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-22IB/uverbs: Signedness bug in UVERBS_HANDLER()Dan Carpenter1-1/+1
The "num_sge" variable needs to be signed for the error handling to work. The uverbs_attr_ptr_get_array_size() returns int so this change is safe. Fixes: ad8a4496757f ("IB/uverbs: Add support to advise_mr") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21IB/umad: Start using dev_groups of classParav Pandit1-16/+12
Start using core defined dev_groups of a class which allows to add device attributes to the core kernel and simplify the umad module. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21IB/umad: Use class_groups and let core create class fileParav Pandit1-9/+14
Use class->class_groups core kernel facility to create the abi version file instead of open coding. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21IB/umad: Refactor code to use cdev_device_add()Parav Pandit1-85/+86
Refactor code to use cdev_device_add() and do other minor refactors while modifying these functions as below. 1. Instead of returning generic -1, return an actual error for ib_umad_init_port(). 2. Introduce and use ib_umad_init_port_dev() for sm and umad char devices. 3. Instead of kobj, use more light weight kref to refcount ib_umad_device. 4. Use modern cdev_device_add() single code cut down three steps of cdev_add(), device_create(). This further helps to move device sysfs files to class attributes in subsequent patch. 5. Remove few empty lines while refactoring these functions. 6. Use sizeof() instead of sizeof to avoid checkpatch warning. 7. Use struct_size() for calculation of ib_umad_port. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21IB/umad: Avoid destroying device while it is accessedParav Pandit1-9/+13
ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in dev_notice(), while concurrently, ib_umad_kill_port() can destroy the device using device_destroy(). cpu-0 cpu-1 ----- ----- ib_umad_ioctl() [...] ib_umad_kill_port() device_destroy(dev) ib_umad_reg_agent() dev_notice(dev) Therefore, first mark ib_dev as NULL, to block any further access in file ops, unregister the mad agent and destroy the device at the end after mutex is unlocked. This ensures that device doesn't get destroyed, while it may get accessed. Fixes: 0f29b46d49b0 ("IB/mad: add new ioctl to ABI to support new registration options") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-21IB/umad: Simplify and avoid dynamic allocation of classParav Pandit1-21/+21
Simplify code to have a static structure instance for umad class allocation. This will allow to have class attributes defined along with class registration in subsequent patch and allows more class methods definition similar to ib_core module. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20RDMA/iwcm: Don't copy past the end of dev_name() stringSteve Wise1-4/+8
We now use dev_name(&ib_device->dev) instead of ib_device->name in iwpm messages. The name field in struct device is a const char *, where as ib_device->name is a char array of size IB_DEVICE_NAME_MAX, and it is pre-initialized to zeros. Since iw_cm_map() was using memcpy() to copy in the device name, and copying IWPM_DEVNAME_SIZE bytes, it ends up copying past the end of the source device name string and copying random bytes. This results in iwpmd failing the REGISTER_PID request from iwcm. Thus port mapping is broken. Validate the device and if names, and use strncpy() to inialize the entire message field. Fixes: 896de0090a85 ("RDMA/core: Use dev_name instead of ibdev->name") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20IB/core: Move query port to ioctlMichael Guralnik3-53/+104
Add a method for query port under the uverbs global methods. Current ib_port_attr struct is passed as a single attribute and port_cap_flags2 is added as a new attribute to the function. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20RDMA/nldev: Expose port_cap_flags2Michael Guralnik1-3/+6
port_cap_flags2 represents IBTA PortInfo:CapabilityMask2. The field safely extends the RDMA_NLDEV_ATTR_CAP_FLAGS operand as it was exported as 64 bit to allow this kind of extension. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20IB/core: uverbs copy to struct or zero helperMichael Guralnik1-0/+11
Add a helper to zero fill fields before copying data to UVERBS_ATTR_STRUCT. As UVERBS_ATTR_STRUCT can be used as an extensible struct, we want to make sure that if the user supplies us with a struct that has new fields that we are not aware of, we return them zeroed to the user. This helper should be used when using UVERBS_ATTR_STRUCT for an extendable data structure and there is a need to make sure that extended members of the struct, that the kernel doesn't handle, are returned zeroed to the user. This is needed due to the fact that UVERBS_ATTR_STRUCT allows non-zero values for members after 'last' member. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+3
Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19RDMA: Mark if destroy address handle is in a sleepable contextGal Pressman8-16/+20
Introduce a 'flags' field to destroy address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-19RDMA: Mark if create address handle is in a sleepable contextGal Pressman3-7/+13
Introduce a 'flags' field to create address handle callback and add a flag that marks whether the callback is executed in an atomic context or not. This will allow drivers to wait for completion instead of polling for it when it is allowed. Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>