aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-08-31Merge tag 'v5.9-rc3' into rdma.git for-nextJason Gunthorpe6-13/+13
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31RDMA/core: Trigger a WARN_ON if the driver causes uobjects to become leakedJason Gunthorpe1-1/+2
Drivers that fail destroy can cause uverbs to leak uobjects. Drivers are required to always eventually destroy their ubojects, so trigger a WARN_ON to detect this driver bug. Link: https://lore.kernel.org/r/0-v1-b1e0ed400ba9+f7-warn_destroy_ufile_hw_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Remove closing and the close_wqJason Gunthorpe1-34/+15
Use cancel_work_sync() to ensure that the wq is not running and simply assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the close_wq since flush_workqueue() is no longer needed. Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Rework how new connections are passed through event deliveryJason Gunthorpe1-126/+96
When a new connection is established the RDMA CM creates a new cm_id and passes it through to the event handler. However inside the UCMA the new ID is not assigned a ucma_context until the user retrieves the event from a syscall. This creates a weird edge condition where a cm_id's context can continue to point at the listening_id that created it, and a number of additional edge conditions on event list clean up related to destroying half created IDs. There is also a race condition in ucma_get_events() where the cm_id->context is being assigned without holding the handler_mutex. Simplify all of this by creating the ucma_context inside the event handler itself and eliminating the edge case of a half created cm_id. All cm_id's can be uniformly destroyed via __destroy_id() or via the close_work. Link: https://lore.kernel.org/r/20200818120526.702120-14-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Narrow file->mut in ucma_event_handler()Jason Gunthorpe1-9/+7
Since the backlog is now an atomic the file->mut is now only protecting the event_list and ctx_list. Narrow its scope to make it clear Link: https://lore.kernel.org/r/20200818120526.702120-13-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Change backlog into an atomicJason Gunthorpe1-7/+8
There is no reason to grab the file->mut just to do this inc/dec work. Use an atomic. Link: https://lore.kernel.org/r/20200818120526.702120-12-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Add missing locking around rdma_leave_multicast()Jason Gunthorpe1-0/+2
All entry points to the rdma_cm from a ULP must be single threaded, even this error unwinds. Add the missing locking. Fixes: 7c11910783a1 ("RDMA/ucma: Put a lock around every call to the rdma_cm layer") Link: https://lore.kernel.org/r/20200818120526.702120-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix locking for ctx->events_reportedJason Gunthorpe1-1/+3
This value is locked under the file->mut, ensure it is held whenever touching it. The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is already not possible for the write side to run, the movement is just for clarity. Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()") Link: https://lore.kernel.org/r/20200818120526.702120-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix the locking of ctx->fileJason Gunthorpe1-1/+5
ctx->file is changed under the file->mut lock by ucma_migrate_id(), which is impossible to lock correctly. Instead change ctx->file under the handler_lock and ctx_table lock and revise all places touching ctx->file to use this locking when reading ctx->file. Link: https://lore.kernel.org/r/20200818120526.702120-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Do not use file->mut to lock destroyingJason Gunthorpe1-3/+11
The only reader of destroying is inside a handler under the handler_mutex, so directly use the handler_mutex when setting it instead of the larger file->mut. As the refcount could be zero here, and the cm_id already freed, and additional refcount grab around the locking is required to touch the cm_id. Link: https://lore.kernel.org/r/20200818120526.702120-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/cma: Add missing locking to rdma_accept()Jason Gunthorpe2-7/+30
In almost all cases rdma_accept() is called under the handler_mutex by ULPs from their handler callbacks. The one exception was ucma which did not get the handler_mutex. To improve the understand-ability of the locking scheme obtain the mutex for ucma as well. This improves how ucma works by allowing it to directly use handler_mutex for some of its internal locking against the handler callbacks intead of the global file->mut lock. There does not seem to be a serious bug here, other than a DISCONNECT event can be delivered concurrently with accept succeeding. Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Remove mc_list and rely on xarrayJason Gunthorpe1-37/+22
It is not really necessary to keep a linked list of mcs associated with each context when we can just scan the xarray to find the right things. The removes another overloading of file->mut by relying on the xarray locking for mc instead. Link: https://lore.kernel.org/r/20200818120526.702120-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix error cases around ucma_alloc_ctx()Jason Gunthorpe1-26/+42
The store to ctx->cm_id was based on the idea that _ucma_find_context() would not return the ctx until it was fully setup. Without locking this doesn't work properly. Split things so that the xarray is allocated with NULL to reserve the ID and once everything is final set the cm_id and store. Along the way this shows that the error unwind in ucma_get_event() if a new ctx is created is wrong, fix it up. Link: https://lore.kernel.org/r/20200818120526.702120-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Consolidate the two destroy flowsJason Gunthorpe1-42/+22
ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate this duplicated code into a function. Link: https://lore.kernel.org/r/20200818120526.702120-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Remove unnecessary locking of file->ctx_list in closeJason Gunthorpe1-4/+7
During the file_operations release function it is already not possible that write() can be running concurrently, remove the extra locking around the ctx_list. Link: https://lore.kernel.org/r/20200818120526.702120-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()Jason Gunthorpe1-2/+2
Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming that the refcount can only go to zero from ucma_destroy_id() which also removes it from the xarray. Use refcount_inc_not_zero() instead. Link: https://lore.kernel.org/r/20200818120526.702120-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24RDMA/cm: Add tracepoints to track MAD send operationsChuck Lever2-2/+125
Surface the operation of MAD exchanges during connection establishment. Some samples: [root@klimt ~]# trace-cmd report -F ib_cma cpus=4 kworker/0:4-123 [000] 60.677388: icm_send_rep: local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT kworker/u8:11-391 [002] 60.678808: icm_send_req: local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT kworker/0:4-123 [000] 60.679652: icm_send_rtu: local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT nfsd-1954 [001] 60.691350: icm_send_rep: local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT nfsd-1954 [003] 62.017931: icm_send_drep: local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT Link: https://lore.kernel.org/r/159767240197.2968.12048458026453596018.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24RDMA/cm: Replace pr_debug() call sites with tracepointsChuck Lever4-55/+351
In the interest of converging on a common instrumentation infrastructure, modernize the pr_debug() call sites added by commit 119bf81793ea ("IB/cm: Add debug prints to ib_cm"). The new tracepoints appear in a new "ib_cma" subsystem. The conversion is somewhat mechanical. Someone more familiar with the semantics of the recorded information might suggest additional data capture. Some benefits include: - Tracepoints enable "always on" reporting of these errors - The error records are structured and compact - Tracepoints provide hooks for eBPF scripts Sample output: nfsd-1954 [003] 62.017901: icm_dreq_skipped: local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT Link: https://lore.kernel.org/r/159767239665.2968.10613294222688696646.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-24RDMA/core: Move the rdma_show_ib_cm_event() macroChuck Lever1-40/+0
Refactor: Make it globally available in the utilities header. Link: https://lore.kernel.org/r/159767239131.2968.9520990257041764685.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva5-12/+12
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-20RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"Colin Ian King1-1/+1
There is a spelling mistake in a pr_warn message. Fix it. Link: https://lore.kernel.org/r/20200810075824.46770-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18RDMA/cm: Remove unused cm_classJason Gunthorpe1-24/+0
Previous commits removed all references to the /sys/class/infiniband_cm/ directory represented by the cm_class symbol. Remove the directory and cm_class. Fixes: a1a8e4a85cf7 ("rdma: Delete the ib_ucm module") Link: https://lore.kernel.org/r/0-v1-90096a98c476+205-remove_cm_leftovers_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-18RDMA: Remove constant domain argument from flow creation callLeon Romanovsky1-2/+2
The "domain" argument is constant and modern device (mlx5) doesn't support anything except IB_FLOW_DOMAIN_USER, so delete this extra parameter and simplify code. Link: https://lore.kernel.org/r/20200730081235.1581127-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-12mm/gup: remove task_struct pointer for all gup codePeter Xu1-1/+1
After the cleanup of page fault accounting, gup does not need to pass task_struct around any more. Remove that parameter in the whole gup stack. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds20-670/+665
Pull rdma updates from Jason Gunthorpe: "A quiet cycle after the larger 5.8 effort. Substantially cleanup and driver work with a few smaller features this time. - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re - Removal of dead or redundant code across the drivers - RAW resource tracker dumps to include a device specific data blob for device objects to aide device debugging - Further advance the IOCTL interface, remove the ability to turn it off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands - Remove stubs related to devices with no pkey table - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a device to give higher performance - Several more static checker, syzkaller and rare crashers fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits) RDMA/mlx5: Fix flow destination setting for RDMA TX flow table RDMA/rxe: Remove pkey table RDMA/umem: Add a schedule point in ib_umem_get() RDMA/hns: Fix the unneeded process when getting a general type of CQE error RDMA/hns: Fix error during modify qp RTS2RTS RDMA/hns: Delete unnecessary memset when allocating VF resource RDMA/hns: Remove redundant parameters in set_rc_wqe() RDMA/hns: Remove support for HIP08_A RDMA/hns: Refactor hns_roce_v2_set_hem() RDMA/hns: Remove redundant hardware opcode definitions RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP RDMA/include: Replace license text with SPDX tags RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting RDMA/cma: Execute rdma_cm destruction from a handler properly RDMA/cma: Remove unneeded locking for req paths RDMA/cma: Using the standard locking pattern when delivering the removal event RDMA/cma: Simplify DEVICE_REMOVAL for internal_id RDMA/efa: Add EFA 0xefa1 PCI ID RDMA/efa: User/kernel compatibility handshake mechanism ...
2020-08-04Merge tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-1/+5
Pull dma-mapping updates from Christoph Hellwig: - make support for dma_ops optional - move more code out of line - add generic support for a dma_ops bypass mode - misc cleanups * tag 'dma-mapping-5.9' of git://git.infradead.org/users/hch/dma-mapping: dma-contiguous: cleanup dma_alloc_contiguous dma-debug: use named initializers for dir2name powerpc: use the generic dma_ops_bypass mode dma-mapping: add a dma_ops_bypass flag to struct device dma-mapping: make support for dma ops optional dma-mapping: inline the fast path dma-direct calls dma-mapping: move the remaining DMA API calls out of line
2020-08-04Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-2/+2
Pull uninitialized_var() macro removal from Kees Cook: "This is long overdue, and has hidden too many bugs over the years. The series has several "by hand" fixes, and then a trivial treewide replacement. - Clean up non-trivial uses of uninitialized_var() - Update documentation and checkpatch for uninitialized_var() removal - Treewide removal of uninitialized_var()" * tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: compiler: Remove uninitialized_var() macro treewide: Remove uninitialized_var() usage checkpatch: Remove awareness of uninitialized_var() macro mm/debug_vm_pgtable: Remove uninitialized_var() usage f2fs: Eliminate usage of uninitialized_var() macro media: sur40: Remove uninitialized_var() usage KVM: PPC: Book3S PR: Remove uninitialized_var() usage clk: spear: Remove uninitialized_var() usage clk: st: Remove uninitialized_var() usage spi: davinci: Remove uninitialized_var() usage ide: Remove uninitialized_var() usage rtlwifi: rtl8192cu: Remove uninitialized_var() usage b43: Remove uninitialized_var() usage drbd: Remove uninitialized_var() usage x86/mm/numa: Remove uninitialized_var() usage docs: deprecated.rst: Add uninitialized_var()
2020-07-31RDMA/umem: Add a schedule point in ib_umem_get()Eric Dumazet1-0/+1
Mapping as little as 64GB can take more than 10 seconds, triggering issues on kernels with CONFIG_PREEMPT_NONE=y. ib_umem_get() already splits the work in 2MB units on x86_64, adding a cond_resched() in the long-lasting loop is enough to solve the issue. Note that sg_alloc_table() can still use more than 100 ms, which is also problematic. This might be addressed later in ib_umem_add_sg_table(), adding new blocks in sgl on demand. Link: https://lore.kernel.org/r/20200730015755.1827498-1-edumazet@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30RDMA/core: Free DIM memory in error unwindLeon Romanovsky1-0/+1
The memory allocated for the DIM wasn't freed in in error unwind path, fix it by calling to rdma_dim_destroy(). Fixes: da6629793aa6 ("RDMA/core: Provide RDMA DIM support for ULPs") Link: https://lore.kernel.org/r/20200730082719.1582397-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com <mailto:maxg@mellanox.com>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-30RDMA/core: Stop DIM before destroying CQLeon Romanovsky1-3/+10
HW destroy operation should be last operation after all possible CQ users completed their work, so move DIM work cancellation before such destroy call. Fixes: da6629793aa6 ("RDMA/core: Provide RDMA DIM support for ULPs") Link: https://lore.kernel.org/r/20200730082719.1582397-3-leon@kernel.org Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QPMark Zhang1-3/+0
When dumping QPs bound to a counter, raw QPs should be allowed to dump without the CAP_NET_RAW privilege. This is consistent with what "rdma res show qp" does. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20200727095828.496195-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Execute rdma_cm destruction from a handler properlyJason Gunthorpe1-90/+84
When a rdma_cm_id needs to be destroyed after a handler callback fails, part of the destruction pattern is open coded into each call site. Unfortunately the blind assignment to state discards important information needed to do cma_cancel_operation(). This results in active operations being left running after rdma_destroy_id() completes, and the use-after-free bugs from KASAN. Consolidate this entire pattern into destroy_id_handler_unlock() and manage the locking correctly. The state should be set to RDMA_CM_DESTROYING under the handler_lock to atomically ensure no futher handlers are called. Link: https://lore.kernel.org/r/20200723070707.1771101-5-leon@kernel.org Reported-by: syzbot+08092148130652a6faae@syzkaller.appspotmail.com Reported-by: syzbot+a929647172775e335941@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Remove unneeded locking for req pathsJason Gunthorpe1-25/+6
The REQ flows are concerned that once the handler is called on the new cm_id the ULP can choose to trigger a rdma_destroy_id() concurrently at any time. However, this is not true, while the ULP can call rdma_destroy_id(), it immediately blocks on the handler_mutex which prevents anything harmful from running concurrently. Remove the confusing extra locking and refcounts and make the handler_mutex protecting state during destroy more clear. Link: https://lore.kernel.org/r/20200723070707.1771101-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Using the standard locking pattern when delivering the removal eventJason Gunthorpe1-26/+36
Whenever an event is delivered to the handler it should be done under the handler_mutex and upon any non-zero return from the handler it should trigger destruction of the cm_id. cma_process_remove() skips some steps here, it is not necessarily wrong since the state change should prevent any races, but it is confusing and unnecessary. Follow the standard pattern here, with the slight twist that the transition to RDMA_CM_DEVICE_REMOVAL includes a cma_cancel_operation(). Link: https://lore.kernel.org/r/20200723070707.1771101-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-29RDMA/cma: Simplify DEVICE_REMOVAL for internal_idJason Gunthorpe1-1/+5
cma_process_remove() triggers an unconditional rdma_destroy_id() for internal_id's and skips the event deliver and transition through RDMA_CM_DEVICE_REMOVAL. This is confusing and unnecessary. internal_id always has cma_listen_handler() as the handler, have it catch the RDMA_CM_DEVICE_REMOVAL event and directly consume it and signal removal. This way the FSM sequence never skips the DEVICE_REMOVAL case and the logic in this hard to test area is simplified. Link: https://lore.kernel.org/r/20200723070707.1771101-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27RDMA/core: Fix return error value in _ib_modify_qp() to negativeLi Heng1-1/+1
The error codes in _ib_modify_qp() are supposed to be negative errno. Fixes: 7a5c938b9ed0 ("IB/core: Check for rdma_protocol_ib only after validating port_num") Link: https://lore.kernel.org/r/1595645787-20375-1-git-send-email-liheng40@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-27RDMA/cm: Add min length checks to user structure copiesJason Gunthorpe1-0/+4
These are missing throughout ucma, it harmlessly copies garbage from userspace, but in this new code which uses min to compute the copy length it can result in uninitialized stack memory. Check for minimum length at the very start. BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1df/0x240 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764 do_loop_readv_writev fs/read_write.c:737 [inline] do_iter_write+0x710/0xdc0 fs/read_write.c:1020 vfs_writev fs/read_write.c:1091 [inline] do_writev+0x42d/0x8f0 fs/read_write.c:1134 __do_sys_writev fs/read_write.c:1207 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204 do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 34e2ab57a911 ("RDMA/ucma: Extend ucma_connect to receive ECE parameters") Fixes: 0cb15372a615 ("RDMA/cma: Connect ECE to rdma_accept") Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/uverbs: Silence shiftTooManyBitsSigned warningLeon Romanovsky1-1/+1
Fix reported by kbuild warning. drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned] BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31)); ^ Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/uverbs: Remove redundant assignmentsLeon Romanovsky1-5/+5
The kbuild reported the following warning, so clean whole uverbs_cmd.c file. drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment] ret = uverbs_request(attrs, &cmd, sizeof(cmd)); ^ drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used. int ret = -EINVAL; ^ Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flowMaor Gottlieb2-1/+4
According to the locking scheme, mlx5_ib_update_xlt() should be called with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the below WARN from lockdep_assert_held(): WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib] RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib] Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00 RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940 RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100 RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001 R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000 FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib] pagefault_mr+0x315/0x440 [mlx5_ib] mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib] process_one_work+0x215/0x5c0 worker_thread+0x3c/0x380 ? process_one_work+0x5c0/0x5c0 kthread+0x133/0x150 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 Hold the SRCU during prefetch, even though it strictly isn't needed since prefetch is holding the num_deferred_work it does make it easier to reason about. Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/core: Update write interface to use automatic object lifetimeLeon Romanovsky1-207/+86
The automatic object lifetime model allows us to change the write() interface to have the same logic as the ioctl() path. Update the create/alloc functions to be in the following format, so the code flow will be the same: * Allocate objects * Initialize them * Call to the drivers, this is last step that is allowed to fail * Finalize object * Return response and allow to core code to handle abort/commit respectively. Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-24RDMA/core: Align abort/commit object scheme for write() and ioctl() pathsLeon Romanovsky2-1/+10
Create the same logic flow for the write() interface as we have for the ioctl() path by making sure that the object is committed or aborted automatically after HW object creation. Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20RDMA/core: Remove query_pkey from the mandatory opsKamal Heib1-1/+3
The query_pkey() isn't mandatory for the iwarp providers, so remove this requirement from the RDMA core. Link: https://lore.kernel.org/r/20200714183414.61069-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20RDMA/core: Allocate the pkey cache only if the pkey_tbl_len is setKamal Heib1-16/+29
Allocate the pkey cache only if the pkey_tbl_len is set by the provider, also add checks to avoid accessing the pkey cache when it not initialized. Link: https://lore.kernel.org/r/20200714183414.61069-3-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-20RDMA/core: Expose pkeys sysfs files only if pkey_tbl_len is setKamal Heib1-20/+41
Expose the pkeys sysfs files only if the pkey_tbl_len is set by the providers. Link: https://lore.kernel.org/r/20200714183414.61069-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-19dma-mapping: make support for dma ops optionalChristoph Hellwig1-1/+5
Avoid the overhead of the dma ops support for tiny builds that only use the direct mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook1-2/+2
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16RDMA/cm: Protect access to remote_sidr_tableMaor Gottlieb1-0/+2
cm.lock must be held while accessing remote_sidr_table. This fixes the below NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 2 PID: 7288 Comm: udaddy Not tainted 5.7.0_for_upstream_perf_2020_06_09_15_14_20_38 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:rb_erase+0x10d/0x360 Code: 00 00 00 48 89 c1 48 89 d0 48 8b 50 08 48 39 ca 74 48 f6 02 01 75 af 48 8b 7a 10 48 89 c1 48 83 c9 01 48 89 78 08 48 89 42 10 <48> 89 0f 48 8b 08 48 89 0a 48 83 e1 fc 48 89 10 0f 84 b1 00 00 00 RSP: 0018:ffffc90000f77c30 EFLAGS: 00010086 RAX: ffff8883df27d458 RBX: ffff8883df27da58 RCX: ffff8883df27d459 RDX: ffff8883d183fa58 RSI: ffffffffa01e8d00 RDI: 0000000000000000 RBP: ffff8883d62ac800 R08: 0000000000000000 R09: 00000000000000ce R10: 000000000000000a R11: 0000000000000000 R12: ffff8883df27da00 R13: ffffc90000f77c98 R14: 0000000000000130 R15: 0000000000000000 FS: 00007f009f877740(0000) GS:ffff8883f1a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000003d467e003 CR4: 0000000000160ee0 Call Trace: cm_send_sidr_rep_locked+0x15a/0x1a0 [ib_cm] ib_send_cm_sidr_rep+0x2b/0x50 [ib_cm] cma_send_sidr_rep+0x8b/0xe0 [rdma_cm] __rdma_accept+0x21d/0x2b0 [rdma_cm] ? ucma_get_ctx+0x2b/0xe0 [rdma_ucm] ? _copy_from_user+0x30/0x60 ucma_accept+0x13e/0x1e0 [rdma_ucm] ucma_write+0xb4/0x130 [rdma_ucm] vfs_write+0xad/0x1a0 ksys_write+0x9d/0xb0 do_syscall_64+0x48/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f009ef60924 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 2a ef 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 RSP: 002b:00007fff843edf38 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000055743042e1d0 RCX: 00007f009ef60924 RDX: 0000000000000130 RSI: 00007fff843edf40 RDI: 0000000000000003 RBP: 00007fff843ee0e0 R08: 0000000000000000 R09: 0000557430433090 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fff843edf40 R14: 000000000000038c R15: 00000000ffffff00 CR2: 0000000000000000 Fixes: 6a8824a74bc9 ("RDMA/cm: Allow ib_send_cm_sidr_rep() to be done under lock") Link: https://lore.kernel.org/r/20200716105519.1424266-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16RDMA/core: Fix race in rdma_alloc_commit_uobject()Leon Romanovsky1-3/+3
The FD should not be installed until all of the setup is completed as the fd_install() transfers ownership of the kref to the FD table. A thread can race a close() and trigger concurrent rdma_alloc_commit_uobject() and uverbs_uobject_fd_release() which, at least, triggers a safety WARN_ON: WARNING: CPU: 4 PID: 6913 at drivers/infiniband/core/rdma_core.c:768 uverbs_uobject_fd_release+0x202/0x230 Kernel panic - not syncing: panic_on_warn set ... CPU: 4 PID: 6913 Comm: syz-executor.3 Not tainted 5.7.0-rc2 #22 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [..] RIP: 0010:uverbs_uobject_fd_release+0x202/0x230 Code: fe 4c 89 e7 e8 af 23 fe ff e9 2a ff ff ff e8 c5 fa 61 fe be 03 00 00 00 4c 89 e7 e8 68 eb f5 fe e9 13 ff ff ff e8 ae fa 61 fe <0f> 0b eb ac e8 e5 aa 3c fe e8 50 2b 86 fe e9 6a fe ff ff e8 46 2b RSP: 0018:ffffc90008117d88 EFLAGS: 00010293 RAX: ffff88810e146580 RBX: 1ffff92001022fb1 RCX: ffffffff82d5b902 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811951b040 RBP: ffff88811951b000 R08: ffffed10232a3609 R09: ffffed10232a3609 R10: ffff88811951b043 R11: 0000000000000001 R12: ffff888100a7c600 R13: ffff888100a7c650 R14: ffffc90008117da8 R15: ffffffff82d5b700 ? __uverbs_cleanup_ufile+0x270/0x270 ? uverbs_uobject_fd_release+0x202/0x230 ? uverbs_uobject_fd_release+0x202/0x230 ? __uverbs_cleanup_ufile+0x270/0x270 ? locks_remove_file+0x282/0x3d0 ? security_file_free+0xaa/0xd0 __fput+0x2be/0x770 task_work_run+0x10e/0x1b0 exit_to_usermode_loop+0x145/0x170 do_syscall_64+0x2d0/0x390 ? prepare_exit_to_usermode+0x17a/0x230 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x414da7 Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f f3 c3 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 f4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 36 fc ff ff 8b 44 24 RSP: 002b:00007fff39d379d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000414da7 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 00007fff39d37a3c R08: 0000000400000000 R09: 0000000400000000 R10: 00007fff39d37910 R11: 0000000000000293 R12: 0000000000000001 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003 Reorder so that fd_install() is the last thing done in rdma_alloc_commit_uobject(). Fixes: aba94548c9e4 ("IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit") Link: https://lore.kernel.org/r/20200716102059.1420681-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-10RDMA/counter: Allow manually bind QPs with different pids to same counterMark Zhang1-1/+1
In manual mode allow bind user QPs with different pids to same counter, since this is allowed in auto mode. Bind kernel QPs and user QPs to the same counter are not allowed. Fixes: 1bd8e0a9d0fd ("RDMA/counter: Allow manual mode configuration support") Link: https://lore.kernel.org/r/20200702082933.424537-4-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>