aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/ucma.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-05-28RDMA/core: Use the DEVICE_ATTR_RO macroYueHaibing1-4/+3
Use the DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(), which makes the code a bit shorter and easier to read. Link: https://lore.kernel.org/r/20210526132949.20184-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11RDMA/ucma: Cleanup to reduce duplicate codeXiaofei Tan1-2/+2
The lable "err1" does the same thing as the branch of copy_to_user() failed in the function ucma_create_id(). Just jump to the label directly to reduce duplicate code. Link: https://lore.kernel.org/r/1620291106-3675-1-git-send-email-tanxiaofei@huawei.com Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Add necessary spacesWenpeng Liang1-2/+2
Space is required before '(' of switch statements and around '='. Link: https://lore.kernel.org/r/1617783353-48249-4-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-04-12RDMA/core: Print the function name by __func__ instead of an fixed stringWenpeng Liang1-2/+2
It's better to use __func__ than a fixed string to print a function's name. Link: https://lore.kernel.org/r/1617783353-48249-2-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-06RDMA/ucma: Do not miss ctx destruction steps in some casesJason Gunthorpe1-63/+72
The destruction flow is very complicated here because the cm_id can be destroyed from the event handler at any time if the device is hot-removed. This leaves behind a partial ctx with no cm_id in the xarray, and will let user space leak memory. Make everything consistent in this flow in all places: - Return the xarray back to XA_ZERO_ENTRY before beginning any destruction. The thread that reaches this first is responsible to kfree, everyone else does nothing. - Test the xarray during the special hot-removal case to block the queue_work, this has much simpler locking and doesn't require a 'destroying' - Fix the ref initialization so that it is only positive if cm_id != NULL, then rely on that to guide the destruction process in all cases. Now the new ucma_destroy_private_ctx() can be called in all places that want to free the ctx, including all the error unwinds, and none of the details are missed. Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery") Link: https://lore.kernel.org/r/20210105111327.230270-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26RDMA: Convert sysfs device * show functions to use sysfs_emit()Joe Perches1-1/+1
Done with cocci script: @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16RDMA/ucma: Fix use after free in destroy id flowMaor Gottlieb1-5/+6
ucma_free_ctx() should call to __destroy_id() on all the connection requests that have not been delivered to user space. Currently it calls on the context itself and cause to use after free. Fixes the trace: BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108 Faulting instruction address: 0xc0080000002428f4 Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: [c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable) [c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm] [c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm] [c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330 [c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110 [c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50 [c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0 [c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30 [c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470 [c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240 [c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0 Rename listen_ctx to conn_req_ctx as the poor name was the cause of this bug. Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery") Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22RDMA/restrack: Improve readability in task name managementLeon Romanovsky1-4/+3
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd(). This uniformly makes all restracks add'd using rdma_restrack_add(). Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroyJason Gunthorpe1-50/+29
ucma_destroy_id() assumes that all things accessing the ctx will do so via the xarray. This assumption violated only in the case the FD is being closed, then the ctx is reached via the ctx_list. Normally this is OK since ucma_destroy_id() cannot run concurrenty with release(), however with ucma_migrate_id() is involved this can violated as the close of the 2nd FD can run concurrently with destroy on the first: CPU0 CPU1 ucma_destroy_id(fda) ucma_migrate_id(fda -> fdb) ucma_get_ctx() xa_lock() _ucma_find_context() xa_erase() xa_unlock() xa_lock() ctx->file = new_file list_move() xa_unlock() ucma_put_ctx() ucma_close(fdb) _destroy_id() kfree(ctx) _destroy_id() wait_for_completion() // boom, ctx was freed The ctx->file must be modified under the handler and xa_lock, and prior to modification the ID must be rechecked that it is still reachable from cur_file, ie there is no parallel destroy or migrate. To make this work remove the double locking and streamline the control flow. The double locking was obsoleted by the handler lock now directly preventing new uevents from being created, and the ctx_list cannot be read while holding fgets on both files. Removing the double locking also removes the need to check for the same file. Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()") Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-02RDMA/ucma: Fix resource leak on error pathAlex Dewar1-7/+8
In ucma_process_join(), if the call to xa_alloc() fails, the function will return without freeing mc. Fix this by jumping to the correct line. In the process I renamed the jump labels to something more memorable for extra clarity. Link: https://lore.kernel.org/r/20200902162454.332828-1-alex.dewar90@gmail.com Addresses-Coverity-ID: 1496814 ("Resource leak") Fixes: 95fe51096b7a ("RDMA/ucma: Remove mc_list and rely on xarray") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31Merge tag 'v5.9-rc3' into rdma.git for-nextJason Gunthorpe1-2/+2
Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Remove closing and the close_wqJason Gunthorpe1-34/+15
Use cancel_work_sync() to ensure that the wq is not running and simply assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the close_wq since flush_workqueue() is no longer needed. Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Rework how new connections are passed through event deliveryJason Gunthorpe1-126/+96
When a new connection is established the RDMA CM creates a new cm_id and passes it through to the event handler. However inside the UCMA the new ID is not assigned a ucma_context until the user retrieves the event from a syscall. This creates a weird edge condition where a cm_id's context can continue to point at the listening_id that created it, and a number of additional edge conditions on event list clean up related to destroying half created IDs. There is also a race condition in ucma_get_events() where the cm_id->context is being assigned without holding the handler_mutex. Simplify all of this by creating the ucma_context inside the event handler itself and eliminating the edge case of a half created cm_id. All cm_id's can be uniformly destroyed via __destroy_id() or via the close_work. Link: https://lore.kernel.org/r/20200818120526.702120-14-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Narrow file->mut in ucma_event_handler()Jason Gunthorpe1-9/+7
Since the backlog is now an atomic the file->mut is now only protecting the event_list and ctx_list. Narrow its scope to make it clear Link: https://lore.kernel.org/r/20200818120526.702120-13-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Change backlog into an atomicJason Gunthorpe1-7/+8
There is no reason to grab the file->mut just to do this inc/dec work. Use an atomic. Link: https://lore.kernel.org/r/20200818120526.702120-12-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Add missing locking around rdma_leave_multicast()Jason Gunthorpe1-0/+2
All entry points to the rdma_cm from a ULP must be single threaded, even this error unwinds. Add the missing locking. Fixes: 7c11910783a1 ("RDMA/ucma: Put a lock around every call to the rdma_cm layer") Link: https://lore.kernel.org/r/20200818120526.702120-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix locking for ctx->events_reportedJason Gunthorpe1-1/+3
This value is locked under the file->mut, ensure it is held whenever touching it. The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is already not possible for the write side to run, the movement is just for clarity. Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()") Link: https://lore.kernel.org/r/20200818120526.702120-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix the locking of ctx->fileJason Gunthorpe1-1/+5
ctx->file is changed under the file->mut lock by ucma_migrate_id(), which is impossible to lock correctly. Instead change ctx->file under the handler_lock and ctx_table lock and revise all places touching ctx->file to use this locking when reading ctx->file. Link: https://lore.kernel.org/r/20200818120526.702120-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Do not use file->mut to lock destroyingJason Gunthorpe1-3/+11
The only reader of destroying is inside a handler under the handler_mutex, so directly use the handler_mutex when setting it instead of the larger file->mut. As the refcount could be zero here, and the cm_id already freed, and additional refcount grab around the locking is required to touch the cm_id. Link: https://lore.kernel.org/r/20200818120526.702120-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/cma: Add missing locking to rdma_accept()Jason Gunthorpe1-4/+8
In almost all cases rdma_accept() is called under the handler_mutex by ULPs from their handler callbacks. The one exception was ucma which did not get the handler_mutex. To improve the understand-ability of the locking scheme obtain the mutex for ucma as well. This improves how ucma works by allowing it to directly use handler_mutex for some of its internal locking against the handler callbacks intead of the global file->mut lock. There does not seem to be a serious bug here, other than a DISCONNECT event can be delivered concurrently with accept succeeding. Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Remove mc_list and rely on xarrayJason Gunthorpe1-37/+22
It is not really necessary to keep a linked list of mcs associated with each context when we can just scan the xarray to find the right things. The removes another overloading of file->mut by relying on the xarray locking for mc instead. Link: https://lore.kernel.org/r/20200818120526.702120-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix error cases around ucma_alloc_ctx()Jason Gunthorpe1-26/+42
The store to ctx->cm_id was based on the idea that _ucma_find_context() would not return the ctx until it was fully setup. Without locking this doesn't work properly. Split things so that the xarray is allocated with NULL to reserve the ID and once everything is final set the cm_id and store. Along the way this shows that the error unwind in ucma_get_event() if a new ctx is created is wrong, fix it up. Link: https://lore.kernel.org/r/20200818120526.702120-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Consolidate the two destroy flowsJason Gunthorpe1-42/+22
ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate this duplicated code into a function. Link: https://lore.kernel.org/r/20200818120526.702120-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Remove unnecessary locking of file->ctx_list in closeJason Gunthorpe1-4/+7
During the file_operations release function it is already not possible that write() can be running concurrently, remove the extra locking around the ctx_list. Link: https://lore.kernel.org/r/20200818120526.702120-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx()Jason Gunthorpe1-2/+2
Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming that the refcount can only go to zero from ucma_destroy_id() which also removes it from the xarray. Use refcount_inc_not_zero() instead. Link: https://lore.kernel.org/r/20200818120526.702120-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva1-2/+2
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-07-27RDMA/cm: Add min length checks to user structure copiesJason Gunthorpe1-0/+4
These are missing throughout ucma, it harmlessly copies garbage from userspace, but in this new code which uses min to compute the copy length it can result in uninitialized stack memory. Check for minimum length at the very start. BUG: KMSAN: uninit-value in ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 CPU: 0 PID: 8457 Comm: syz-executor069 Not tainted 5.8.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1df/0x240 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 ucma_connect+0x2aa/0xab0 drivers/infiniband/core/ucma.c:1091 ucma_write+0x5c5/0x630 drivers/infiniband/core/ucma.c:1764 do_loop_readv_writev fs/read_write.c:737 [inline] do_iter_write+0x710/0xdc0 fs/read_write.c:1020 vfs_writev fs/read_write.c:1091 [inline] do_writev+0x42d/0x8f0 fs/read_write.c:1134 __do_sys_writev fs/read_write.c:1207 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1204 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1204 do_syscall_64+0xb0/0x150 arch/x86/entry/common.c:386 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 34e2ab57a911 ("RDMA/ucma: Extend ucma_connect to receive ECE parameters") Fixes: 0cb15372a615 ("RDMA/cma: Connect ECE to rdma_accept") Link: https://lore.kernel.org/r/0-v1-d5b86dab17dc+28c25-ucma_syz_min_jgg@nvidia.com Reported-by: syzbot+086ab5ca9eafd2379aa6@syzkaller.appspotmail.com Reported-by: syzbot+7446526858b83c8828b2@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-05-27RDMA/cma: Provide ECE reject reasonLeon Romanovsky1-1/+14
IBTA declares "vendor option not supported" reject reason in REJ messages if passive side doesn't want to accept proposed ECE options. Due to the fact that ECE is managed by userspace, there is a need to let users to provide such rejected reason. Link: https://lore.kernel.org/r/20200526103304.196371-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27RDMA/cma: Connect ECE to rdma_acceptLeon Romanovsky1-3/+11
The rdma_accept() is called by both passive and active sides of CMID connection to mark readiness to start data transfer. For passive side, this is called explicitly, for active side, it is called implicitly while receiving REP message. Provide ECE data to rdma_accept function needed for passive side to send that REP message. Link: https://lore.kernel.org/r/20200526103304.196371-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27RDMA/ucma: Deliver ECE parameters through UCMA eventsLeon Romanovsky1-1/+5
Passive side of CMID connection receives ECE request through REQ message and needs to respond with relevant REP message which will be forwarded to active side. The UCMA events interface is responsible for such communication with the user space (librdmacm). Extend it to provide ECE wire data. Link: https://lore.kernel.org/r/20200526103304.196371-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-27RDMA/ucma: Extend ucma_connect to receive ECE parametersLeon Romanovsky1-3/+11
Active side of CMID initiates connection through librdmacm's rdma_connect() and kernel's ucma_connect(). Extend UCMA interface to handle those new parameters. Link: https://lore.kernel.org/r/20200526103304.196371-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-05-12RDMA/ucma: Return stable IB device index as identifierLeon Romanovsky1-7/+9
The librdmacm uses node_guid as identifier to correlate between IB devices and CMA devices. However FW resets cause to such "connection" to be lost and require from the user to restart its application. Extend UCMA to return IB device index, which is stable identifier. Link: https://lore.kernel.org/r/20200504132541.355710-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-27RDMA/ucma: Put a lock around every call to the rdma_cm layerJason Gunthorpe1-2/+47
The rdma_cm must be used single threaded. This appears to be a bug in the design, as it does have lots of locking that seems like it should allow concurrency. However, when it is all said and done every single place that uses the cma_exch() scheme is broken, and all the unlocked reads from the ucma of the cm_id data are wrong too. syzkaller has been finding endless bugs related to this. Fixing this in any elegant way is some enormous amount of work. Take a very big hammer and put a mutex around everything to do with the ucma_context at the top of every syscall. Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Link: https://lore.kernel.org/r/20200218210432.GA31966@ziepe.ca Reported-by: syzbot+adb15cf8c2798e4e0db4@syzkaller.appspotmail.com Reported-by: syzbot+e5579222b6a3edd96522@syzkaller.appspotmail.com Reported-by: syzbot+4b628fcc748474003457@syzkaller.appspotmail.com Reported-by: syzbot+29ee8f76017ce6cf03da@syzkaller.appspotmail.com Reported-by: syzbot+6956235342b7317ec564@syzkaller.appspotmail.com Reported-by: syzbot+b358909d8d01556b790b@syzkaller.appspotmail.com Reported-by: syzbot+6b46b135602a3f3ac99e@syzkaller.appspotmail.com Reported-by: syzbot+8458d13b13562abf6b77@syzkaller.appspotmail.com Reported-by: syzbot+bd034f3fdc0402e942ed@syzkaller.appspotmail.com Reported-by: syzbot+c92378b32760a4eef756@syzkaller.appspotmail.com Reported-by: syzbot+68b44a1597636e0b342c@syzkaller.appspotmail.com Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-19RDMA/ucma: Use refcount_t for the ctx->refJason Gunthorpe1-5/+5
Don't use an atomic as a refcount. Link: https://lore.kernel.org/r/20200218191657.GA29724@ziepe.ca Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/ucma: Mask QPN to be 24 bits according to IBTALeon Romanovsky1-1/+1
IBTA declares QPN as 24bits, mask input to ensure that kernel doesn't get higher bits and ensure by adding WANR_ONCE() that other CM users do the same. Link: https://lore.kernel.org/r/20200212072635.682689-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-18RDMA: Report available cdevs through RDMA_NLDEV_CMD_GET_CHARDEVJason Gunthorpe1-0/+23
Update the struct ib_client for all modules exporting cdevs related to the ibdevice to also implement RDMA_NLDEV_CMD_GET_CHARDEV. All cdevs are now autoloadable and discoverable by userspace over netlink instead of relying on sysfs. uverbs also exposes the DRIVER_ID for drivers that are able to support driver id binding in rdma-core. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-07ucma: Convert ctx_idr to XArrayMatthew Wilcox1-34/+24
Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-07ucma: Convert multicast_idr to XArrayMatthew Wilcox1-19/+11
Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-07RDMA/ucma: Use struct_size() helperGustavo A. R. Silva1-2/+1
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06*: convert stream-like files from nonseekable_open -> stream_openKirill Smelkov1-1/+1
Using scripts/coccinelle/api/stream_open.cocci added in 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock"), search and convert to stream_open all in-kernel nonseekable_open users for which read and write actually do not depend on ppos and where there is no other methods in file_operations which assume @offset access. I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. The script also does not convert files that should be valid to convert, but that currently have .llseek = noop_llseek or generic_file_llseek for unknown reason despite file being opened with nonseekable_open (e.g. drivers/input/mousedev.c) Among cases converted 14 were potentially vulnerable to read vs write deadlock (see details in 10dce8af3422): drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/infiniband/core/user_mad.c:988:1-17: ERROR: umad_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/input/misc/uinput.c:401:1-17: ERROR: uinput_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix. and the rest were just safe to convert to stream_open because their read and write do not use ppos at all and corresponding file_operations do not have methods that assume @offset file access(*): arch/powerpc/platforms/52xx/mpc52xx_gpt.c:631:8-24: WARNING: mpc52xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. arch/um/drivers/harddog_kern.c:88:8-24: WARNING: harddog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. arch/x86/kernel/cpu/microcode/core.c:430:33-49: WARNING: microcode_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/ds1620.c:215:8-24: WARNING: ds1620_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/dtlk.c:301:1-17: WARNING: dtlk_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/ipmi/ipmi_watchdog.c:840:9-25: WARNING: ipmi_wdog_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/pcmcia/scr24x_cs.c:95:8-24: WARNING: scr24x_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/char/tb0219.c:246:9-25: WARNING: tb0219_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/firewire/nosy.c:306:8-24: WARNING: nosy_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/hwmon/fschmd.c:840:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/hwmon/w83793.c:1344:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/infiniband/core/ucma.c:1747:8-24: WARNING: ucma_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/infiniband/core/ucm.c:1178:8-24: WARNING: ucm_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/infiniband/core/uverbs_main.c:1086:8-24: WARNING: uverbs_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/input/joydev.c:282:1-17: WARNING: joydev_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/pci/switch/switchtec.c:393:1-17: WARNING: switchtec_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/platform/chrome/cros_ec_debugfs.c:135:8-24: WARNING: cros_ec_console_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/rtc/rtc-ds1374.c:470:9-25: WARNING: ds1374_wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/rtc/rtc-m41t80.c:805:9-25: WARNING: wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/s390/char/tape_char.c:293:2-18: WARNING: tape_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/s390/char/zcore.c:194:8-24: WARNING: zcore_reipl_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/s390/crypto/zcrypt_api.c:528:8-24: WARNING: zcrypt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/spi/spidev.c:594:1-17: WARNING: spidev_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/staging/pi433/pi433_if.c:974:1-17: WARNING: pi433_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/acquirewdt.c:203:8-24: WARNING: acq_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/advantechwdt.c:202:8-24: WARNING: advwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/alim1535_wdt.c:252:8-24: WARNING: ali_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/alim7101_wdt.c:217:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ar7_wdt.c:166:8-24: WARNING: ar7_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/at91rm9200_wdt.c:113:8-24: WARNING: at91wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ath79_wdt.c:135:8-24: WARNING: ath79_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/bcm63xx_wdt.c:119:8-24: WARNING: bcm63xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/cpu5wdt.c:143:8-24: WARNING: cpu5wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/cpwd.c:397:8-24: WARNING: cpwd_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/eurotechwdt.c:319:8-24: WARNING: eurwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/f71808e_wdt.c:528:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/gef_wdt.c:232:8-24: WARNING: gef_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/geodewdt.c:95:8-24: WARNING: geodewdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ib700wdt.c:241:8-24: WARNING: ibwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ibmasr.c:326:8-24: WARNING: asr_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/indydog.c:80:8-24: WARNING: indydog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/intel_scu_watchdog.c:307:8-24: WARNING: intel_scu_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/iop_wdt.c:104:8-24: WARNING: iop_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/it8712f_wdt.c:330:8-24: WARNING: it8712f_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ixp4xx_wdt.c:68:8-24: WARNING: ixp4xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/ks8695_wdt.c:145:8-24: WARNING: ks8695wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/m54xx_wdt.c:88:8-24: WARNING: m54xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/machzwd.c:336:8-24: WARNING: zf_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/mixcomwd.c:153:8-24: WARNING: mixcomwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/mtx-1_wdt.c:121:8-24: WARNING: mtx1_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/mv64x60_wdt.c:136:8-24: WARNING: mv64x60_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/nuc900_wdt.c:134:8-24: WARNING: nuc900wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/nv_tco.c:164:8-24: WARNING: nv_tco_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pc87413_wdt.c:289:8-24: WARNING: pc87413_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd.c:698:8-24: WARNING: pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd.c:737:8-24: WARNING: pcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_pci.c:581:8-24: WARNING: pcipcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_pci.c:623:8-24: WARNING: pcipcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_usb.c:488:8-24: WARNING: usb_pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pcwd_usb.c:527:8-24: WARNING: usb_pcwd_temperature_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pika_wdt.c:121:8-24: WARNING: pikawdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/pnx833x_wdt.c:119:8-24: WARNING: pnx833x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/rc32434_wdt.c:153:8-24: WARNING: rc32434_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/rdc321x_wdt.c:145:8-24: WARNING: rdc321x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/riowd.c:79:1-17: WARNING: riowd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sa1100_wdt.c:62:8-24: WARNING: sa1100dog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc60xxwdt.c:211:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc7240_wdt.c:139:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc8360.c:274:8-24: WARNING: sbc8360_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc_epx_c3.c:81:8-24: WARNING: epx_c3_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sbc_fitpc2_wdt.c:78:8-24: WARNING: fitpc2_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sb_wdog.c:108:1-17: WARNING: sbwdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sc1200wdt.c:181:8-24: WARNING: sc1200wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sc520_wdt.c:261:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/sch311x_wdt.c:319:8-24: WARNING: sch311x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/scx200_wdt.c:105:8-24: WARNING: scx200_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/smsc37b787_wdt.c:369:8-24: WARNING: wb_smsc_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/w83877f_wdt.c:227:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/w83977f_wdt.c:301:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wafer5823wdt.c:200:8-24: WARNING: wafwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/watchdog_dev.c:828:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdrtas.c:379:8-24: WARNING: wdrtas_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdrtas.c:445:8-24: WARNING: wdrtas_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt285.c:104:1-17: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt977.c:276:8-24: WARNING: wdt977_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt.c:424:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt.c:484:8-24: WARNING: wdt_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt_pci.c:464:8-24: WARNING: wdtpci_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open. drivers/watchdog/wdt_pci.c:527:8-24: WARNING: wdtpci_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. net/batman-adv/log.c:105:1-17: WARNING: batadv_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. sound/core/control.c:57:7-23: WARNING: snd_ctl_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. sound/core/rawmidi.c:385:7-23: WARNING: snd_rawmidi_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. sound/core/seq/seq_clientmgr.c:310:7-23: WARNING: snd_seq_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open. sound/core/timer.c:1428:7-23: WARNING: snd_timer_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open. One can also recheck/review the patch via generating it with explanation comments included via $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci SPFLAGS="-D explain" (*) This second group also contains cases with read/write deadlocks that stream_open.cocci don't yet detect, but which are still valid to convert to stream_open since ppos is not used. For example drivers/pci/switch/switchtec.c calls wait_for_completion_interruptible() in its .read, but stream_open.cocci currently detects only "wait_event*" as blocking. Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yongzhi Pan <panyongzhi@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Nikolaus Rath <Nikolaus@rath.org> Cc: Han-Wen Nienhuys <hanwen@google.com> Cc: Anatolij Gustschin <agust@denx.de> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James R. Van Zandt" <jrv@vanzandt.mv.com> Cc: Corey Minyard <minyard@acm.org> Cc: Harald Welte <laforge@gnumonks.org> Acked-by: Lubomir Rintel <lkundrak@v3.sk> [scr24x_cs] Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Johan Hovold <johan@kernel.org> Cc: David Herrmann <dh.herrmann@googlemail.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: Jean Delvare <jdelvare@suse.com> Acked-by: Guenter Roeck <linux@roeck-us.net> [watchdog/* hwmon/*] Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Acked-by: Logan Gunthorpe <logang@deltatee.com> [drivers/pci/switch/switchtec] Acked-by: Bjorn Helgaas <bhelgaas@google.com> [drivers/pci/switch/switchtec] Cc: Benson Leung <bleung@chromium.org> Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> [platform/chrome] Cc: Alessandro Zummo <a.zummo@towertech.it> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> [rtc/*] Cc: Mark Brown <broonie@kernel.org> Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: Wan ZongShun <mcuos.com@gmail.com> Cc: Zwane Mwaikambo <zwanem@gmail.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: "David S. Miller" <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
2019-02-08IB/cma: Define option to set ack timeout and pack tos_setDanit Goldberg1-0/+7
Define new option in 'rdma_set_option' to override calculated QP timeout when requested to provide QP attributes to modify a QP. At the same time, pack tos_set to be bitfield. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16RDMA/ucma: Fix Spectre v1 vulnerabilityGustavo A. R. Silva1-0/+3
hdr.cmd can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/infiniband/core/ucma.c:1686 ucma_write() warn: potential spectre issue 'ucma_cmd_table' [r] (local cap) Fix this by sanitizing hdr.cmd before using it to index ucm_cmd_table. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-13ucma: fix a use-after-free in ucma_resolve_ip()Cong Wang1-0/+2
There is a race condition between ucma_close() and ucma_resolve_ip(): CPU0 CPU1 ucma_resolve_ip(): ucma_close(): ctx = ucma_get_ctx(file, cmd.id); list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) { mutex_lock(&mut); idr_remove(&ctx_idr, ctx->id); mutex_unlock(&mut); ... mutex_lock(&mut); if (!ctx->closing) { mutex_unlock(&mut); rdma_destroy_id(ctx->cm_id); ... ucma_free_ctx(ctx); ret = rdma_resolve_addr(); ucma_put_ctx(ctx); Before idr_remove(), ucma_get_ctx() could still find the ctx and after rdma_destroy_id(), rdma_resolve_addr() may still access id_priv pointer. Also, ucma_put_ctx() may use ctx after ucma_free_ctx() too. ucma_close() should call ucma_put_ctx() too which tests the refcnt and waits for the last one releasing it. The similar pattern is already used by ucma_destroy_id(). Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-04RDMA/ucma: check fd type in ucma_migrate_id()Jann Horn1-0/+6
The current code grabs the private_data of whatever file descriptor userspace has supplied and implicitly casts it to a `struct ucma_file *`, potentially causing a type confusion. This is probably fine in practice because the pointer is only used for comparisons, it is never actually dereferenced; and even in the comparisons, it is unlikely that a file from another filesystem would have a ->private_data pointer that happens to also be valid in this context. But ->private_data is not always guaranteed to be a valid pointer to an object owned by the file's filesystem; for example, some filesystems just cram numbers in there. Check the type of the supplied file descriptor to be safe, analogous to how other places in the kernel do it. Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04infiniband: fix a possible use-after-free bugCong Wang1-1/+5
ucma_process_join() will free the new allocated "mc" struct, if there is any error after that, especially the copy_to_user(). But in parallel, ucma_leave_multicast() could find this "mc" through idr_find() before ucma_process_join() frees it, since it is already published. So "mc" could be used in ucma_leave_multicast() after it is been allocated and freed in ucma_process_join(), since we don't refcnt it. Fix this by separating "publish" from ID allocation, so that we can get an ID first and publish it later after copy_to_user(). Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Reported-by: Noam Rathaus <noamr@beyondsecurity.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-23RDMA/ucma: Allow resolving address w/o specifying source addressRoland Dreier1-1/+1
The RDMA CM will select a source device and address by consulting the routing table if no source address is passed into rdma_resolve_address(). Userspace will ask for this by passing an all-zero source address in the RESOLVE_IP command. Unfortunately the new check for non-zero address size rejects this with EINVAL, which breaks valid userspace applications. Fix this by explicitly allowing a zero address family for the source. Fixes: 2975d5de6428 ("RDMA/ucma: Check AF family prior resolving address") Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-19RDMA/ucma: Check for a cm_id->device in all user calls that need itJason Gunthorpe1-12/+24
This is done by auditing all callers of ucma_get_ctx and switching the ones that unconditionally touch ->device to ucma_get_ctx_dev. This covers a little less than half of the call sites. The 11 remaining call sites to ucma_get_ctx() were manually audited. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-16RDMA/ucma: ucma_context reference leak in error pathShamir Rabinovitch1-3/+3
Validating input parameters should be done before getting the cm_id otherwise it can leak a cm_id reference. Fixes: 6a21dfc0d0db ("RDMA/ucma: Limit possible option size") Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-06Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-16/+24
Pull rdma updates from Jason Gunthorpe: "Doug and I are at a conference next week so if another PR is sent I expect it to only be bug fixes. Parav noted yesterday that there are some fringe case behavior changes in his work that he would like to fix, and I see that Intel has a number of rc looking patches for HFI1 they posted yesterday. Parav is again the biggest contributor by patch count with his ongoing work to enable container support in the RDMA stack, followed by Leon doing syzkaller inspired cleanups, though most of the actual fixing went to RC. There is one uncomfortable series here fixing the user ABI to actually work as intended in 32 bit mode. There are lots of notes in the commit messages, but the basic summary is we don't think there is an actual 32 bit kernel user of drivers/infiniband for several good reasons. However we are seeing people want to use a 32 bit user space with 64 bit kernel, which didn't completely work today. So in fixing it we required a 32 bit rxe user to upgrade their userspace. rxe users are still already quite rare and we think a 32 bit one is non-existing. - Fix RDMA uapi headers to actually compile in userspace and be more complete - Three shared with netdev pull requests from Mellanox: * 7 patches, mostly to net with 1 IB related one at the back). This series addresses an IRQ performance issue (patch 1), cleanups related to the fix for the IRQ performance problem (patches 2-6), and then extends the fragmented completion queue support that already exists in the net side of the driver to the ib side of the driver (patch 7). * Mostly IB, with 5 patches to net that are needed to support the remaining 10 patches to the IB subsystem. This series extends the current 'representor' framework when the mlx5 driver is in switchdev mode from being a netdev only construct to being a netdev/IB dev construct. The IB dev is limited to raw Eth queue pairs only, but by having an IB dev of this type attached to the representor for a switchdev port, it enables DPDK to work on the switchdev device. * All net related, but needed as infrastructure for the rdma driver - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers - SRP performance updates - IB uverbs write path cleanup patch series from Leon - Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to set the port for ib_srpt to listen on in configfs in order for it to be enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port) - TSO and Scatter FCS support in mlx4 - Refactor of modify_qp routine to resolve problems seen while working on new code that is forthcoming - More refactoring and updates of RDMA CM for containers support from Parav - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory' user API features - Infrastructure updates for the new IOCTL interface, based on increased usage - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit kernel as was originally intended. See the commit messages for extensive details - Syzkaller bugs and code cleanups motivated by them" * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits) IB/rxe: Fix for oops in rxe_register_device on ppc64le arch IB/mlx5: Device memory mr registration support net/mlx5: Mkey creation command adjustments IB/mlx5: Device memory support in mlx5_ib net/mlx5: Query device memory capabilities IB/uverbs: Add device memory registration ioctl support IB/uverbs: Add alloc/free dm uverbs ioctl support IB/uverbs: Add device memory capabilities reporting IB/uverbs: Expose device memory capabilities to user RDMA/qedr: Fix wmb usage in qedr IB/rxe: Removed GID add/del dummy routines RDMA/qedr: Zero stack memory before copying to user space IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR IB/mlx5: Add information for querying IPsec capabilities IB/mlx5: Add IPsec support for egress and ingress {net,IB}/mlx5: Add ipsec helper IB/mlx5: Add modify_flow_action_esp verb IB/mlx5: Add implementation for create and destroy action_xfrm IB/uverbs: Introduce ESP steering match filter IB/uverbs: Add modify ESP flow_action ...
2018-04-03RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA deviceRoland Dreier1-0/+3
Check to make sure that ctx->cm_id->device is set before we use it. Otherwise userspace can trigger a NULL dereference by doing RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device. Cc: <stable@vger.kernel.org> Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>