Age | Commit message (Collapse) | Author | Files | Lines |
|
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Use macros to make names consistent in ioctl() uAPI:
The ioctl() uAPI works with object-method hierarchy. The method part
also states which handler should be executed when this method is called
from user-space. Therefore, we need to tie method, method's id, method's
handler and the object owning this method together.
Previously, this was done through explicit developer chosen names.
This makes grepping the code harder. Changing the method's name,
method's handler and object's name to be automatically generated based
on the ids.
The headers are split in a way so they be included and used by
user-space. One header strictly contains structures that are used
directly by user-space applications, where another header is used for
internal library (i.e. libibverbs) to form the ioctl() commands.
Other header simply contains the required general command structure.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Use rdma_is_port_valid() which performs port validity check instead of
open coding the same check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set default active_width and
active_speed to IB_WIDTH_4X and IB_SPEED_QDR.
Now, the active_width and active_speed are zeros if the RoCE port
is in DOWN state. The speed string should be set to " SDR" instead of
a blank string when active_speed is zero.
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The restrack code relies on the fact that object structures are zeroed at
the allocation stage, the mlx4 CQ wasn't allocated with kzalloc and it
caused to the following crash.
[ 137.392209] general protection fault: 0000 [#1] SMP KASAN PTI
[ 137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G W 4.16.0-rc1-00099-g00313983cda6 #11
[ 137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0
[ 137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206
[ 137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX: ffffffffb82314df
[ 137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: 70696b533d454741
[ 137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09: 0000000000000000
[ 137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12: ffff8801d90c0ad0
[ 137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15: 0000000000000014
[ 137.404736] FS: 00007fb415d43740(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[ 137.406074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4: 00000000003606b0
[ 137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 137.410385] Call Trace:
[ 137.411058] ib_destroy_cq+0x23/0x60
[ 137.411460] uverbs_free_cq+0x37/0xa0
[ 137.412040] remove_commit_idr_uobject+0x38/0xf0
[ 137.413042] _rdma_remove_commit_uobject+0x5c/0x160
[ 137.413782] ? lookup_get_idr_uobject+0x39/0x50
[ 137.414737] rdma_remove_commit_uobject+0x3b/0x70
[ 137.415742] ib_uverbs_destroy_cq+0x114/0x1d0
[ 137.416260] ? ib_uverbs_req_notify_cq+0x160/0x160
[ 137.417073] ? kernel_text_address+0x5c/0x90
[ 137.417805] ? __kernel_text_address+0xe/0x30
[ 137.418766] ? unwind_get_return_address+0x2f/0x50
[ 137.419558] ib_uverbs_write+0x453/0x6a0
[ 137.420220] ? show_ibdev+0x90/0x90
[ 137.420653] ? __kasan_slab_free+0x136/0x180
[ 137.421155] ? kmem_cache_free+0x78/0x1e0
[ 137.422192] ? remove_vma+0x83/0x90
[ 137.422614] ? do_munmap+0x447/0x6c0
[ 137.423045] ? vm_munmap+0xb0/0x100
[ 137.423481] ? SyS_munmap+0x1d/0x30
[ 137.424120] ? do_syscall_64+0xeb/0x250
[ 137.424984] ? entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 137.425611] ? lru_add_drain_all+0x270/0x270
[ 137.426116] ? lru_add_drain_cpu+0xa3/0x170
[ 137.426616] ? lru_add_drain+0x11/0x20
[ 137.427058] ? free_pages_and_swap_cache+0xa6/0x120
[ 137.427672] ? tlb_flush_mmu_free+0x78/0x90
[ 137.428168] ? arch_tlb_finish_mmu+0x6d/0xb0
[ 137.428680] __vfs_write+0xc4/0x350
[ 137.430917] ? kernel_read+0xa0/0xa0
[ 137.432758] ? remove_vma+0x90/0x90
[ 137.434781] ? __kasan_slab_free+0x14b/0x180
[ 137.437486] ? remove_vma+0x83/0x90
[ 137.439836] ? kmem_cache_free+0x78/0x1e0
[ 137.442195] ? percpu_counter_add_batch+0x1d/0x90
[ 137.444389] vfs_write+0xf7/0x280
[ 137.446030] SyS_write+0xa1/0x120
[ 137.447867] ? SyS_read+0x120/0x120
[ 137.449670] ? mm_fault_error+0x180/0x180
[ 137.451539] ? _cond_resched+0x16/0x50
[ 137.453697] ? SyS_read+0x120/0x120
[ 137.455883] do_syscall_64+0xeb/0x250
[ 137.457686] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 137.459595] RIP: 0033:0x7fb415637b94
[ 137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX: 00007fb415637b94
[ 137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI: 0000000000000003
[ 137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09: 00005565022d40c0
[ 137.472479] R10: 00000000000009cf R11: 0000000000000246 R12: 00005565022d2520
[ 137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15: 00007ffdebea7fd0
[ 137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee
[ 137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968
[ 137.486436] ---[ end trace 81835a1ea6722eed ]---
[ 137.488566] Kernel panic - not syncing: Fatal exception
[ 137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Before commit [1], rdma_addr_find_l2_eth_by_grh() was an exported function
and therefore declaration in include/rdma/ib_addr.h was fine.
But now that its scope is limited to ib_core module, its better to have it
in core_priv.h.
[1] commit 1060f8653414 ("IB/{core/cm}: Fix generating a return AH for
RoCEE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Introduce and use helper function get_cm_port_from_path() to get
cm_port based on the the path record entry.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Resolving route for RoCE for a path record is needed only for the
received CM requests.
Therefore,
(a) ib_init_ah_attr_from_path() is refactored first to isolate the
code of resolving route.
(b) Setting dlid, path bits is not needed for RoCE.
Additionally ah attribute initialization is done from the path record
entry, so it is better to refer to path record entry type for
different link layer instead of ah attribute type while initializing
ah attribute itself.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Add and use helper function add_cm_id_to_port_list() to attach
cm_id to port list.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
rdma_resolve_ip_route() is used only by ib_core module. Therefore it is
removed as an exported symbol.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
rdma_protocol_roce() API from the ib_core already provides a way to
detect whether a given device+port is RoCE or not.
Therefore, make use of it and avoid implementing it again in rdmacm
module.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
ah_attr contains the port number to which cm_id is bound. However, while
searching for GID table for matching GID entry, the port number is
ignored.
This could cause the wrong GID to be used when the ah_attr is converted to
an AH.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The return status of ib_init_ah_from_mcmember() is ignored by
cma_ib_mc_handler(). Honor it and return error event if ah attribute
initialization failed.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table
entries are not based on netdevice. Netdevice parameter is unused here.
Therefore, it is removed.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Exported symbol's comments should be with function definition and not in
the header file. Therefore comments of ib_find_cached_gid() and
ib_find_cached_gid_by_port() functions are moved closer to their
definitions.
The function name in then comment is different than the actual function
name, fix it to be same as ib_cache_gid_find_by_filter().
Also current comment section of ib_find_cached_gid_by_port() contains the
desciption of ib_find_cached_gid(), fix that as well.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e61 (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
gcc-4.4.4 has issues with initialization of anonymous unions.
drivers/infiniband/core/verbs.c: In function '__ib_drain_sq':
drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast
Work around this.
Fixes: a1ae7d0345edd5 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access")
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Users can provide garbage while calling to ucma_join_ip_multicast(),
it will indirectly cause to rdma_addr_size() return 0, making the
call to ucma_process_join(), which had the right checks, but it is
better to check the input as early as possible.
The following crash from syzkaller revealed it.
kernel BUG at lib/string.c:1052!
invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051
RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286
RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000
RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12
RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998
R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00
FS: 0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
memcpy include/linux/string.h:344 [inline]
ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421
ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633
__vfs_write+0xef/0x970 fs/read_write.c:480
vfs_write+0x189/0x510 fs/read_write.c:544
SYSC_write fs/read_write.c:589 [inline]
SyS_write+0xef/0x220 fs/read_write.c:581
do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline]
do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7f9ec99
RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100
RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de
55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90
90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56
RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0
Fixes: 5bc2b7b397b0 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast")
Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The attempt to join multicast group without ensuring that CMA device
exists will lead to the following crash reported by syzkaller.
[ 64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0
[ 64.076797] Read of size 8 at addr 00000000000000b0 by task join/691
[ 64.076797]
[ 64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23
[ 64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[ 64.076803] Call Trace:
[ 64.076809] dump_stack+0x5c/0x77
[ 64.076817] kasan_report+0x163/0x380
[ 64.085859] ? rdma_join_multicast+0x26e/0x12c0
[ 64.086634] rdma_join_multicast+0x26e/0x12c0
[ 64.087370] ? rdma_disconnect+0xf0/0xf0
[ 64.088579] ? __radix_tree_replace+0xc3/0x110
[ 64.089132] ? node_tag_clear+0x81/0xb0
[ 64.089606] ? idr_alloc_u32+0x12e/0x1a0
[ 64.090517] ? __fprop_inc_percpu_max+0x150/0x150
[ 64.091768] ? tracing_record_taskinfo+0x10/0xc0
[ 64.092340] ? idr_alloc+0x76/0xc0
[ 64.092951] ? idr_alloc_u32+0x1a0/0x1a0
[ 64.093632] ? ucma_process_join+0x23d/0x460
[ 64.094510] ucma_process_join+0x23d/0x460
[ 64.095199] ? ucma_migrate_id+0x440/0x440
[ 64.095696] ? futex_wake+0x10b/0x2a0
[ 64.096159] ucma_join_multicast+0x88/0xe0
[ 64.096660] ? ucma_process_join+0x460/0x460
[ 64.097540] ? _copy_from_user+0x5e/0x90
[ 64.098017] ucma_write+0x174/0x1f0
[ 64.098640] ? ucma_resolve_route+0xf0/0xf0
[ 64.099343] ? rb_erase_cached+0x6c7/0x7f0
[ 64.099839] __vfs_write+0xc4/0x350
[ 64.100622] ? perf_syscall_enter+0xe4/0x5f0
[ 64.101335] ? kernel_read+0xa0/0xa0
[ 64.103525] ? perf_sched_cb_inc+0xc0/0xc0
[ 64.105510] ? syscall_exit_register+0x2a0/0x2a0
[ 64.107359] ? __switch_to+0x351/0x640
[ 64.109285] ? fsnotify+0x899/0x8f0
[ 64.111610] ? fsnotify_unmount_inodes+0x170/0x170
[ 64.113876] ? __fsnotify_update_child_dentry_flags+0x30/0x30
[ 64.115813] ? ring_buffer_record_is_on+0xd/0x20
[ 64.117824] ? __fget+0xa8/0xf0
[ 64.119869] vfs_write+0xf7/0x280
[ 64.122001] SyS_write+0xa1/0x120
[ 64.124213] ? SyS_read+0x120/0x120
[ 64.126644] ? SyS_read+0x120/0x120
[ 64.128563] do_syscall_64+0xeb/0x250
[ 64.130732] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 64.132984] RIP: 0033:0x7f5c994ade99
[ 64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[ 64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[ 64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[ 64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[ 64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[ 64.151060]
[ 64.153703] Disabling lock debugging due to kernel taint
[ 64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[ 64.159066] IP: rdma_join_multicast+0x26e/0x12c0
[ 64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0
[ 64.164442] Oops: 0000 [#1] SMP KASAN PTI
[ 64.166817] CPU: 1 PID: 691 Comm: join Tainted: G B 4.16.0-rc1-00219-gb97853b65b93 #23
[ 64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4
[ 64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0
[ 64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282
[ 64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522
[ 64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297
[ 64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7
[ 64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000
[ 64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400
[ 64.196105] FS: 00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000
[ 64.199211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0
[ 64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 64.211554] Call Trace:
[ 64.213464] ? rdma_disconnect+0xf0/0xf0
[ 64.216124] ? __radix_tree_replace+0xc3/0x110
[ 64.219337] ? node_tag_clear+0x81/0xb0
[ 64.222140] ? idr_alloc_u32+0x12e/0x1a0
[ 64.224422] ? __fprop_inc_percpu_max+0x150/0x150
[ 64.226588] ? tracing_record_taskinfo+0x10/0xc0
[ 64.229763] ? idr_alloc+0x76/0xc0
[ 64.232186] ? idr_alloc_u32+0x1a0/0x1a0
[ 64.234505] ? ucma_process_join+0x23d/0x460
[ 64.237024] ucma_process_join+0x23d/0x460
[ 64.240076] ? ucma_migrate_id+0x440/0x440
[ 64.243284] ? futex_wake+0x10b/0x2a0
[ 64.245302] ucma_join_multicast+0x88/0xe0
[ 64.247783] ? ucma_process_join+0x460/0x460
[ 64.250841] ? _copy_from_user+0x5e/0x90
[ 64.253878] ucma_write+0x174/0x1f0
[ 64.257008] ? ucma_resolve_route+0xf0/0xf0
[ 64.259877] ? rb_erase_cached+0x6c7/0x7f0
[ 64.262746] __vfs_write+0xc4/0x350
[ 64.265537] ? perf_syscall_enter+0xe4/0x5f0
[ 64.267792] ? kernel_read+0xa0/0xa0
[ 64.270358] ? perf_sched_cb_inc+0xc0/0xc0
[ 64.272575] ? syscall_exit_register+0x2a0/0x2a0
[ 64.275367] ? __switch_to+0x351/0x640
[ 64.277700] ? fsnotify+0x899/0x8f0
[ 64.280530] ? fsnotify_unmount_inodes+0x170/0x170
[ 64.283156] ? __fsnotify_update_child_dentry_flags+0x30/0x30
[ 64.286182] ? ring_buffer_record_is_on+0xd/0x20
[ 64.288749] ? __fget+0xa8/0xf0
[ 64.291136] vfs_write+0xf7/0x280
[ 64.292972] SyS_write+0xa1/0x120
[ 64.294965] ? SyS_read+0x120/0x120
[ 64.297474] ? SyS_read+0x120/0x120
[ 64.299751] do_syscall_64+0xeb/0x250
[ 64.301826] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 64.304352] RIP: 0033:0x7f5c994ade99
[ 64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99
[ 64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015
[ 64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000
[ 64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0
[ 64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0
[ 64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8
[ 64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860
[ 64.332979] CR2: 00000000000000b0
[ 64.335550] ---[ end trace 0c00c17a408849c1 ]---
Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com>
Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
cma_port_is_unique() allows local port reuse if the quad (source
address and port, destination address and port) for this connection
is unique. However, if the destination info is zero or unspecified, it
can't make a correct decision but still allows port reuse. For example,
sometimes rdma_bind_addr() is called with unspecified destination and
reusing the port can lead to creating a connection with a duplicate quad,
after the destination is resolved. The issue manifests when MPI scale-up
tests hang after the duplicate quad is used.
Set the destination address family and add checks for zero destination
address and port to prevent source port reuse based on invalid destination.
Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id")
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state
makes the checks of out-of-scope redundant. Let's remove them
together with updating function signature to return boolean result.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The QP state is internal enum which is checked at the driver
level by calling to ib_modify_qp_is_ok(). Move this check closer
to user and leave kernel users to be checked by compiler.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Implement the RDMA nldev netlink interface for dumping detailed PD
information.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Implement the RDMA nldev netlink interface for dumping detailed
MR information.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Implement the RDMA nldev netlink interface for dumping detailed
CQ information.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Implement RDMA nldev netlink interface to get detailed CM_ID information.
Because cm_id's are attached to rdma devices in various work queue
contexts, the pid and task information at restrak_add() time is sometimes
not useful. For example, an nvme/f host connection cm_id ends up being
bound to a device in a work queue context and the resulting pid at attach
time no longer exists after connection setup. So instead we mark all
cm_id's created via the rdma_ucm as "user", and all others as "kernel".
This required tweaking the restrack code a little. It also required
wrapping some rdma_cm functions to allow passing the module name string.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Move struct rdma_id_private to a new header cma_priv.h so the resource
tracking services in core/nldev.c can read useful information about cm_ids.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Create a common dumpit function that can be used by all common resource
types. This reduces code replication and simplifies the code as we add
more resource types.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Simplify res_to_dev() to make it easier to read/maintain.
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The QP state is limited and declared in enum ib_qp_state,
but ucma user was able to supply any possible (u32) value.
Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Users of ucma are supposed to provide size of option level,
in most paths it is supposed to be equal to u8 or u16, but
it is not the case for the IB path record, where it can be
multiple of struct ib_path_rec_data.
This patch takes simplest possible approach and prevents providing
values more than possible to allocate.
Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409adcd ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
resolved_dev returned might be NULL as ifindex is transient number.
Ignoring NULL check of resolved_dev might crash the kernel.
Therefore perform NULL check before accessing resolved_dev.
Additionally rdma_resolve_ip_route() invokes addr_resolve() which
performs check and address translation for loopback ifindex.
Therefore, checking it again in rdma_resolve_ip_route() is not helpful.
Therefore, the code is simplified to avoid IFF_LOOPBACK check.
Fixes: 200298326b27 ("IB/core: Validate route when we init ah")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fix warning limit for kernel stack consumption:
drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]
Using smaller ib_wc array on the stack brings us comfortably below that
limit again.
Fixes: 246d8b184c10 ("IB/cq: Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sergey Gorenko <sergeygo@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This patch fixes the following KASAN complaint:
==================================================================
BUG: KASAN: stack-out-of-bounds in rxe_post_send+0x77d/0x9b0 [rdma_rxe]
Read of size 8 at addr ffff880061aef860 by task 01/1080
CPU: 2 PID: 1080 Comm: 01 Not tainted 4.16.0-rc3-dbg+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc7
print_address_description+0x65/0x270
kasan_report+0x231/0x350
rxe_post_send+0x77d/0x9b0 [rdma_rxe]
__ib_drain_sq+0x1ad/0x250 [ib_core]
ib_drain_qp+0x9/0x30 [ib_core]
srp_destroy_qp+0x51/0x70 [ib_srp]
srp_free_ch_ib+0xfc/0x380 [ib_srp]
srp_create_target+0x1071/0x19e0 [ib_srp]
kernfs_fop_write+0x180/0x210
__vfs_write+0xb1/0x2e0
vfs_write+0xf6/0x250
SyS_write+0x99/0x110
do_syscall_64+0xee/0x2b0
entry_SYSCALL_64_after_hwframe+0x42/0xb7
The buggy address belongs to the page:
page:ffffea000186bbc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea000186bbe0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880061aef700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880061aef780: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
>ffff880061aef800: f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 f2 f2 f2 f2
^
ffff880061aef880: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 f2 f2
ffff880061aef900: f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Fixes: 765d67748bcf ("IB: new common API for draining queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Maintaining the uobjects list is mandatory, hoist it into the common
rdma_alloc_commit_uobject() function and inline it as there is now
only one caller.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
dev_get_by_index is being called in addr_resolve
function which returns NULL and NULL pointer access
leads to kernel crash.
Following call trace is observed while running
rdma_lat test application
[ 146.173149] BUG: unable to handle kernel NULL pointer dereference
at 00000000000004a0
[ 146.173198] IP: addr_resolve+0x9e/0x3e0 [ib_core]
[ 146.173221] PGD 0 P4D 0
[ 146.173869] Oops: 0000 [#1] SMP PTI
[ 146.182859] CPU: 8 PID: 127 Comm: kworker/8:1 Tainted: G O 4.15.0-rc6+ #18
[ 146.183758] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01KN179,
BIOS-[TCE132H-2.50]- 10/11/2017
[ 146.184691] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 146.185632] RIP: 0010:addr_resolve+0x9e/0x3e0 [ib_core]
[ 146.186584] RSP: 0018:ffffc9000362faa0 EFLAGS: 00010246
[ 146.187521] RAX: 000000000000001b RBX: ffffc9000362fc08 RCX:
0000000000000006
[ 146.188472] RDX: 0000000000000000 RSI: 0000000000000096 RDI
: ffff88087fc16990
[ 146.189427] RBP: ffffc9000362fb18 R08: 00000000ffffff9d R09:
00000000000004ac
[ 146.190392] R10: 00000000000001e7 R11: 0000000000000001 R12:
ffff88086af2e090
[ 146.191361] R13: 0000000000000000 R14: 0000000000000001 R15:
00000000ffffff9d
[ 146.192327] FS: 0000000000000000(0000) GS:ffff88087fc00000(0000)
knlGS:0000000000000000
[ 146.193301] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 146.194274] CR2: 00000000000004a0 CR3: 000000000220a002 CR4:
00000000003606e0
[ 146.195258] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 146.196256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 146.197231] Call Trace:
[ 146.198209] ? rdma_addr_register_client+0x30/0x30 [ib_core]
[ 146.199199] rdma_resolve_ip+0x1af/0x280 [ib_core]
[ 146.200196] rdma_addr_find_l2_eth_by_grh+0x154/0x2b0 [ib_core]
The below patch adds the missing NULL pointer check
returned by dev_get_by_index before accessing the netdev to
avoid kernel crash.
We observed the below crash when we try to do the below test.
server client
--------- ---------
|1.1.1.1|<----rxe-channel--->|1.1.1.2|
--------- ---------
On server: rdma_lat -c -n 2 -s 1024
On client:rdma_lat 1.1.1.1 -c -n 2 -s 1024
Fixes: 200298326b27 ("IB/core: Validate route when we init ah")
Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
During IB device registration process, if query_device() fails or if
ib_core fails to registers sysfs entries, rdma cgroup cleanup is
skipped.
Cc: <stable@vger.kernel.org> # v4.2+
Fixes: 4be3a4fa51f4 ("IB/core: Fix kernel crash during fail to initialize device")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The proper return error is -EOPNOTSUPP and not -ENOSYS, so update
all places in verbs.c to match this semantics.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Simplify the code by directly checking the availability of extended
command flog instead of doing multiple shift operations.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The internal to kernel variable declarations don't need to be
declared with user types. This patch converts such occurrences
appeared in ib_uverbs_write().
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Move all header validation logic to be performed before SRCU read lock.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The SRCU read lock protects the IB device pointer
and doesn't need to be called before copying user
provided header.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
There is no need to take SRCU lock before checking
file->ucontext, so move it do it before it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The check based on index is not sufficient because
IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ
and IB_USER_VERBS_CMD_CREATE_CQ <= IB_USER_VERBS_CMD_OPEN_QP,
so if we execute IB_USER_VERBS_EX_CMD_CREATE_CQ this code checks
ib_dev->uverbs_cmd_mask not ib_dev->uverbs_ex_cmd_mask.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Move all command header processing into separate function
and perform those checks before acquiring SRCU read lock.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The non-existing command is supposed to return -EOPNOTSUPP, but the
current code returns different errors for different flows for the
same failure. This patch unifies those flows.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Command that doesn't exist means that it is not supported,
so update code to return -EOPNOTSUPP in case of failure.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fail as early as possible if not enough header data
was provided.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Since commit f21519b23c1b ("IB/core: extended command: an
improved infrastructure for uverbs commands"), the uverbs
supports extra flags as an input to the command interface.
However actually, there is only one flag available and used,
so it is better to refactor the code, so the resolution and
report to the users is done as early as possible.
As part of this change, we changed the return value of failure case
from ENOSYS to be EINVAL to be consistent with the rest flags checks.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|