aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2018-03-19IB/uverbs: Extend uverbs_ioctl header with driver_idMatan Barak21-6/+48
Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/uverbs: Move to new headers and make naming consistentMatan Barak10-247/+396
Use macros to make names consistent in ioctl() uAPI: The ioctl() uAPI works with object-method hierarchy. The method part also states which handler should be executed when this method is called from user-space. Therefore, we need to tie method, method's id, method's handler and the object owning this method together. Previously, this was done through explicit developer chosen names. This makes grepping the code harder. Changing the method's name, method's handler and object's name to be automatically generated based on the ids. The headers are split in a way so they be included and used by user-space. One header strictly contains structures that are used directly by user-space applications, where another header is used for internal library (i.e. libibverbs) to form the ioctl() commands. Other header simply contains the required general command structure. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/srp: Disallow duplicate RDMA/CM connectionsBart Van Assche1-3/+0
According to the SRP standard the INITIATOR and TARGET PORT IDENTIFIER fields from the login request specify the I_T nexus. Whether or not an SRP target closes an existing connection for an I_T nexus when a login request is received depends on the value of the MULTICHANNEL field in the login request. The SRP initiator derives the value of the INITIATOR and TARGET PORT IDENTIFIER fields from the .id_ext, .ioc_guid, .initiator_ext .sgid members of the srp_target_port structure. This means that the .rdma_cm.dst check must be removed from srp_conn_unique(). This patch avoids that for target ports that have multiple addresses, e.g. an IPv4 and an IPv6 address, and if a connection is established to both target port addresses, that the initiator logs in alternatingly every 10 seconds to the other target port address. An SRP target must namely terminate all but one connections for a given I_T nexus if the MULTICHANNEL field has not been set in the login request. Fixes: 19f313438c77 ("IB/srp: Add RDMA/CM support") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/mlx5: Packet packing enhancement for RAW QPBodong Wang4-21/+98
Enable RAW QP to be able to configure burst control by modify_qp. By using burst control with rate limiting, user can achieve best performance and accuracy. The burst control information is passed by user through udata. This patch also reports burst control capability for mlx5 related hardwares, burst control is only marked as supported when both packet_pacing_burst_bound and packet_pacing_typical_size are supported. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19net/mlx5: Packet pacing enhancementBodong Wang4-33/+76
Add two new parameters: max_burst_sz and typical_pkt_size (both in bytes) to rate limit configurations. max_burst_sz: The device will schedule bursts of packets for an SQ connected to this rate, smaller than or equal to this value. Value 0x0 indicates packet bursts will be limited to the device defaults. This field should be used if bursts of packets must be strictly kept under a certain value. typical_pkt_size: When the rate limit is intended for a stream of similar packets, stating the typical packet size can improve the accuracy of the rate limiter. The expected packet size will be the same for all SQs associated with the same rate limit index. Ethernet driver is updated according to this change, but these two parameters will be kept as 0 due to lacking of proper way to get the configurations from user space which requires to change ndo_set_tx_maxrate interface. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19RDMA/hns: Fix init resp when alloc ucontextYixian Liu1-1/+1
The data in resp will be copied from kernel to userspace, thus it needs to be initialized to zeros to avoid copying uninited stack memory. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: e088a685eae9 ("RDMA/hns: Support rq record doorbell for the user space") Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/core: Remove unimplemented ib_peek_cqParav Pandit1-12/+0
ib_peek_cq() verb doesn't seem be implemented in current code. There is some past reference to it at [1] about it being unimplemented. Lot of user documentation created out of kdoc refers to this unimplemented API. Therefore, remove unimplemented API. [1] http://lists.openfabrics.org/pipermail/ofw/2008-May/002465.html Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/core: Use rdma_is_port_valid()Parav Pandit1-3/+2
Use rdma_is_port_valid() which performs port validity check instead of open coding the same check. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19RDMA/bnxt: Fix structure layout for bnxt_re_pd_respJason Gunthorpe1-1/+6
What is going on here is a bit subtle, in the kernel there is no problem because the struct is copied using copy_from_user, so it can safely have an 8 byte alignment, however in userspace it must be constructed by concatenation with the ib_uverbs_alloc_pd_resp struct. This is due to the required memory layout to execute the command. Since ibv_uverbs_alloc_pd_resp is only 4 bytes long, this causes misalignment, and the user space will experience an unexpected padding. Currently it works around this via pointer maths. Make everything more robust by having the compiler reduce the alignment of the struct to 4. The userspace has assertions to ensure this works properly in all situations. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/mlx5: Set the default active rate and width to QDR and 4XHonggang Li1-0/+3
Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE"), the mlx5_ib driver set the default active_width and active_speed to IB_WIDTH_4X and IB_SPEED_QDR. When the RoCE port is down, the RoCE port does not negotiate the active width with the remote side, causing the active width to be zero. When running userspace ibstat to view the port status, ibstat will panic as it reads an invalid width from sys file. This patch restores the original behavior. Fixes: f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE"). Signed-off-by: Honggang Li <honli@redhat.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/core: Set speed string to SDR for invalid active ratesHonggang Li1-0/+1
Before commit f1b65df5a232 ("IB/mlx5: Add support for active_width and active_speed in RoCE"), the mlx5_ib driver set default active_width and active_speed to IB_WIDTH_4X and IB_SPEED_QDR. Now, the active_width and active_speed are zeros if the RoCE port is in DOWN state. The speed string should be set to " SDR" instead of a blank string when active_speed is zero. Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-16RDMA/restrack: Don't rely on uninitialized variable in restrack_add flowLeon Romanovsky1-1/+3
The restrack code relies on the fact that object structures are zeroed at the allocation stage, the mlx4 CQ wasn't allocated with kzalloc and it caused to the following crash. [ 137.392209] general protection fault: 0000 [#1] SMP KASAN PTI [ 137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G W 4.16.0-rc1-00099-g00313983cda6 #11 [ 137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0 [ 137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206 [ 137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX: ffffffffb82314df [ 137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: 70696b533d454741 [ 137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09: 0000000000000000 [ 137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12: ffff8801d90c0ad0 [ 137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15: 0000000000000014 [ 137.404736] FS: 00007fb415d43740(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000 [ 137.406074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4: 00000000003606b0 [ 137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 137.410385] Call Trace: [ 137.411058] ib_destroy_cq+0x23/0x60 [ 137.411460] uverbs_free_cq+0x37/0xa0 [ 137.412040] remove_commit_idr_uobject+0x38/0xf0 [ 137.413042] _rdma_remove_commit_uobject+0x5c/0x160 [ 137.413782] ? lookup_get_idr_uobject+0x39/0x50 [ 137.414737] rdma_remove_commit_uobject+0x3b/0x70 [ 137.415742] ib_uverbs_destroy_cq+0x114/0x1d0 [ 137.416260] ? ib_uverbs_req_notify_cq+0x160/0x160 [ 137.417073] ? kernel_text_address+0x5c/0x90 [ 137.417805] ? __kernel_text_address+0xe/0x30 [ 137.418766] ? unwind_get_return_address+0x2f/0x50 [ 137.419558] ib_uverbs_write+0x453/0x6a0 [ 137.420220] ? show_ibdev+0x90/0x90 [ 137.420653] ? __kasan_slab_free+0x136/0x180 [ 137.421155] ? kmem_cache_free+0x78/0x1e0 [ 137.422192] ? remove_vma+0x83/0x90 [ 137.422614] ? do_munmap+0x447/0x6c0 [ 137.423045] ? vm_munmap+0xb0/0x100 [ 137.423481] ? SyS_munmap+0x1d/0x30 [ 137.424120] ? do_syscall_64+0xeb/0x250 [ 137.424984] ? entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 137.425611] ? lru_add_drain_all+0x270/0x270 [ 137.426116] ? lru_add_drain_cpu+0xa3/0x170 [ 137.426616] ? lru_add_drain+0x11/0x20 [ 137.427058] ? free_pages_and_swap_cache+0xa6/0x120 [ 137.427672] ? tlb_flush_mmu_free+0x78/0x90 [ 137.428168] ? arch_tlb_finish_mmu+0x6d/0xb0 [ 137.428680] __vfs_write+0xc4/0x350 [ 137.430917] ? kernel_read+0xa0/0xa0 [ 137.432758] ? remove_vma+0x90/0x90 [ 137.434781] ? __kasan_slab_free+0x14b/0x180 [ 137.437486] ? remove_vma+0x83/0x90 [ 137.439836] ? kmem_cache_free+0x78/0x1e0 [ 137.442195] ? percpu_counter_add_batch+0x1d/0x90 [ 137.444389] vfs_write+0xf7/0x280 [ 137.446030] SyS_write+0xa1/0x120 [ 137.447867] ? SyS_read+0x120/0x120 [ 137.449670] ? mm_fault_error+0x180/0x180 [ 137.451539] ? _cond_resched+0x16/0x50 [ 137.453697] ? SyS_read+0x120/0x120 [ 137.455883] do_syscall_64+0xeb/0x250 [ 137.457686] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 137.459595] RIP: 0033:0x7fb415637b94 [ 137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX: 00007fb415637b94 [ 137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI: 0000000000000003 [ 137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09: 00005565022d40c0 [ 137.472479] R10: 00000000000009cf R11: 0000000000000246 R12: 00005565022d2520 [ 137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15: 00007ffdebea7fd0 [ 137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee [ 137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968 [ 137.486436] ---[ end trace 81835a1ea6722eed ]--- [ 137.488566] Kernel panic - not syncing: Fatal exception [ 137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/mlx4: Add Scatter FCS support over WQ creationGuy Levi3-9/+32
As a default, for Ethernet packets, the device scatters only the payload of ingress packets. The scatter FCS feature lets the user to get the FCS (Ethernet's frame check sequence) in the received WR's buffer as a 4 Bytes trailer following the packet's payload. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/mlx4: Report TSO capabilitiesYishai Hadas2-2/+29
Report to the user area the TSO device capabilities, it includes the max_tso size and the QP types that support it. The TSO is applicable only when when of the ports is ETH and the device supports it. uresp logic around rss_caps is updated to fix a till-now harmless bug computing the length of the structure to copy. The code did not handle the implicit padding before rss_caps correctly. This is necessay to copy tss_caps successfully. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15i40iw: Tear-down connection after CQP Modify QP failureHenry Orosco4-10/+30
There is no explicit tear-down sequence initiated on connections if the Control QP OP, Modify QP to close, fails. Fix this by triggering a driver generated Asynchronous Event (AE) on Modify QP failures and tear-down the connection on receipt of the AE. This fix can be generalized to other Modify QP failures (i.e. RTS->TERM, IDLE->RTS, etc) as any modify failure will require a connection tear-down. Fixes: d37498417947 ("i40iw: add files for iwarp interface") Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15i40iw: Refactor of driver generated AEsHenry Orosco6-10/+108
The flush CQP OP can be used to optionally generate Asynchronous Events (AEs) in addition to QP flush. Consolidate all HW AE generation code under a new function i40iw_gen_ae which use the flush CQP OP to only generate AEs. Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/cxgb4: Use structs to describe the uABI instead of opencodingJason Gunthorpe2-1/+8
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/hns: Use structs to describe the uABI instead of opencodingJason Gunthorpe2-1/+9
Open coding a loose value is not acceptable for describing the uABI in RDMA. Provide the missing struct. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/i40iw: Move uapi header to include/uapiJason Gunthorpe3-3/+4
All of these defines are part of the uABI for the driver, this header duplicates providers/i40iw/i40iw-abi.h in rdma-core. Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/mlx4: Move flag constants to uapi headerJason Gunthorpe6-9/+11
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps) and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to userspace and form part of the uAPI. Move them to the uapi header where they belong. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/rxe: Use structs to describe the uABI instead of opencodingJason Gunthorpe8-79/+116
Open coding pointer math is not acceptable for describing the uABI in RDMA. Provide structs for all the cases. The udata is casted to the struct as close to the verbs entry point as possible for maximum clarity. Function signatures and so forth are revised to allow for this. Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/rxe: Get rid of confusing udata parameter to rxe_cq_chk_attrJason Gunthorpe3-4/+4
It isn't used and it couldn't possibly ever be used correctly. Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMAVT: Fix synchronization around percpu_refTejun Heo1-4/+6
rvt_mregion uses percpu_ref for reference counting and RCU to protect accesses from lkey_table. When a rvt_mregion needs to be freed, it first gets unregistered from lkey_table and then rvt_check_refs() is called to wait for in-flight usages before the rvt_mregion is freed. rvt_check_refs() seems to have a couple issues. * It has a fast exit path which tests percpu_ref_is_zero(). However, a percpu_ref reading zero doesn't mean that the object can be released. In fact, the ->release() callback might not even have started executing yet. Proceeding with freeing can lead to use-after-free. * lkey_table is RCU protected but there is no RCU grace period in the free path. percpu_ref uses RCU internally but it's sched-RCU whose grace periods are different from regular RCU. Also, it generally isn't a good idea to depend on internal behaviors like this. To address the above issues, this patch removes the fast exit and adds an explicit synchronize_rcu(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: linux-rdma@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/qedr: eliminate duplicate barriers on weakly-ordered archsSinan Kaya1-2/+2
Code includes wmb() followed by writel() in multiple places. writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/hns: Fix cqn type and init respYixian Liu3-14/+12
This patch changes the type of cqn from u32 to u64 to keep userspace and kernel consistent, initializes resp both for cq and qp to zeros, and also changes the condition judgment of outlen considering future caps extension. Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Fixes: e088a685eae9 (hns: Support rq record doorbell for the user space) Fixes: 9b44703d0a21 (hns: Support cq record doorbell for the user space) Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: Shaobo Xu <xushaobo2@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/core: Move rdma_addr_find_l2_eth_by_grh to core_priv.hParav Pandit2-5/+5
Before commit [1], rdma_addr_find_l2_eth_by_grh() was an exported function and therefore declaration in include/rdma/ib_addr.h was fine. But now that its scope is limited to ib_core module, its better to have it in core_priv.h. [1] commit 1060f8653414 ("IB/{core/cm}: Fix generating a return AH for RoCEE") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/cm: Introduce and use helper function to get cm_port from pathParav Pandit1-4/+13
Introduce and use helper function get_cm_port_from_path() to get cm_port based on the the path record entry. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/core: Refactor ib_init_ah_attr_from_path() for RoCEParav Pandit2-96/+108
Resolving route for RoCE for a path record is needed only for the received CM requests. Therefore, (a) ib_init_ah_attr_from_path() is refactored first to isolate the code of resolving route. (b) Setting dlid, path bits is not needed for RoCE. Additionally ah attribute initialization is done from the path record entry, so it is better to refer to path record entry type for different link layer instead of ah attribute type while initializing ah attribute itself. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/cm: Add and use a helper function to add cm_id's to the port listParav Pandit1-10/+21
Add and use helper function add_cm_id_to_port_list() to attach cm_id to port list. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/ocrdma: Removed GID add/del null routinesParav Pandit3-28/+0
add_gid() and del_gid() are optional callback routines. ib_core ignores invoking them while updating GID table entries if they are not implemented by provider drivers. Therefore remove them. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/core: Remove rdma_resolve_ip_route() as exported symbolParav Pandit3-5/+6
rdma_resolve_ip_route() is used only by ib_core module. Therefore it is removed as an exported symbol. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/cma: Use rdma_protocol_roce() and remove cma_protocol_roce_dev_port()Parav Pandit1-12/+3
rdma_protocol_roce() API from the ib_core already provides a way to detect whether a given device+port is RoCE or not. Therefore, make use of it and avoid implementing it again in rdmacm module. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/core: Honor port_num while resolving GID for IB link layerParav Pandit1-14/+12
ah_attr contains the port number to which cm_id is bound. However, while searching for GID table for matching GID entry, the port number is ignored. This could cause the wrong GID to be used when the ah_attr is converted to an AH. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/core: Honor return status of ib_init_ah_from_mcmember()Parav Pandit1-4/+8
The return status of ib_init_ah_from_mcmember() is ignored by cma_ib_mc_handler(). Honor it and return error event if ah attribute initialization failed. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/{core, ipoib}: Simplify ib_find_gid() for unused ndevParav Pandit3-4/+3
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table entries are not based on netdevice. Netdevice parameter is unused here. Therefore, it is removed. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/core: Fix comments of GID query functionsParav Pandit2-31/+29
Exported symbol's comments should be with function definition and not in the header file. Therefore comments of ib_find_cached_gid() and ib_find_cached_gid_by_port() functions are moved closer to their definitions. The function name in then comment is different than the actual function name, fix it to be same as ib_cache_gid_find_by_filter(). Also current comment section of ib_find_cached_gid_by_port() contains the desciption of ib_find_cached_gid(), fix that as well. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/mlx5: Simplify clean and destroy MR callsLeon Romanovsky1-30/+10
The failure to destroy the MRs is printed on mlx5_core layer as error and it makes warning prints useless. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15RDMA/mlx5: Guard ODP specific assignments with specific CONFIGLeon Romanovsky1-0/+4
"live" is needed for ODP only and is better to be guarded by appropriate CONFIG. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15RDMA/mlx5: Unify error flows in rereg MR failure pathsLeon Romanovsky1-14/+18
According to the IBTA spec 1.3, the driver failure in MR reregister shall release old and new MRs. C11-20: If the CI returns any other error, the CI shall invalidate both "old" and "new" registrations, and release any associated resources. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15RDMA/mlx5: Return proper value for not-supported commandLeon Romanovsky1-1/+1
Return -EOPNOTSUPP value to the user for unsupported reg_user_mr. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-15RDMA/mlx5: Protect from NULL pointer derefenceLeon Romanovsky1-0/+2
The mlx5_ib_alloc_implicit_mr() can fail to acquire pages and the returned mr pointer won't be valid. Ensure that it is not error prior to access. Cc: <stable@vger.kernel.org> # 4.10 Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-nextDoug Ledford28-199/+272
Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masksArnd Bergmann2-3/+3
On 32-bit targets, we otherwise get a warning about an impossible constant integer expression: In file included from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39: drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device': include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow] #define BIT(nr) (1UL << (nr)) ^~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT' #define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39) ^~~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH' #define BNXT_RE_MAX_MR_SIZE BNXT_RE_MAX_MR_SIZE_HIGH ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE' ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE; ^~~~~~~~~~~~~~~~~~~ Fixes: 872f3578241d ("RDMA/bnxt_re: Add support for MRs with Huge pages") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-14infiniband: qplib_fp: fix pointer castArnd Bergmann1-2/+2
Building for a 32-bit target results in a couple of warnings from casting between a 32-bit pointer and a 64-bit integer: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq': drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle, ^ drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] (struct bnxt_qplib_srq *)q_handle, ^ In file included from include/linux/byteorder/little_endian.h:5, from arch/arm/include/uapi/asm/byteorder.h:22, from include/asm-generic/bitops/le.h:6, from arch/arm/include/asm/bitops.h:342, from include/linux/bitops.h:38, from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq': include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] #define __cpu_to_le64(x) ((__force __le64)(__u64)(x)) ^ include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64' #define cpu_to_le64 __cpu_to_le64 ^~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64' req.srq_handle = cpu_to_le64(srq); Using a uintptr_t as an intermediate works on all architectures. Fixes: 37cb11acf1f7 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-14drivers/infiniband/ulp/srpt/ib_srpt.c: fix build with gcc-4.4.4Andrew Morton1-1/+2
gcc-4.4.4 has issues with initialization of anonymous unions: drivers/infiniband/ulp/srpt/ib_srpt.c: In function 'srpt_zerolength_write': drivers/infiniband/ulp/srpt/ib_srpt.c:854: error: unknown field 'wr_cqe' specified in initializer drivers/infiniband/ulp/srpt/ib_srpt.c:854: warning: initialization makes integer from pointer without a cast Work aound this. Fixes: 2a78cb4db487 ("IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()") Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14drivers/infiniband/core/verbs.c: fix build with gcc-4.4.4Andrew Morton1-1/+2
gcc-4.4.4 has issues with initialization of anonymous unions. drivers/infiniband/core/verbs.c: In function '__ib_drain_sq': drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast Work around this. Fixes: a1ae7d0345edd5 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access") Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14IB/mlx5: Fix cleanup order on unloadMark Bloch2-10/+14
On load we create private CQ/QP/PD in order to be used by UMR, we create those resources after we register ourself as an IB device, and we destroy them after we unregister as an IB device. This was changed by commit 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") which moved the destruction before we unregistration. This allowed to trigger an invalid memory access when unloading mlx5_ib while there are open resources: BUG: unable to handle kernel paging request at 00000001002c012c ... Call Trace: mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib] __slab_free+0x9a/0x2d0 delay_time_func+0x10/0x10 [mlx5_ib] unreg_umr.isra.15+0x4b/0x50 [mlx5_ib] mlx5_mr_cache_free+0x46/0x150 [mlx5_ib] clean_mr+0xc9/0x190 [mlx5_ib] dereg_mr+0xba/0xf0 [mlx5_ib] ib_dereg_mr+0x13/0x20 [ib_core] remove_commit_idr_uobject+0x16/0x70 [ib_uverbs] uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs] ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs] ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs] ib_unregister_device+0xd4/0x190 [ib_core] __mlx5_ib_remove+0x2e/0x40 [mlx5_ib] mlx5_remove_device+0xf5/0x120 [mlx5_core] mlx5_unregister_interface+0x37/0x90 [mlx5_core] mlx5_ib_cleanup+0xc/0x225 [mlx5_ib] SyS_delete_module+0x153/0x230 do_syscall_64+0x62/0x110 entry_SYSCALL_64_after_hwframe+0x21/0x86 ... We restore the original behavior by breaking the UMR stage into two parts, pre and post IB registration stages, this way we can restore the original functionality and maintain clean separation of logic between stages. Fixes: 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14rdma_rxe: make rxe work over 802.1q VLAN devicesMartin Wilck2-7/+49
This patch fixes RDMA/rxe over 802.1q VLAN devices. Without it, I observed the following behavior: a) adding a VLAN device to RXE via rxe_net_add() creates a non-functional RDMA device. This is caused by the logic in enum_all_gids_of_dev_cb() / is_eth_port_of_netdev(), which only considers networks connected to "upper devices" of the configured network device, resulting in an empty set of gids for a VLAN interface that is an "upper device" itself. Later attempts to connect via this rdma device fail in cma_acuire_dev() because no gids can be resolved. b) adding the master device of the VLAN device instead seems to work initially, target addresses via VLAN devices are resolved successfully. But the connection times out because no 802.1q VLAN headers are inserted in the ethernet packets, which are therefore never received. This happens because the RXE layer sends the packets via the master device rather than the VLAN device. The problem could be solved by changing either a) or b). My thinking was that the logic in a) was created deliberately, thus I decided to work on b). It turns out that the information about the VLAN interface for the gid at hand is available in the AV information. My patch converts the RXE code to use this netdev instead of rxe->ndev. With this change, RXE over vlan works on my test system. Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14RDMA/ucma: Don't allow join attempts for unsupported AF familyLeon Romanovsky1-1/+7
Users can provide garbage while calling to ucma_join_ip_multicast(), it will indirectly cause to rdma_addr_size() return 0, making the call to ucma_process_join(), which had the right checks, but it is better to check the input as early as possible. The following crash from syzkaller revealed it. kernel BUG at lib/string.c:1052! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4113 Comm: syz-executor0 Not tainted 4.16.0-rc5+ #261 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051 RSP: 0018:ffff8801ca81f8f0 EFLAGS: 00010286 RAX: 0000000000000022 RBX: 1ffff10039503f23 RCX: 0000000000000000 RDX: 0000000000000022 RSI: 1ffff10039503ed3 RDI: ffffed0039503f12 RBP: ffff8801ca81f8f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000006 R11: 0000000000000000 R12: ffff8801ca81f998 R13: ffff8801ca81f938 R14: ffff8801ca81fa58 R15: 000000000000fa00 FS: 0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:000000000a12a900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000008138024 CR3: 00000001cbb58004 CR4: 00000000001606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: memcpy include/linux/string.h:344 [inline] ucma_join_ip_multicast+0x36b/0x3b0 drivers/infiniband/core/ucma.c:1421 ucma_write+0x2d6/0x3d0 drivers/infiniband/core/ucma.c:1633 __vfs_write+0xef/0x970 fs/read_write.c:480 vfs_write+0x189/0x510 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0xef/0x220 fs/read_write.c:581 do_syscall_32_irqs_on arch/x86/entry/common.c:330 [inline] do_fast_syscall_32+0x3ec/0xf9f arch/x86/entry/common.c:392 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7f9ec99 RSP: 002b:00000000ff8172cc EFLAGS: 00000282 ORIG_RAX: 0000000000000004 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000020000100 RDX: 0000000000000063 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 42 2c e3 fb eb de 55 48 89 fe 48 c7 c7 80 75 98 86 48 89 e5 e8 85 95 94 fb <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56 RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801ca81f8f0 Fixes: 5bc2b7b397b0 ("RDMA/ucma: Allow user space to specify AF_IB when joining multicast") Reported-by: <syzbot+2287ac532caa81900a4e@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-14RDMA/ucma: Fix access to non-initialized CM_ID objectLeon Romanovsky1-0/+3
The attempt to join multicast group without ensuring that CMA device exists will lead to the following crash reported by syzkaller. [ 64.076794] BUG: KASAN: null-ptr-deref in rdma_join_multicast+0x26e/0x12c0 [ 64.076797] Read of size 8 at addr 00000000000000b0 by task join/691 [ 64.076797] [ 64.076800] CPU: 1 PID: 691 Comm: join Not tainted 4.16.0-rc1-00219-gb97853b65b93 #23 [ 64.076802] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4 [ 64.076803] Call Trace: [ 64.076809] dump_stack+0x5c/0x77 [ 64.076817] kasan_report+0x163/0x380 [ 64.085859] ? rdma_join_multicast+0x26e/0x12c0 [ 64.086634] rdma_join_multicast+0x26e/0x12c0 [ 64.087370] ? rdma_disconnect+0xf0/0xf0 [ 64.088579] ? __radix_tree_replace+0xc3/0x110 [ 64.089132] ? node_tag_clear+0x81/0xb0 [ 64.089606] ? idr_alloc_u32+0x12e/0x1a0 [ 64.090517] ? __fprop_inc_percpu_max+0x150/0x150 [ 64.091768] ? tracing_record_taskinfo+0x10/0xc0 [ 64.092340] ? idr_alloc+0x76/0xc0 [ 64.092951] ? idr_alloc_u32+0x1a0/0x1a0 [ 64.093632] ? ucma_process_join+0x23d/0x460 [ 64.094510] ucma_process_join+0x23d/0x460 [ 64.095199] ? ucma_migrate_id+0x440/0x440 [ 64.095696] ? futex_wake+0x10b/0x2a0 [ 64.096159] ucma_join_multicast+0x88/0xe0 [ 64.096660] ? ucma_process_join+0x460/0x460 [ 64.097540] ? _copy_from_user+0x5e/0x90 [ 64.098017] ucma_write+0x174/0x1f0 [ 64.098640] ? ucma_resolve_route+0xf0/0xf0 [ 64.099343] ? rb_erase_cached+0x6c7/0x7f0 [ 64.099839] __vfs_write+0xc4/0x350 [ 64.100622] ? perf_syscall_enter+0xe4/0x5f0 [ 64.101335] ? kernel_read+0xa0/0xa0 [ 64.103525] ? perf_sched_cb_inc+0xc0/0xc0 [ 64.105510] ? syscall_exit_register+0x2a0/0x2a0 [ 64.107359] ? __switch_to+0x351/0x640 [ 64.109285] ? fsnotify+0x899/0x8f0 [ 64.111610] ? fsnotify_unmount_inodes+0x170/0x170 [ 64.113876] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 64.115813] ? ring_buffer_record_is_on+0xd/0x20 [ 64.117824] ? __fget+0xa8/0xf0 [ 64.119869] vfs_write+0xf7/0x280 [ 64.122001] SyS_write+0xa1/0x120 [ 64.124213] ? SyS_read+0x120/0x120 [ 64.126644] ? SyS_read+0x120/0x120 [ 64.128563] do_syscall_64+0xeb/0x250 [ 64.130732] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 64.132984] RIP: 0033:0x7f5c994ade99 [ 64.135699] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 64.138740] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99 [ 64.141056] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015 [ 64.143536] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000 [ 64.146017] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0 [ 64.148608] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0 [ 64.151060] [ 64.153703] Disabling lock debugging due to kernel taint [ 64.156032] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [ 64.159066] IP: rdma_join_multicast+0x26e/0x12c0 [ 64.161451] PGD 80000001d0298067 P4D 80000001d0298067 PUD 1dea39067 PMD 0 [ 64.164442] Oops: 0000 [#1] SMP KASAN PTI [ 64.166817] CPU: 1 PID: 691 Comm: join Tainted: G B 4.16.0-rc1-00219-gb97853b65b93 #23 [ 64.170004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-proj4 [ 64.174985] RIP: 0010:rdma_join_multicast+0x26e/0x12c0 [ 64.177246] RSP: 0018:ffff8801c8207860 EFLAGS: 00010282 [ 64.179901] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff94789522 [ 64.183344] RDX: 1ffffffff2d50fa5 RSI: 0000000000000297 RDI: 0000000000000297 [ 64.186237] RBP: ffff8801c8207a50 R08: 0000000000000000 R09: ffffed0039040ea7 [ 64.189328] R10: 0000000000000001 R11: ffffed0039040ea6 R12: 0000000000000000 [ 64.192634] R13: 0000000000000000 R14: ffff8801e2022800 R15: ffff8801d4ac2400 [ 64.196105] FS: 00007f5c99b98700(0000) GS:ffff8801e5d00000(0000) knlGS:0000000000000000 [ 64.199211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 64.202046] CR2: 00000000000000b0 CR3: 00000001d1c48004 CR4: 00000000003606a0 [ 64.205032] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 64.208221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 64.211554] Call Trace: [ 64.213464] ? rdma_disconnect+0xf0/0xf0 [ 64.216124] ? __radix_tree_replace+0xc3/0x110 [ 64.219337] ? node_tag_clear+0x81/0xb0 [ 64.222140] ? idr_alloc_u32+0x12e/0x1a0 [ 64.224422] ? __fprop_inc_percpu_max+0x150/0x150 [ 64.226588] ? tracing_record_taskinfo+0x10/0xc0 [ 64.229763] ? idr_alloc+0x76/0xc0 [ 64.232186] ? idr_alloc_u32+0x1a0/0x1a0 [ 64.234505] ? ucma_process_join+0x23d/0x460 [ 64.237024] ucma_process_join+0x23d/0x460 [ 64.240076] ? ucma_migrate_id+0x440/0x440 [ 64.243284] ? futex_wake+0x10b/0x2a0 [ 64.245302] ucma_join_multicast+0x88/0xe0 [ 64.247783] ? ucma_process_join+0x460/0x460 [ 64.250841] ? _copy_from_user+0x5e/0x90 [ 64.253878] ucma_write+0x174/0x1f0 [ 64.257008] ? ucma_resolve_route+0xf0/0xf0 [ 64.259877] ? rb_erase_cached+0x6c7/0x7f0 [ 64.262746] __vfs_write+0xc4/0x350 [ 64.265537] ? perf_syscall_enter+0xe4/0x5f0 [ 64.267792] ? kernel_read+0xa0/0xa0 [ 64.270358] ? perf_sched_cb_inc+0xc0/0xc0 [ 64.272575] ? syscall_exit_register+0x2a0/0x2a0 [ 64.275367] ? __switch_to+0x351/0x640 [ 64.277700] ? fsnotify+0x899/0x8f0 [ 64.280530] ? fsnotify_unmount_inodes+0x170/0x170 [ 64.283156] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 64.286182] ? ring_buffer_record_is_on+0xd/0x20 [ 64.288749] ? __fget+0xa8/0xf0 [ 64.291136] vfs_write+0xf7/0x280 [ 64.292972] SyS_write+0xa1/0x120 [ 64.294965] ? SyS_read+0x120/0x120 [ 64.297474] ? SyS_read+0x120/0x120 [ 64.299751] do_syscall_64+0xeb/0x250 [ 64.301826] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 64.304352] RIP: 0033:0x7f5c994ade99 [ 64.306711] RSP: 002b:00007f5c99b97d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 64.309577] RAX: ffffffffffffffda RBX: 00000000200001e4 RCX: 00007f5c994ade99 [ 64.312334] RDX: 00000000000000a0 RSI: 00000000200001c0 RDI: 0000000000000015 [ 64.315783] RBP: 00007f5c99b97ec0 R08: 0000000000000000 R09: 0000000000000000 [ 64.318365] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5c99b97fc0 [ 64.320980] R13: 0000000000000000 R14: 00007fff660e1c40 R15: 00007f5c99b989c0 [ 64.323515] Code: e8 e8 79 08 ff 4c 89 ff 45 0f b6 a7 b8 01 00 00 e8 68 7c 08 ff 49 8b 1f 4d 89 e5 49 c1 e4 04 48 8 [ 64.330753] RIP: rdma_join_multicast+0x26e/0x12c0 RSP: ffff8801c8207860 [ 64.332979] CR2: 00000000000000b0 [ 64.335550] ---[ end trace 0c00c17a408849c1 ]--- Reported-by: <syzbot+e6aba77967bd72cbc9d6@syzkaller.appspotmail.com> Fixes: c8f6a362bf3e ("RDMA/cma: Add multicast communication support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>