aboutsummaryrefslogtreecommitdiffstats
path: root/drivers (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-02-28IB/uverbs: Tidy uverbs_uobject_addJason Gunthorpe1-9/+4
Maintaining the uobjects list is mandatory, hoist it into the common rdma_alloc_commit_uobject() function and inline it as there is now only one caller. Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28Merge tag 'mlx5-updates-2018-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-nextDoug Ledford13-147/+732
mlx5-update-2018-02-23 (IB representors) From: Mark Bloch <markb@mellanox.com> ========= Add IB representor when in switchdev mode The following series adds support for an IB (RAW Ethernet only) device representor which is created when the user switches to switchdev mode. Today when switching to switchdev mode the only representors which are created are net devices. Each netdev is a representor of a virtual function and any data sent via the representor is received on the virtual function, and any data sent via the virtual function is received by the representor. For the mlx5 driver the main use of this functionality is to be able to use Open vSwitch on the hypervisor in order to manage/control traffic from/to the virtual functions. Open vSwitch can also work with DPDK devices and not just net devices, this series exposes an IB device, which Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK. An IB device representor exposes only RAW Ethernet QP capabilities and the ability to create flow rules to direct traffic to its RX queues. The state of the IB device (ACTIVE/DOWN etc..) is based on the state of the corresponding net device representor. No other RDMA/RoCE functionality is currently supported and no GID table is exposed. ========= Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-23IB/mlx5: Disable self loopback check when in switchdev modeMark Bloch1-0/+8
When in switchdev mode, there is no need to do self loopback checks as we can't receive those packets, we insert steering rules to the eswitch that make sure packets can't be looped back. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Reload IB interface when switching devlink modesMark Bloch4-17/+26
Up until this point it wasn't possible to activate IB representors when switching to switchdev mode, remove this limitation. We trigger reload of the PF IB interface in order to make sure that already allocated resources are invalid and new resources will be opened correctly with all the limitations of switchdev mode applied (only raw packet capabilities, without RoCE). We also move the remove/add to a place where the E-Switch mode is set/unset to better control when to trigger this action, this will allow the IB side to start in the correct mode. For better code reuse, create a function which reloads an interface and export it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: Add proper representors supportMark Bloch4-30/+192
This commit adds full support for IB representor: 1) Representors profile, We add two new profiles: nic_rep_profile - This profile will be used to create an IB device that represents the PF/UPLINK. rep_profile - This profile will be used to create an IB device that represents VFs. Each VF will be its own representor. 2) Proper load/unload callbacks, Those are called by the E-Switch when moving to/from switchdev mode. 3) Different flow DB handling for when we in switchdev mode. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: E-Switch, Add rule to forward traffic to vportMark Bloch4-0/+45
In order to forward traffic from representor's SQ to the right virtual function, every time an SQ is created also add the corresponding flow rule to the FDB. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: Don't expose MR cache in switchdev modeMark Bloch1-2/+3
When enabling many VFs and switching to switchdev mode, the total amount of mkeys we try to allocate when loading representors is very large and may cause timeouts on allocations, the same issues was observed on VFs and we employ the same fix that was done for them. We avoid allocating the full MR cache on load but still allow it to be manipulated once the IB device is loaded. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: When in switchdev mode, expose only raw packet capabilitiesMark Bloch2-30/+124
Currently in switchdev mode we allow only for raw packet QPs. Expose the right capabilities and set the gid table length to 0, also make sure we don't try to enable RoCE, so split the function to enable RoCE so representors can enable only the notifier needed for net device events. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: Listen to netdev register/unresiter events in switchdev modeMark Bloch2-3/+20
Currently we listen to netdev register/unregister event based on PCI device. When in switchdev mode PF and representors share the same PCI device, so in order to pair ib device and netdev in switchdev mode compare the netdev that triggered the event to that of the representor. Expose a function that lets you receive the netdev associated what a given representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: Add match on vport when in switchdev modeMark Bloch1-0/+12
When we point to a representor, it means we are in switchdev mode. The flow db is shared between PF and virtual function representors so each rule created needs to have a match on its specific source port. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: Allocate flow DB only on PF IB deviceMark Bloch2-14/+34
A flow DB is a shared resource between PF and representors, need to allocate it only when creating the PF IB device. Once we add IB representors, they will use the flow db which was created by the PF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23IB/mlx5: Add basic regiser/unregister representors codeMark Bloch5-0/+169
Create the basic infrastructure of registering and unregistering IB representors. The load/unload callbacks are left empty and proper implementation will be introduced in following patches. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Optimize HW steering tables in switchdev modeMark Bloch2-7/+44
Under switchdev mode we insert an eswitch miss rule causing any unmatched traffic to be sent towards the PF vport. This miss rule can be optimized if we break it to two, one case is for multicast traffic and the other for unicast. Breaking the miss rule into two (unicast and multicast) allows the firmware to program the hardware in a more efficient way. Using ConncetX-5 Ex with IXIA and testpmd (which use IB representors): IXIA -> NIC -> PF -> IB representor -> NIC -> VF: - Without this optimization: 9.2 MPPS. - With this optimization: 18 MPPS. VF -> NIC -> IB representor-> PF -> NIC -> IXIA: - Without this optimization: 17 MPPS. - With this optimization: 23.4 MPPS. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Increase number of FTEs in FDB in switchdev modeMark Bloch1-2/+3
The max FTE number should be the max number of SQs that can be opened. Ethernet representors open one SQ each. Once we add IB representor this will increase (depends on the user). For now lets start with 31 per IB representor and if needed increase in the future. This increase only affects the number of FTEs in the slow path FDB, offloaded rules (done via TC on the fast path portion of the FDB) aren't affected. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Move representors definition to a global scopeMark Bloch4-49/+19
In preparation for IB representors, move representors structs to a global scope, also expose functions needed for registration, unregistration, eswitch mode and creating a flow rule to direct traffic from SQs to the right VF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23net/mlx5: E-Switch, Add callback to get representor deviceMark Bloch3-0/+40
Add a callback interface to get a protocol device (per representor type). The Ethernet representors will expose their netdev via this interface. This functionality can be later used by IB representor in order to find the corresponding net device representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-22RDMA/verbs: Return proper error code for not supported system callLeon Romanovsky1-23/+23
The proper return error is -EOPNOTSUPP and not -ENOSYS, so update all places in verbs.c to match this semantics. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Reduce number of command header flags checksLeon Romanovsky1-9/+2
Simplify the code by directly checking the availability of extended command flog instead of doing multiple shift operations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Replace user's types with kernel's typesLeon Romanovsky1-5/+5
The internal to kernel variable declarations don't need to be declared with user types. This patch converts such occurrences appeared in ib_uverbs_write(). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Refactor the header validation logicLeon Romanovsky1-43/+47
Move all header validation logic to be performed before SRCU read lock. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMa/uverbs: Copy ex_hdr outside of SRCU read lockLeon Romanovsky1-7/+6
The SRCU read lock protects the IB device pointer and doesn't need to be called before copying user provided header. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Move uncontext check before SRCU read lockLeon Romanovsky1-11/+4
There is no need to take SRCU lock before checking file->ucontext, so move it do it before it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Properly check command supported maskLeon Romanovsky1-12/+6
The check based on index is not sufficient because IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ and IB_USER_VERBS_CMD_CREATE_CQ <= IB_USER_VERBS_CMD_OPEN_QP, so if we execute IB_USER_VERBS_EX_CMD_CREATE_CQ this code checks ib_dev->uverbs_cmd_mask not ib_dev->uverbs_ex_cmd_mask. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Refactor command header processingLeon Romanovsky1-30/+32
Move all command header processing into separate function and perform those checks before acquiring SRCU read lock. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Unify return values of not supported commandLeon Romanovsky1-12/+4
The non-existing command is supposed to return -EOPNOTSUPP, but the current code returns different errors for different flows for the same failure. This patch unifies those flows. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Return not supported error code for unsupported commandsLeon Romanovsky1-1/+1
Command that doesn't exist means that it is not supported, so update code to return -EOPNOTSUPP in case of failure. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Fail as early as possible if not enough header data was providedLeon Romanovsky1-6/+7
Fail as early as possible if not enough header data was provided. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Refactor flags checks and update return valueLeon Romanovsky1-4/+6
Since commit f21519b23c1b ("IB/core: extended command: an improved infrastructure for uverbs commands"), the uverbs supports extra flags as an input to the command interface. However actually, there is only one flag available and used, so it is better to refactor the code, so the resolution and report to the users is done as early as possible. As part of this change, we changed the return value of failure case from ENOSYS to be EINVAL to be consistent with the rest flags checks. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Update sizeof usersLeon Romanovsky1-5/+5
Update sizeof() users to be consistent with coding style. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Convert command mask validity check function to be boolLeon Romanovsky1-4/+4
The function validate_command_mask() returns only two results: success or failure, so convert it to return bool instead of 0 and -1. Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22Merge branch 'k.o/for-rc' into k.o/wip/dl-for-nextDoug Ledford20-97/+209
There is a 14 patch series waiting to come into for-next that has a dependecy on code submitted into this kernel's for-rc series. So, merge the for-rc branch into the current for-next in order to make the patch series apply cleanly. Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22Merge tag 'mlx5-updates-2018-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into k.o/wip/dl-for-nextDoug Ledford10-172/+223
mlx5-updates-2018-02-21 This series includes shared code updates for mlx5 core driver for both netdev and rdma subsystems. By Saeed, First six patches of the series are meant to address a performance issue and should provide a performance boost for multi core IRQ interrupt hungry workloads. The issue is fixed in the first patch, all other patches are meant to refactor the code in light of this fix. The problem it comes to fix, is a shared spinlock accessed across all HCA IRQs which protects the CQ database. To solve this we simply move the CQ database and its spinlock to be per EQ (IRQ), thus per core. By Yonatan, Fragmented completion queue (CQ) for RDMA, core driver implementation to create fragmented CQ buffers rather than one large contiguous memory buffer, the implementation scheme already exist and used by the netdev CQs, the patch shares that code with the rdma CQ creation flow and makes use of the new API in mlx5_ib driver. Thanks, Saeed. Merged into rdma-next tree as well as net-next tree to prevent conflicts in future patches between the two trees. Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-21RDMA/uverbs: Fix kernel panic while using XRC_TGT QP typeLeon Romanovsky1-0/+3
Attempt to modify XRC_TGT QP type from the user space (ibv_xsrq_pingpong invocation) will trigger the following kernel panic. It is caused by the fact that such QPs missed uobject initialization. [ 17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 17.412645] IP: rdma_lookup_put_uobject+0x9/0x50 [ 17.416567] PGD 0 P4D 0 [ 17.419262] Oops: 0000 [#1] SMP PTI [ 17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86 [ 17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50 [ 17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246 [ 17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000 [ 17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000 [ 17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac [ 17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800 [ 17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000 [ 17.443971] FS: 00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000 [ 17.446623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0 [ 17.449677] Call Trace: [ 17.450247] modify_qp.isra.20+0x219/0x2f0 [ 17.451151] ib_uverbs_modify_qp+0x90/0xe0 [ 17.452126] ib_uverbs_write+0x1d2/0x3c0 [ 17.453897] ? __handle_mm_fault+0x93c/0xe40 [ 17.454938] __vfs_write+0x36/0x180 [ 17.455875] vfs_write+0xad/0x1e0 [ 17.456766] SyS_write+0x52/0xc0 [ 17.457632] do_syscall_64+0x75/0x180 [ 17.458631] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 17.460004] RIP: 0033:0x7fc29198f5a0 [ 17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0 [ 17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003 [ 17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050 [ 17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300 [ 17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000 [ 17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a [ 17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90 [ 17.476841] CR2: 0000000000000048 [ 17.477764] ---[ end trace 1dbcc5354071a712 ]--- [ 17.478880] Kernel panic - not syncing: Fatal exception [ 17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Fixes: 2f08ee363fe0 ("RDMA/restrack: don't use uaccess_kernel()") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Avoid system hang during device un-regSelvin Xavier2-5/+4
BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work requests posted together. Track schedule of multiple workqueue items by maintaining a per device counter and proceed with IB dereg only if this counter is zero. flush_workqueue is no longer required from NETDEV_UNREGISTER path. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Fix system crash during load/unloadSelvin Xavier1-0/+5
During driver unload, the driver proceeds with cleanup without waiting for the scheduled events. So the device pointers get freed up and driver crashes when the events are scheduled later. Flush the bnxt_re_task work queue before starting device removal. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Synchronize destroy_qp with poll_cqSelvin Xavier4-19/+47
Avoid system crash when destroy_qp is invoked while the driver is processing the poll_cq. Synchronize these functions using the cq_lock. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Unpin SQ and RQ memory if QP create failsDevesh Sharma1-1/+8
Driver leaves the QP memory pinned if QP create command fails from the FW. Avoids this scenario by adding a proper exit path if the FW command fails. Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-20RDMA/bnxt_re: Disable atomic capability on bnxt_re adaptersDevesh Sharma2-17/+3
More testing needs to be done before enabling this feature. Disabling the feature temporarily Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-16RDMA/restrack: don't use uaccess_kernel()Steve Wise4-7/+22
uaccess_kernel() isn't sufficient to determine if an rdma resource is user-mode or not. For example, resources allocated in the add_one() function of an ib_client get falsely labeled as user mode, when they are kernel mode allocations. EG: mad qps. The result is that these qps are skipped over during a nldev query because of an erroneous namespace mismatch. So now we determine if the resource is user-mode by looking at the object struct's uobject or similar pointer to know if it was allocated for user mode applications. Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-16RDMA/verbs: Check existence of function prior to accessing itLeon Romanovsky2-0/+24
Update all the flows to ensure that function pointer exists prior to accessing it. This is much safer than checking the uverbs_ex_mask variable, especially since we know that test isn't working properly and will be removed in -next. This prevents a user triggereable oops. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15IB/srp: Fix completion vector assignment algorithmBart Van Assche1-6/+4
Ensure that cv_end is equal to ibdev->num_comp_vectors for the NUMA node with the highest index. This patch improves spreading of RDMA channels over completion vectors and thereby improves performance, especially on systems with only a single NUMA node. This patch drops support for the comp_vector login parameter by ignoring the value of that parameter since I have not found a good way to combine support for that parameter and automatic spreading of RDMA channels over completion vectors. Fixes: d92c0da71a35 ("IB/srp: Add multichannel support") Reported-by: Alexander Schmid <alex@modula-shop-systems.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Alexander Schmid <alex@modula-shop-systems.de> Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/vmw_pvrdma: Fix usage of user response structures in ABI fileAdit Ranadive3-3/+9
This ensures that we return the right structures back to userspace. Otherwise, it looks like the reserved fields in the response structures in userspace might have uninitialized data in them. Fixes: 8b10ba783c9d ("RDMA/vmw_pvrdma: Add shared receive queue support") Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Bryan Tan <bryantan@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/uverbs: Sanitize user entered port numbers prior to access itLeon Romanovsky1-1/+8
================================================================== BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs+0x6f2/0x8c0 Read of size 4 at addr ffff88006476a198 by task syzkaller697701/265 CPU: 0 PID: 265 Comm: syzkaller697701 Not tainted 4.15.0+ #90 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ? show_regs_print_info+0x17/0x17 ? lock_contended+0x11a0/0x11a0 print_address_description+0x83/0x3e0 kasan_report+0x18c/0x4b0 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0 ? lookup_get_idr_uobject+0x120/0x200 ? copy_ah_attr_from_uverbs+0x6f2/0x8c0 copy_ah_attr_from_uverbs+0x6f2/0x8c0 ? modify_qp+0xd0e/0x1350 modify_qp+0xd0e/0x1350 ib_uverbs_modify_qp+0xf9/0x170 ? ib_uverbs_query_qp+0xa70/0xa70 ib_uverbs_write+0x7f9/0xef0 ? attach_entity_load_avg+0x8b0/0x8b0 ? ib_uverbs_query_qp+0xa70/0xa70 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? sched_clock_cpu+0x18/0x200 ? _raw_spin_unlock_irq+0x29/0x40 ? _raw_spin_unlock_irq+0x29/0x40 ? _raw_spin_unlock_irq+0x29/0x40 ? time_hardirqs_on+0x27/0x670 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x1bd/0x7a0 ? finish_task_switch+0x194/0x7a0 ? prandom_u32_state+0xe/0x180 ? rcu_read_unlock+0x80/0x80 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x433c29 RSP: 002b:00007ffcf2be82a8 EFLAGS: 00000217 Allocated by task 62: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc+0x141/0x480 dup_fd+0x101/0xcc0 copy_process.part.62+0x166f/0x4390 _do_fork+0x1cb/0xe90 kernel_thread+0x34/0x40 call_usermodehelper_exec_work+0x112/0x260 process_one_work+0x929/0x1aa0 worker_thread+0x5c6/0x12a0 kthread+0x346/0x510 ret_from_fork+0x3a/0x50 Freed by task 259: kasan_slab_free+0x71/0xc0 kmem_cache_free+0xf3/0x4c0 put_files_struct+0x225/0x2c0 exit_files+0x88/0xc0 do_exit+0x67c/0x1520 do_group_exit+0xe8/0x380 SyS_exit_group+0x1e/0x20 entry_SYSCALL_64_fastpath+0x1e/0x8b The buggy address belongs to the object at ffff88006476a000 which belongs to the cache files_cache of size 832 The buggy address is located 408 bytes inside of 832-byte region [ffff88006476a000, ffff88006476a340) The buggy address belongs to the page: page:ffffea000191da80 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 0000000000000000 0000000000000000 0000000100080008 raw: 0000000000000000 0000000100000001 ffff88006bcf7a80 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88006476a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88006476a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88006476a180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88006476a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88006476a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/uverbs: Fix circular locking dependencyLeon Romanovsky1-1/+2
Avoid circular locking dependency by calling to uobj_alloc_commit() outside of xrcd_tree_mutex lock. ====================================================== WARNING: possible circular locking dependency detected 4.15.0+ #87 Not tainted ------------------------------------------------------ syzkaller401056/269 is trying to acquire lock: (&uverbs_dev->xrcd_tree_mutex){+.+.}, at: [<000000006c12d2cd>] uverbs_free_xrcd+0xd2/0x360 but task is already holding lock: (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ucontext->uobjects_lock){+.+.}: __mutex_lock+0x111/0x1720 rdma_alloc_commit_uobject+0x22c/0x600 ib_uverbs_open_xrcd+0x61a/0xdd0 ib_uverbs_write+0x7f9/0xef0 __vfs_write+0x10d/0x700 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 entry_SYSCALL_64_fastpath+0x1e/0x8b -> #0 (&uverbs_dev->xrcd_tree_mutex){+.+.}: lock_acquire+0x19d/0x440 __mutex_lock+0x111/0x1720 uverbs_free_xrcd+0xd2/0x360 remove_commit_idr_uobject+0x6d/0x110 uverbs_cleanup_ucontext+0x2f0/0x730 ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120 ib_uverbs_close+0xf2/0x570 __fput+0x2cd/0x8d0 task_work_run+0xec/0x1d0 do_exit+0x6a1/0x1520 do_group_exit+0xe8/0x380 SyS_exit_group+0x1e/0x20 entry_SYSCALL_64_fastpath+0x1e/0x8b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ucontext->uobjects_lock); lock(&uverbs_dev->xrcd_tree_mutex); lock(&ucontext->uobjects_lock); lock(&uverbs_dev->xrcd_tree_mutex); *** DEADLOCK *** 3 locks held by syzkaller401056/269: #0: (&file->cleanup_mutex){+.+.}, at: [<00000000c9f0c252>] ib_uverbs_close+0xac/0x570 #1: (&ucontext->cleanup_rwsem){++++}, at: [<00000000b6994d49>] uverbs_cleanup_ucontext+0xf6/0x730 #2: (&ucontext->uobjects_lock){+.+.}, at: [<00000000da010f09>] uverbs_cleanup_ucontext+0x168/0x730 stack backtrace: CPU: 0 PID: 269 Comm: syzkaller401056 Not tainted 4.15.0+ #87 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ? uverbs_cleanup_ucontext+0x168/0x730 ? console_unlock+0x502/0xbd0 print_circular_bug.isra.24+0x35e/0x396 ? print_circular_bug_header+0x12e/0x12e ? find_usage_backwards+0x30/0x30 ? entry_SYSCALL_64_fastpath+0x1e/0x8b validate_chain.isra.28+0x25d1/0x40c0 ? check_usage+0xb70/0xb70 ? graph_lock+0x160/0x160 ? find_usage_backwards+0x30/0x30 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? __lock_acquire+0x93d/0x1630 __lock_acquire+0x93d/0x1630 lock_acquire+0x19d/0x440 ? uverbs_free_xrcd+0xd2/0x360 __mutex_lock+0x111/0x1720 ? uverbs_free_xrcd+0xd2/0x360 ? uverbs_free_xrcd+0xd2/0x360 ? __mutex_lock+0x828/0x1720 ? mutex_lock_io_nested+0x1550/0x1550 ? uverbs_cleanup_ucontext+0x168/0x730 ? __lock_acquire+0x9a9/0x1630 ? mutex_lock_io_nested+0x1550/0x1550 ? uverbs_cleanup_ucontext+0xf6/0x730 ? lock_contended+0x11a0/0x11a0 ? uverbs_free_xrcd+0xd2/0x360 uverbs_free_xrcd+0xd2/0x360 remove_commit_idr_uobject+0x6d/0x110 uverbs_cleanup_ucontext+0x2f0/0x730 ? sched_clock_cpu+0x18/0x200 ? uverbs_close_fd+0x1c0/0x1c0 ib_uverbs_cleanup_ucontext.constprop.3+0x52/0x120 ib_uverbs_close+0xf2/0x570 ? ib_uverbs_remove_one+0xb50/0xb50 ? ib_uverbs_remove_one+0xb50/0xb50 __fput+0x2cd/0x8d0 task_work_run+0xec/0x1d0 do_exit+0x6a1/0x1520 ? fsnotify_first_mark+0x220/0x220 ? exit_notify+0x9f0/0x9f0 ? entry_SYSCALL_64_fastpath+0x5/0x8b ? entry_SYSCALL_64_fastpath+0x5/0x8b ? trace_hardirqs_on_thunk+0x1a/0x1c ? time_hardirqs_on+0x27/0x670 ? time_hardirqs_off+0x27/0x490 ? syscall_return_slowpath+0x6c/0x460 ? entry_SYSCALL_64_fastpath+0x5/0x8b do_group_exit+0xe8/0x380 SyS_exit_group+0x1e/0x20 entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x431ce9 Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: fd3c7904db6e ("IB/core: Change idr objects to use the new schema") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcdLeon Romanovsky1-3/+1
There is no matching lock for this mutex. Git history suggests this is just a missed remnant from an earlier version of the function before this locking was moved into uverbs_free_xrcd. Originally this lock was protecting the xrcd_table_delete() ===================================== WARNING: bad unlock balance detected! 4.15.0+ #87 Not tainted ------------------------------------- syzkaller223405/269 is trying to release lock (&uverbs_dev->xrcd_tree_mutex) at: [<00000000b8703372>] ib_uverbs_close_xrcd+0x195/0x1f0 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller223405/269: #0: (&uverbs_dev->disassociate_srcu){....}, at: [<000000005af3b960>] ib_uverbs_write+0x265/0xef0 stack backtrace: CPU: 0 PID: 269 Comm: syzkaller223405 Not tainted 4.15.0+ #87 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ? ib_uverbs_write+0x265/0xef0 ? console_unlock+0x502/0xbd0 ? ib_uverbs_close_xrcd+0x195/0x1f0 print_unlock_imbalance_bug+0x131/0x160 lock_release+0x59d/0x1100 ? ib_uverbs_close_xrcd+0x195/0x1f0 ? lock_acquire+0x440/0x440 ? lock_acquire+0x440/0x440 __mutex_unlock_slowpath+0x88/0x670 ? wait_for_completion+0x4c0/0x4c0 ? rdma_lookup_get_uobject+0x145/0x2f0 ib_uverbs_close_xrcd+0x195/0x1f0 ? ib_uverbs_open_xrcd+0xdd0/0xdd0 ib_uverbs_write+0x7f9/0xef0 ? cyc2ns_read_end+0x10/0x10 ? ib_uverbs_open_xrcd+0xdd0/0xdd0 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x18/0x200 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? __fget+0x358/0x5d0 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x4335c9 Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: fd3c7904db6e ("IB/core: Change idr objects to use the new schema") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/restrack: Increment CQ restrack object before committingLeon Romanovsky1-3/+3
Once the uobj is committed it is immediately possible another thread could destroy it, which worst case, can result in a use-after-free of the restrack objects. Cc: syzkaller <syzkaller@googlegroups.com> Fixes: 08f294a1524b ("RDMA/core: Add resource tracking for create and destroy CQs") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/uverbs: Protect from command mask overflowLeon Romanovsky1-7/+20
The command number is not bounds checked against the command mask before it is shifted, resulting in an ubsan hit. This does not cause malfunction since the command number is eventually bounds checked, but we can make this ubsan clean by moving the bounds check to before the mask check. ================================================================================ UBSAN: Undefined behaviour in drivers/infiniband/core/uverbs_main.c:647:21 shift exponent 207 is too large for 64-bit type 'long long unsigned int' CPU: 0 PID: 446 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #61 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ubsan_epilogue+0xe/0x81 __ubsan_handle_shift_out_of_bounds+0x293/0x2f7 ? debug_check_no_locks_freed+0x340/0x340 ? __ubsan_handle_load_invalid_value+0x19b/0x19b ? lock_acquire+0x440/0x440 ? lock_acquire+0x19d/0x440 ? __might_fault+0xf4/0x240 ? ib_uverbs_write+0x68d/0xe20 ib_uverbs_write+0x68d/0xe20 ? __lock_acquire+0xcf7/0x3940 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x18/0x200 ? sched_clock_cpu+0x18/0x200 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? __fget+0x35b/0x5d0 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x448e29 RSP: 002b:00007f033f567c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f033f5686bc RCX: 0000000000448e29 RDX: 0000000000000060 RSI: 0000000020001000 RDI: 0000000000000012 RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000056a0 R14: 00000000006e8740 R15: 0000000000000000 ================================================================================ Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.5 Fixes: 2dbd5186a39c ("IB/core: IB/core: Allow legacy verbs through extended interfaces") Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroyJason Gunthorpe1-2/+3
If remove_commit fails then the lock is left locked while the uobj still exists. Eventually the kernel will deadlock. lockdep detects this and says: test/4221 is leaving the kernel with locks still held! 1 lock held by test/4221: #0: (&ucontext->cleanup_rwsem){.+.+}, at: [<000000001e5c7523>] rdma_explicit_destroy+0x37/0x120 [ib_uverbs] Fixes: 4da70da23e9b ("IB/core: Explicitly destroy an object while keeping uobject") Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15IB/uverbs: Improve lockdep_checkJason Gunthorpe1-7/+7
This is really being used as an assert that the expected usecnt is being held and implicitly that the usecnt is valid. Rename it to assert_uverbs_usecnt and tighten the checks to only accept valid values of usecnt (eg 0 and < -1 are invalid). The tigher checkes make the assertion cover more cases and is more likely to find bugs via syzkaller/etc. Fixes: 3832125624b7 ("IB/core: Add support for idr types") Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15RDMA/uverbs: Protect from races between lookup and destroy of uobjectsLeon Romanovsky1-1/+9
The race is between lookup_get_idr_uobject and uverbs_idr_remove_uobj -> uverbs_uobject_put. We deliberately do not call sychronize_rcu after the idr_remove in uverbs_idr_remove_uobj for performance reasons, instead we call kfree_rcu() during uverbs_uobject_put. However, this means we can obtain pointers to uobj's that have already been released and must protect against krefing them using kref_get_unless_zero. ================================================================== BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs.isra.2+0x860/0xa00 Read of size 4 at addr ffff88005fda1ac8 by task syz-executor2/441 CPU: 1 PID: 441 Comm: syz-executor2 Not tainted 4.15.0-rc2+ #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0x8d/0xd4 print_address_description+0x73/0x290 kasan_report+0x25c/0x370 ? copy_ah_attr_from_uverbs.isra.2+0x860/0xa00 copy_ah_attr_from_uverbs.isra.2+0x860/0xa00 ? uverbs_try_lock_object+0x68/0xc0 ? modify_qp.isra.7+0xdc4/0x10e0 modify_qp.isra.7+0xdc4/0x10e0 ib_uverbs_modify_qp+0xfe/0x170 ? ib_uverbs_query_qp+0x970/0x970 ? __lock_acquire+0xa11/0x1da0 ib_uverbs_write+0x55a/0xad0 ? ib_uverbs_query_qp+0x970/0x970 ? ib_uverbs_query_qp+0x970/0x970 ? ib_uverbs_open+0x760/0x760 ? futex_wake+0x147/0x410 ? sched_clock_cpu+0x18/0x180 ? check_prev_add+0x1680/0x1680 ? do_futex+0x3b6/0xa30 ? sched_clock_cpu+0x18/0x180 __vfs_write+0xf7/0x5c0 ? ib_uverbs_open+0x760/0x760 ? kernel_read+0x110/0x110 ? lock_acquire+0x370/0x370 ? __fget+0x264/0x3b0 vfs_write+0x18a/0x460 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x448e29 RSP: 002b:00007f443fee0c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f443fee16bc RCX: 0000000000448e29 RDX: 0000000000000078 RSI: 00000000209f8000 RDI: 0000000000000012 RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000008e98 R14: 00000000006ebf38 R15: 0000000000000000 Allocated by task 1: kmem_cache_alloc_trace+0x16c/0x2f0 mlx5_alloc_cmd_msg+0x12e/0x670 cmd_exec+0x419/0x1810 mlx5_cmd_exec+0x40/0x70 mlx5_core_mad_ifc+0x187/0x220 mlx5_MAD_IFC+0xd7/0x1b0 mlx5_query_mad_ifc_gids+0x1f3/0x650 mlx5_ib_query_gid+0xa4/0xc0 ib_query_gid+0x152/0x1a0 ib_query_port+0x21e/0x290 mlx5_port_immutable+0x30f/0x490 ib_register_device+0x5dd/0x1130 mlx5_ib_add+0x3e7/0x700 mlx5_add_device+0x124/0x510 mlx5_register_interface+0x11f/0x1c0 mlx5_ib_init+0x56/0x61 do_one_initcall+0xa3/0x250 kernel_init_freeable+0x309/0x3b8 kernel_init+0x14/0x180 ret_from_fork+0x24/0x30 Freed by task 1: kfree+0xeb/0x2f0 mlx5_free_cmd_msg+0xcd/0x140 cmd_exec+0xeba/0x1810 mlx5_cmd_exec+0x40/0x70 mlx5_core_mad_ifc+0x187/0x220 mlx5_MAD_IFC+0xd7/0x1b0 mlx5_query_mad_ifc_gids+0x1f3/0x650 mlx5_ib_query_gid+0xa4/0xc0 ib_query_gid+0x152/0x1a0 ib_query_port+0x21e/0x290 mlx5_port_immutable+0x30f/0x490 ib_register_device+0x5dd/0x1130 mlx5_ib_add+0x3e7/0x700 mlx5_add_device+0x124/0x510 mlx5_register_interface+0x11f/0x1c0 mlx5_ib_init+0x56/0x61 do_one_initcall+0xa3/0x250 kernel_init_freeable+0x309/0x3b8 kernel_init+0x14/0x180 ret_from_fork+0x24/0x30 The buggy address belongs to the object at ffff88005fda1ab0 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 24 bytes inside of 32-byte region [ffff88005fda1ab0, ffff88005fda1ad0) The buggy address belongs to the page: page:00000000d5655c19 count:1 mapcount:0 mapping: (null) index:0xffff88005fda1fc0 flags: 0x4000000000000100(slab) raw: 4000000000000100 0000000000000000 ffff88005fda1fc0 0000000180550008 raw: ffffea00017f6780 0000000400000004 ffff88006c803980 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88005fda1980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb ffff88005fda1a00: fb fb fc fc fb fb fb fb fc fc 00 00 00 00 fc fc ffff88005fda1a80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb ffff88005fda1b00: fc fc 00 00 00 00 fc fc fb fb fb fb fc fc fb fb ffff88005fda1b80: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc ==================================================================@ Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.11 Fixes: 3832125624b7 ("IB/core: Add support for idr types") Reported-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>