aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-06-29RDMA/core: Always release restrack objectLeon Romanovsky1-1/+1
Change location of rdma_restrack_del() to fix the bug where task_struct was acquired but not released, causing to resource leak. ucma_create_id() { ucma_alloc_ctx(); rdma_create_user_id() { rdma_restrack_new(); rdma_restrack_set_name() { rdma_restrack_attach_task.part.0(); <--- task_struct was gotten } } ucma_destroy_private_ctx() { ucma_put_ctx(); rdma_destroy_id() { _destroy_id() <--- id_priv was freed } } } Fixes: 889d916b6f8a ("RDMA/core: Don't access cm_id after its destruction") Link: https://lore.kernel.org/r/073ec27acb943ca8b6961663c47c5abe78a5c8cc.1624948948.git.leonro@nvidia.com Reported-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-25RDMA/cma: Fix rdma_resolve_route() memory leakGerd Rausch1-1/+2
Fix a memory leak when "mda_resolve_route() is called more than once on the same "rdma_cm_id". This is possible if cma_query_handler() triggers the RDMA_CM_EVENT_ROUTE_ERROR flow which puts the state machine back and allows rdma_resolve_route() to be called again. Link: https://lore.kernel.org/r/f6662b7b-bdb7-2706-1e12-47c61d3474b6@oracle.com Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-25RDMA/core/sa_query: Remove unused argumentHåkon Bugge1-4/+2
Fixes:4c33bd1926cc ("IB/SA: Add support to query OPA path records") Link: https://lore.kernel.org/r/1624624257-3677-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-25RDMA/cma: Fix incorrect Packet Lifetime calculationHåkon Bugge1-2/+4
An approximation for the PacketLifeTime is half the local ACK timeout. The encoding for both timers are logarithmic. If the local ACK timeout is set, but zero, it means the timer is disabled. In this case, we choose the CMA_IBOE_PACKET_LIFETIME value, since 50% of infinite makes no sense. Before this commit, the PacketLifeTime became 255 if local ACK timeout was zero (not running). Fixed by explicitly testing for timeout being zero. Fixes: e1ee1e62bec4 ("RDMA/cma: Use ACK timeout for RoCE packetLifeTime") Link: https://lore.kernel.org/r/1624371207-26710-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-24RDMA/cma: Protect RMW with qp_mutexHåkon Bugge1-1/+19
The struct rdma_id_private contains three bit-fields, tos_set, timeout_set, and min_rnr_timer_set. These are set by accessor functions without any synchronization. If two or all accessor functions are invoked in close proximity in time, there will be Read-Modify-Write from several contexts to the same variable, and the result will be intermittent. Fixed by protecting the bit-fields by the qp_mutex in the accessor functions. The consumer of timeout_set and min_rnr_timer_set is in rdma_init_qp_attr(), which is called with qp_mutex held for connected QPs. Explicit locking is added for the consumers of tos and tos_set. This commit depends on ("RDMA/cma: Remove unnecessary INIT->INIT transition"), since the call to rdma_init_qp_attr() from cma_init_conn_qp() does not hold the qp_mutex. Fixes: 2c1619edef61 ("IB/cma: Define option to set ack timeout and pack tos_set") Fixes: 3aeffc46afde ("IB/cma: Introduce rdma_set_min_rnr_timer()") Link: https://lore.kernel.org/r/1624369197-24578-3-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-24RDMA/cma: Remove unnecessary INIT->INIT transitionHåkon Bugge1-16/+1
In rdma_create_qp(), a connected QP will be transitioned to the INIT state. Afterwards, the QP will be transitioned to the RTR state by the cma_modify_qp_rtr() function. But this function starts by performing an ib_modify_qp() to the INIT state again, before another ib_modify_qp() is performed to transition the QP to the RTR state. Hence, there is no need to transition the QP to the INIT state in rdma_create_qp(). Link: https://lore.kernel.org/r/1624369197-24578-2-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-23Backmerge tag 'v5.13-rc7' into drm-nextDave Airlie1-0/+5
Backmerge Linux 5.13-rc7 to make some pulls from later bases apply, and to bake in the conflicts so far.
2021-06-22Merge tag 'v5.13-rc7' into rdma.git for-nextJason Gunthorpe3-4/+13
Linux 5.13-rc7 Needed for dependencies in following patches. Merge conflict in rxe_cmop.c resolved by compining both patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21IB/core: Removed port validity check from ib_get_cached_subnet_prefixAnand Khoje4-21/+6
Removed port validity check from ib_get_cached_subnet_prefix() as this check is not needed because "port_num" is valid. Link: https://lore.kernel.org/r/20210616154509.1047-2-anand.a.khoje@oracle.com Suggested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Anand Khoje <anand.a.khoje@oracle.com> Signed-off-by: Haakon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21RDMA: Fix kernel-doc warnings about wrong commentLeon Romanovsky2-3/+4
Compilation with W=1 produces warnings similar to the below. drivers/infiniband/ulp/ipoib/ipoib_main.c:320: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst All such occurrences were found with the following one line git grep -A 1 "\/\*\*" drivers/infiniband/ Link: https://lore.kernel.org/r/e57d5f4ddd08b7a19934635b44d6d632841b9ba7.1623823612.git.leonro@nvidia.com Reviewed-by: Jack Wang <jinpu.wang@ionos.com> #rtrs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21RDMA/core: Fix incorrect print format specifierWenpeng Liang15-45/+45
There are some '%u' for 'int' and '%d' for 'unsigend int', they should be fixed. Link: https://lore.kernel.org/r/1623325232-30900-1-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Remove rdma_set_device_sysfs_group()Jason Gunthorpe1-1/+3
The driver's device group can be specified as part of the ops structure like the device's port group. No need for the complicated API. Link: https://lore.kernel.org/r/8964785a34fd3a29ff5b6693493f575b717e594d.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Allow port_groups to be used with namespacesJason Gunthorpe1-6/+4
Now that the port_groups data is being destroyed and managed by the core code this restriction is no longer needed. All the ib_port_attrs are compatible with the core's sysfs lifecycle. When the main device is destroyed and moved to another namespace the driver's port sysfs can be created/destroyed as well due to it now being a simple attribute list. Link: https://lore.kernel.org/r/afd8b676eace2821692d44489ff71856277c48d1.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Change ops->init_port to ops->port_groupsJason Gunthorpe2-28/+15
init_port was only being used to register sysfs attributes against the port kobject. Now that all users are creating static attribute_group's we can simply set the attribute_group list in the ops and the core code can just handle it directly. This makes all the sysfs management quite straightforward and prevents any driver from abusing the naked port kobject in future because no driver code can access it. Link: https://lore.kernel.org/r/114f68f3d921460eafe14cea5a80ca65d81729c3.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/cm: Use an attribute_group on the ib_port_attribute intead of kobj'sJason Gunthorpe3-166/+119
This code is trying to attach a list of counters grouped into 4 groups to the ib_port sysfs. Instead of creating a bunch of kobjects simply express everything naturally as an ib_port_attribute and add a single attribute_groups list. Remove all the naked kobject manipulations. Link: https://lore.kernel.org/r/0d5a7241ee0fe66622de04fcbaafaf6a791d5c7c.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Expose the ib port sysfs attribute machineryJason Gunthorpe1-100/+117
Other things outside the core code are creating attributes against the port. This patch exposes the basic machinery to do this. The ib_port_attribute type allows creating groups of attributes attatched to the port and comes with the usual machinery to do this. Link: https://lore.kernel.org/r/5c4aeae57f6fa7c59a1d6d1c5506069516ae9bbf.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Remove the kobject_uevent() NOPJason Gunthorpe1-2/+0
This call does nothing because the ib_port kobj is nested under a struct device kobject and the dev_uevent_filter() function of the struct device blocks uevents for any children kobj's that are not also struct devices. A uevent for the struct device will be triggered after ib_setup_port_attrs() returns which causes udev to pick up all the deep "attributes" which are implemented as kobjects nested under a struct device and assign them to the udev object for the struct device: $ udevadm info -a /sys/class/infiniband/ibp0s9 ATTR{ports/1/counters/excessive_buffer_overrun_errors}=="0" Link: https://lore.kernel.org/r/49231c92c7d4c60686de18f7e20932d0c82160ee.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Create the device hw_counters through the normal groups mechanismJason Gunthorpe3-54/+27
Instead of calling device_add_groups() add the group to the existing groups array which is managed through device_add(). This requires setting up the hw_counters before device_add(), so it gets split up from the already split port sysfs flow. Move all the memory freeing to the release function. Link: https://lore.kernel.org/r/666250d937b64f6fdf45da9e2dc0b6e5e4f7abd8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Simplify how the port sysfs is createdJason Gunthorpe1-219/+103
Use the same technique as gid_attrs now uses to manage the port sysfs. Bundle everything into three allocations and use a single sysfs_create_groups() to build everything in one shot. All the memory is always freed in the kobj release function, removing most of the error unwinding. The gid_attr technique and the hw_counters are very similar, merge the two together and combine the sysfs_create_group() call for hw_counters with the single sysfs group setup. Link: https://lore.kernel.org/r/b688f3340694c59f7b44b1bde40e25559ef43cf3.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Simplify how the gid_attrs sysfs is createdJason Gunthorpe1-81/+89
Instead of having an whole bunch of different allocations to create the gid_attr kobjects reduce it to three, one for the kobj struct plus the attributes, and one for the attribute list for each of the two groups. Move the freeing of all allocations to the release function. Reorder the operations so all the allocations happen first then the kobject & sysfs operations are last. This removes the majority of the complicated error unwind since the release function will always undo all the memory allocations. Freeing the memory is also much simpler since there is a lot less of it. Consolidate creating the "group of array indexes" pattern into one helper function. Ensure kobject_del is used. Link: https://lore.kernel.org/r/f4149d379db7178d37d11d75e3026bf550f818a1.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Split gid_attrs related sysfs from add_port()Jason Gunthorpe1-71/+89
The gid_attrs directory is a dedicated kobj nested under the port, construct/destruct it with its own pair of functions for understandability. This is much more readable than having it weirdly inlined out of order into the add_port() function. Link: https://lore.kernel.org/r/1c9434111b6770a7aef0e644a88a16eee7e325b8.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Split port and device counter sysfs attributesJason Gunthorpe1-168/+290
This code creates a 'struct hw_stats_attribute' for each sysfs entry that contains a naked 'struct attribute' inside. It then proceeds to attach this same structure to a 'struct device' kobj and a 'struct ib_port' kobj. However, this violates the typing requirements. 'struct device' requires the attribute to be a 'struct device_attribute' and 'struct ib_port' requires the attribute to be 'struct port_attribute'. This happens to work because the show/store function pointers in all three structures happen to be at the same offset and happen to be nearly the same signature. This means when container_of() was used to go between the wrong two types it still managed to work. However clang CFI detection notices that the function pointers have a slightly different signature. As with show/store this was only working because the device and port struct layouts happened to have the kobj at the front. Correct this by have two independent sets of data structures for the port and device case. The two different attributes correctly include the port/device_attribute struct and everything from there up is kept split. The show/store function call chains start with device/port unique functions that invoke a common show/store function pointer. Link: https://lore.kernel.org/r/a8b3864b4e722aed3657512af6aa47dc3c5033be.1623427137.git.leonro@nvidia.com Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointerJason Gunthorpe3-9/+14
It is much saner to store a pointer to the kobject structure that contains the cannonical stats pointer than to copy the stats pointers into a public structure. Future patches will require the sysfs pointer for other purposes. Link: https://lore.kernel.org/r/f90551dfd296cde1cb507bbef27cca9891d19871.1623427137.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16RDMA: Split the alloc_hw_stats() ops to port and device variantsJason Gunthorpe4-8/+11
This is being used to implement both the port and device global stats, which is causing some confusion in the drivers. For instance EFA and i40iw both seem to be misusing the device stats. Split it into two ops so drivers that don't support one or the other can leave the op NULL'd, making the calling code a little simpler to understand. Link: https://lore.kernel.org/r/1955c154197b2a159adc2dc97266ddc74afe420c.1623427137.git.leonro@nvidia.com Tested-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10IB/cm: Remove dgid from the cm_id_priv avJason Gunthorpe1-9/+4
It turns out this is only being used to store the LID for SIDR mode to search the RB tree for request de-duplication. Store the LID value directly and don't pretend it is a GID. Link: https://lore.kernel.org/r/2e7c87b6f662c90c642fc1838e363ad3e6ef14a4.1623236345.git.leonro@nvidia.com Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10RDMA: Verify port when creating flow ruleMaor Gottlieb1-0/+5
Validate port value provided by the user and with that remove no longer needed validation by the driver. The missing check in the mlx5_ib driver could cause to the below oops. Call trace: _create_flow_rule+0x2d4/0xf28 [mlx5_ib] mlx5_ib_create_flow+0x2d0/0x5b0 [mlx5_ib] ib_uverbs_ex_create_flow+0x4cc/0x624 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd4/0x150 [ib_uverbs] ib_uverbs_cmd_verbs.isra.7+0xb28/0xc50 [ib_uverbs] ib_uverbs_ioctl+0x158/0x1d0 [ib_uverbs] do_vfs_ioctl+0xd0/0xaf0 ksys_ioctl+0x84/0xb4 __arm64_sys_ioctl+0x28/0xc4 el0_svc_common.constprop.3+0xa4/0x254 el0_svc_handler+0x84/0xa0 el0_svc+0x10/0x26c Code: b9401260 f9615681 51000400 8b001c20 (f9403c1a) Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs") Link: https://lore.kernel.org/r/faad30dc5219a01727f47db3dc2f029d07c82c00.1623309971.git.leonro@nvidia.com Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08RDMA/core: Use refcount_t instead of atomic_t on refcount of ib_uverbs_deviceWeihang Li2-7/+7
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-8-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08RDMA/core: Use refcount_t instead of atomic_t on refcount of mcast_portWeihang Li1-4/+4
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-6-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08RDMA/core: Use refcount_t instead of atomic_t on refcount of mcast_memberWeihang Li1-6/+6
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-5-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08RDMA/core: Remove refcount from struct ib_mad_snoop_privateJason Gunthorpe1-1/+0
The member is never used, delete it. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08RDMA/core: Use refcount_t instead of atomic_t on refcount of iwpm_admin_dataWeihang Li2-5/+9
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Increase refcount_t from 0 to 1 is regarded as there is a risk about use-after-free. So it should be set to 1 directly during initialization. Link: https://lore.kernel.org/r/1622194663-2383-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-08RDMA/core: Use refcount_t instead of atomic_t on refcount of iwcm_id_privateWeihang Li2-6/+5
The refcount_t API will WARN on underflow and overflow of a reference counter, and avoid use-after-free risks. Link: https://lore.kernel.org/r/1622194663-2383-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-07RDMA/umem: fix missing automated renameChristian König1-1/+1
This occasions was missed during the recent rename of the function. Signed-off-by: Christian König <christian.koenig@amd.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210607070658.11586-1-christian.koenig@amd.com
2021-06-02IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lockMark Zhang1-19/+93
During cm_dev deregistration in cm_remove_one(), the cm_device and cm_ports will be freed, after that they should not be accessed. The mad_agent needs to be protected as well. This patch adds a cm_device kref to protect cm_dev and cm_ports, and a mad_agent_lock spinlock to protect mad_agent. Link: https://lore.kernel.org/r/501ba7a2ff203dccd0e6755d3f93329772adce52.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02IB/cm: Improve the calling of cm_init_av_for_lap and cm_init_av_by_pathMark Zhang1-47/+58
The cm_init_av_for_lap() and cm_init_av_by_path() function calls have the following issues: 1. Both of them might sleep and should not be called under spinlock. 2. The access of cm_id_priv->av should be under cm_id_priv->lock, which means it can't be initialized directly. This patch splits the calling of 2 functions into two parts: first one initializes an AV outside of the spinlock, the second one copies AV to cm_id_priv->av under spinlock. Fixes: e1444b5a163e ("IB/cm: Fix automatic path migration support") Link: https://lore.kernel.org/r/038fb8ad932869b4548b0c7708cab7f76af06f18.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02IB/cm: Simplify ib_cancel_mad() and ib_modify_mad() callsMark Zhang3-37/+26
The mad_agent parameter is redundant since the struct ib_mad_send_buf already has a pointer of it. Link: https://lore.kernel.org/r/0987c784b25f7bfa72f78691f50cff066de587e1.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02Revert "IB/cm: Mark stale CM id's whenever the mad agent was unregistered"Mark Zhang1-109/+14
This reverts commit 9db0ff53cb9b43ed75bacd42a89c1a0ab048b2b0, which wasn't a full fix and still causes to the following panic: panic @ time 1605623870.843, thread 0xfffffeb63b552000: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe811a94e000 time = 1605623870 cpuid = 9, TSC = 0xb7937acc1b6 Panic occurred in module kernel loaded at 0xffffffff80200000:Stack: -------------------------------------------------- kernel:vm_fault+0x19da kernel:vm_fault_trap+0x6e kernel:trap_pfault+0x1f1 kernel:trap+0x31e kernel:cm_destroy_id+0x38c kernel:rdma_destroy_id+0x127 kernel:sdp_shutdown_task+0x3ae kernel:taskqueue_run_locked+0x10b kernel:taskqueue_thread_loop+0x87 kernel:fork_exit+0x83 Link: https://lore.kernel.org/r/4346449a7cdacc7a4eedc89cb1b42d8434ec9814.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02IB/cm: Tidy remaining cm_msg free pathsJason Gunthorpe1-10/+10
Now that all the free paths are explicit cm_free_msg() will only be called for msgs's allocated with cm_alloc_msg(), so we can assume the context is set. Place it after the allocation function it is paired with for clarity. Also remove a bogus NULL assignment in one place after a cancel. This does nothing other than disable completions to become events, but changing the state already did that. Link: https://lore.kernel.org/r/082fd3552be0d1a2c19b1c4cefb5f3f0e3e68e82.1622629024.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02IB/cm: Call the correct message free functions in cm_send_handler()Jason Gunthorpe1-27/+25
There are now three destroy functions for the cm_msg, and all places except the general send completion handler use the correct function. Fix cm_send_handler() to detect which kind of message is being completed and destroy it using the correct function with the correct locking. Link: https://lore.kernel.org/r/62a507195b8db85bb11228d0c6e7fa944204bf12.1622629024.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02IB/cm: Split cm_alloc_msg()Jason Gunthorpe1-75/+115
This is being used with two quite different flows, one attaches the message to the priv and the other does not. Ensure the message attach is consistently done under the spinlock and ensure that the free on error always detaches the message from the cm_id_priv, also always under lock. This makes read/write to the cm_id_priv->msg consistently locked and consistently NULL'd when the message is freed, even in all error paths. Link: https://lore.kernel.org/r/f692b8c89eecb34fd82244f317e478bea6c97688.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02IB/cm: Pair cm_alloc_response_msg() with a cm_free_response_msg()Jason Gunthorpe1-11/+18
This is not a functional change, but it helps make the purpose of all the cm_free_msg() calls clearer. In this case a response msg has a NULL context[0], and is never placed in cm_id_priv->msg. Link: https://lore.kernel.org/r/5cd53163be7df0a94f0d4ef7294546bc674fb74a.1622629024.git.leonro@nvidia.com Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-02RDMA/core: Sanitize WQ state received from the userspaceLeon Romanovsky1-2/+19
The mlx4 and mlx5 implemented differently the WQ input checks. Instead of duplicating mlx4 logic in the mlx5, let's prepare the input in the central place. The mlx5 implementation didn't check for validity of state input. It is not real bug because our FW checked that, but still worth to fix. Fixes: f213c0527210 ("IB/uverbs: Add WQ support") Link: https://lore.kernel.org/r/ac41ad6a81b095b1a8ad453dcf62cf8d3c5da779.1621413310.git.leonro@nvidia.com Reported-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28RDMA/core: Use the DEVICE_ATTR_RO macroYueHaibing1-4/+3
Use the DEVICE_ATTR_RO() helper instead of plain DEVICE_ATTR(), which makes the code a bit shorter and easier to read. Link: https://lore.kernel.org/r/20210526132949.20184-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-28IB/core: Only update PKEY and GID caches on respective eventsHåkon Bugge1-8/+15
Both the PKEY and GID tables in an HCA can hold in the order of hundreds entries. Reading them is expensive. Partly because the API for retrieving them only returns a single entry at a time. Further, on certain implementations, e.g., CX-3, the VFs are paravirtualized in this respect and have to rely on the PF driver to perform the read. This again demands VF to PF communication. IB Core's cache is refreshed on all events. Hence, filter the refresh of the PKEY and GID caches based on the event received being IB_EVENT_PKEY_CHANGE and IB_EVENT_GID_CHANGE respectively. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/1621964949-28484-1-git-send-email-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-19RDMA/uverbs: Fix a NULL vs IS_ERR() bugDan Carpenter1-2/+2
The uapi_get_object() function returns error pointers, it never returns NULL. Fixes: 149d3845f4a5 ("RDMA/uverbs: Add a method to introspect handles in a context") Link: https://lore.kernel.org/r/YJ6Got+U7lz+3n9a@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-18RDMA/core: Don't access cm_id after its destructionShay Drory1-2/+3
restrack should only be attached to a cm_id while the ID has a valid device pointer. It is set up when the device is first loaded, but not cleared when the device is removed. There is also two copies of the device pointer, one private and one in the public API, and these were left out of sync. Make everything go to NULL together and manipulate restrack right around the device assignments. Found by syzcaller: BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline] BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline] BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline] BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline] BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline] BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783 Write of size 8 at addr dead000000000108 by task syz-executor716/334 CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0xbe/0xf9 lib/dump_stack.c:120 __kasan_report mm/kasan/report.c:400 [inline] kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413 __list_del include/linux/list.h:112 [inline] __list_del_entry include/linux/list.h:135 [inline] list_del include/linux/list.h:146 [inline] cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline] cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline] cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783 _destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862 ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185 ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576 ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797 __fput+0x169/0x540 fs/file_table.c:280 task_work_run+0xb7/0x100 kernel/task_work.c:140 exit_task_work include/linux/task_work.h:30 [inline] do_exit+0x7da/0x17f0 kernel/exit.c:825 do_group_exit+0x9e/0x190 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931 do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count") Link: https://lore.kernel.org/r/3352ee288fe34f2b44220457a29bfc0548686363.1620711734.git.leonro@nvidia.com Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11RDMA/core: Remove never used ib_modify_wq function callLeon Romanovsky1-21/+0
The function ib_modify_wq() is not used, so remove it. Link: https://lore.kernel.org/r/c5e48d517b9163fe4f9ffd224050b83fdb3571c6.1620552935.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11RDMA/ucma: Cleanup to reduce duplicate codeXiaofei Tan1-2/+2
The lable "err1" does the same thing as the branch of copy_to_user() failed in the function ucma_create_id(). Just jump to the label directly to reduce duplicate code. Link: https://lore.kernel.org/r/1620291106-3675-1-git-send-email-tanxiaofei@huawei.com Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-10RDMA/core: Prevent divide-by-zero error triggered by the userLeon Romanovsky1-0/+3
The user_entry_size is supplied by the user and later used as a denominator to calculate number of entries. The zero supplied by the user will trigger the following divide-by-zero error: divide error: 0000 [#1] SMP KASAN PTI CPU: 4 PID: 497 Comm: c_repro Not tainted 5.13.0-rc1+ #281 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:ib_uverbs_handler_UVERBS_METHOD_QUERY_GID_TABLE+0x1b1/0x510 Code: 87 59 03 00 00 e8 9f ab 1e ff 48 8d bd a8 00 00 00 e8 d3 70 41 ff 44 0f b7 b5 a8 00 00 00 e8 86 ab 1e ff 31 d2 4c 89 f0 31 ff <49> f7 f5 48 89 d6 48 89 54 24 10 48 89 04 24 e8 1b ad 1e ff 48 8b RSP: 0018:ffff88810416f828 EFLAGS: 00010246 RAX: 0000000000000008 RBX: 1ffff1102082df09 RCX: ffffffff82183f3d RDX: 0000000000000000 RSI: ffff888105f2da00 RDI: 0000000000000000 RBP: ffff88810416fa98 R08: 0000000000000001 R09: ffffed102082df5f R10: ffff88810416faf7 R11: ffffed102082df5e R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000008 R15: ffff88810416faf0 FS: 00007f5715efa740(0000) GS:ffff88811a700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000840 CR3: 000000010c2e0001 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? ib_uverbs_handler_UVERBS_METHOD_INFO_HANDLES+0x4b0/0x4b0 ib_uverbs_cmd_verbs+0x1546/0x1940 ib_uverbs_ioctl+0x186/0x240 __x64_sys_ioctl+0x38a/0x1220 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 9f85cbe50aa0 ("RDMA/uverbs: Expose the new GID query API to user space") Link: https://lore.kernel.org/r/b971cc70a8b240a8b5eda33c99fa0558a0071be2.1620657876.git.leonro@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds32-373/+632
Pull rdma updates from Jason Gunthorpe: "This is significantly bug fixes and general cleanups. The noteworthy new features are fairly small: - XRC support for HNS and improves RQ operations - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and qib - Quite a few general cleanups on spelling, error handling, static checker detections, etc - Increase the number of device ports supported beyond 255. High port count software switches now exist - Several bug fixes for rtrs - mlx5 Device Memory support for host controlled atomics - Report SRQ tables through to rdma-tool" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits) IB/qib: Remove redundant assignment to ret RDMA/nldev: Add copy-on-fork attribute to get sys command RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res RDMA/siw: Fix a use after free in siw_alloc_mr IB/hfi1: Remove redundant variable rcd RDMA/nldev: Add QP numbers to SRQ information RDMA/nldev: Return SRQ information RDMA/restrack: Add support to get resource tracking for SRQ RDMA/nldev: Return context information RDMA/core: Add CM to restrack after successful attachment to a device RDMA/cma: Skip device which doesn't support CM RDMA/rxe: Fix a bug in rxe_fill_ip_info() RDMA/mlx5: Expose private query port RDMA/mlx4: Remove an unused variable RDMA/mlx5: Fix type assignment for ICM DM IB/mlx5: Set right RoCE l3 type and roce version while deleting GID RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails RDMA/cxgb4: add missing qpid increment IB/ipoib: Remove unnecessary struct declaration RDMA/bnxt_re: Get rid of custom module reference counting ...