aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-04RDMA/mlx5: Unfold modify RMP functionLeon Romanovsky1-28/+34
There is no need to perform modify_rmp in two separate function, while one of them uses stack as a placeholder for data while other allocates it dynamically. Combine those two functions to one call instead of two. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Unfold create RMP functionLeon Romanovsky1-19/+16
There is no need to perform create_rmp in two separate function, while one of them uses stack as a placeholder for data while other allocates it dynamically. Combine those two functions to one instead of two. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Initialize SRQ tables on mlx5_ibLeon Romanovsky6-14/+100
Transfer initialization and cleanup from mlx5_priv struct of mlx5_core_dev to be part of mlx5_ib_dev. This completes removal of SRQ from mlx5_core. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Update SRQ functions signatures to mlx5_ib formatLeon Romanovsky4-77/+83
Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by updating function signatures do not require mlx5_core_dev as an input, because all operations in mlx5_ib are supposed to use mlx5_ib_dev. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Use stages for callback to setup and release DEVXLeon Romanovsky2-7/+20
Reuse existing infrastructure to initialize and release DEVX uid. The DevX interface is intended for user space access, so it is supposed to be initialized before ib_register_device(). Also it isn't supported in switchdev mode and don't need to initialize it in that mode. Fixes: 76dc5a8406bf ("IB/mlx5: Manage device uid for DEVX white list commands") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04RDMA/mlx5: Remove SRQ signature global flagLeon Romanovsky1-4/+1
SRQ signature is not supported, hence no need for special static global variable to announce it. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Move SRQ functions to RDMA partLeon Romanovsky5-2/+717
There is no need to keep SRQ which is RDMA object in mlx5_core. In this patch, we partially move the execution code, while next patches will move table initialization/release logic too. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-04net/mlx5: Align SRQ licenses and copyright informationLeon Romanovsky1-29/+2
Ensure that both RDMA and netdev parts of SRQ implementation has same copyright and license information annotated by SPDX tags. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-12-03RDMA/nldev: Export to user space number of contextsLeon Romanovsky1-0/+1
[leonro@server ~]$ rdma res show 1: mlx5_0: pd 3 cq 5 qp 4 cm_id 0 mr 0 ctx 0 Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Annotate alloc/deallloc paths with context trackingLeon Romanovsky2-0/+5
Add restrack annotations to track allocations of ucontexts. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/restrack: Track ucontextLeon Romanovsky1-0/+5
Add ability to track allocated ib_ucontext, which are limited resource and worth to be visible by users. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use only attrs for the write() handler signatureJason Gunthorpe4-128/+54
All of the old arguments can be derived from the uverbs_attr_bundle structure, so get rid of the redundant arguments. Most of the prior work has been removing users of the arguments to allow this to be a simple patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Do not check the input length on create_cq/qp pathsJason Gunthorpe1-29/+9
If the user did not provide a long enough command buffer then the missing bytes are forced to zero. There is no reason to check the length if a zero value is OK. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()Jason Gunthorpe1-40/+63
This has a very complicated memory layout, with two flex arrays. Use the iterator API to make reading it clearer. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Add a simple iterator interface for reading the commandJason Gunthorpe1-48/+65
Several methods have a command with a trailing flex array, and they all open code some extraction scheme. Centralize this into a simple iterator API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Simplify ib_uverbs_ex_query_deviceJason Gunthorpe1-60/+4
We truncate the response structure if there is not enough room in the user buffer so there is no reason to have all the mess with finely managing response_length. Just fully fill the attrs and truncate on copy. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Fill in the response for IB_USER_VERBS_EX_CMD_MODIFY_QPJason Gunthorpe1-1/+8
A response struct was defined, and userspace is providing it (but not checking it). Fill it in and write it out. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use uverbs_request() and core for write_ex handlersJason Gunthorpe1-139/+46
The write_ex handlers have this horrible boilerplate in every function to do the zero extend/zero check and min size checks. This is now handled in the core code via the meta-data, and the zero checks are handled by uverbs_request(). Replace all the occurrences. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use uverbs_request() for request copyingJason Gunthorpe1-64/+135
This function properly zero-extends, and zero-checks if the user buffer is not the same size as the kernel command struct. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use uverbs_response() for remaining response copyingJason Gunthorpe1-81/+52
This function properly truncates and zero-fills the response which is the standard used by the ioctl uAPI when working with user data. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Get rid of the 'callback' scheme in the compat pathJason Gunthorpe1-80/+40
There is no reason for this. For response processing we simply need to copy, truncate, and zero fill the response into whatever output buffer was provided. Add a function uverbs_response() that does this consistently. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Use uverbs_attr_bundle to pass ucore for write/write_exJason Gunthorpe2-46/+30
This creates a consistent way to access the two core buffers across write and write_ex handlers. Remove the open coded ucore conversion in the write/ex compatibility handlers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-12-03RDMA/uverbs: Remove out_len checks that are now done by the coreJason Gunthorpe1-50/+0
write() methods must work with fixed sized structures as that is the only way to know where the udata segment starts. The common udata code now rejects any write() that has a response buffer shorter than the core's response. Thus all the checks of out_len for write methods are redundant and can be removed. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-11-29IB/mlx5: Handle raw delay drop general eventSaeed Mahameed1-3/+15
Handle FW general event rq delay drop as it was received from FW via mlx5 notifiers API, instead of handling the processed software version of that event. After this patch we can safely remove all software processed FW events types and definitions. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-29IB/mlx5: Handle raw port change event rather than the software versionSaeed Mahameed1-34/+52
Use the FW version of the port change event as forwarded via new mlx5 notifiers API. After this patch, processed software version of the port change event will become deprecated and will be totally removed in downstream patches. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-29IB/mlx5: Use the new mlx5 core notifier APISaeed Mahameed2-14/+66
Remove the deprecated mlx5_interface->event mlx5_ib callback and use new mlx5 notifier API to subscribe for mlx5 events. For native mlx5_ib devices profiles pf_profile/nic_rep_profile register the notifier callback mlx5_ib_handle_event which treats the notifier context as mlx5_ib_dev. For vport repesentors, don't register any notifier, same as before, they didn't receive any mlx5 events. For slave port (mlx5_ib_multiport_info) register a different notifier callback mlx5_ib_event_slave_port, which knows that the event is coming for mlx5_ib_multiport_info and prepares the event job accordingly. Before this on the event handler work we had to ask mlx5_core if this is a slave port mlx5_core_is_mp_slave(work->dev), now it is not needed anymore. mlx5_ib_multiport_info notifier registration is done on mlx5_ib_bind_slave_port and de-registration is done on mlx5_ib_unbind_slave_port. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-29IB/mlx5: Use fragmented QP's buffer for in-kernel usersGuy Levi2-167/+241
The current implementation of create QP requires contiguous memory, such a requirement is problematic once the memory is fragmented or the system is low in memory, it causes failures in dma_zalloc_coherent(). This patch takes advantage of the new mlx5_core API which allocates a fragmented buffer. This makes the QP creation much more resilient to memory fragmentation. Data-path code was adapted to the fact that WQEs can cross buffers. We also use the opportunity to fix some cosmetic legacy coding convention errors which were in the feature scope. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29IB/mlx5: Use fragmented SRQ's buffer for in-kernel usersGuy Levi2-5/+10
The current implementation of create SRQ requires contiguous memory, such a requirement is problematic once the memory is fragmented or the system is low in memory, it causes failures in dma_zalloc_coherent(). This patch takes the advantage of the new mlx5_core API which allocates a fragmented buffer, and makes the SRQ creation much more resilient to memory fragmentation. Data-path code was adapted to the fact that WQEs can cross buffers. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29rxe: IB_WR_REG_MR does not capture MR's iova fieldChuck Lever1-0/+1
FRWR memory registration is done with a series of calls and WRs. 1. ULP invokes ib_dma_map_sg() 2. ULP invokes ib_map_mr_sg() 3. ULP posts an IB_WR_REG_MR on the Send queue Step 2 generates an iova. It is permissible for ULPs to change this iova (with certain restrictions) between steps 2 and 3. rxe_map_mr_sg captures the MR's iova but later when rxe processes the REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova after step 2 but before step 3, rxe never captures that change. When the remote sends an RDMA Read targeting that MR, rxe looks up the R_key, but the altered iova does not match the iova stored in the MR, causing the RDMA Read request to fail. Reported-by: Anna Schumaker <schumaker.anna@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29RDMA/mlx5: Attach a DEVX counter via raw flow creationMark Bloch4-12/+55
Allow a user to attach a DEVX counter via mlx5 raw flow creation. In order to attach a counter we introduce a new attribute: MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX A counter can be attached to multiple flow steering rules. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29RDMA/qib: Remove all occurrences of BUG_ON()Leon Romanovsky8-13/+0
QIB driver was added in 2010 with many BUG_ON(), most of them were cleaned out after years of development and usages. It looks like that it is safe now to remove rest of BUG_ONs. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29IB/usnic: fix spelling mistake "miniumum" -> "minimum"Colin Ian King1-1/+1
There is a spelling mistake in a usnic_err error message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29RDMA/uverbs: fix ptr_ret.cocci warningskbuild test robot1-5/+1
drivers/infiniband/core/uverbs_cmd.c:1095:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: 7106a9769715 ("RDMA/uverbs: Make write() handlers return 0 on success") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29RDMA/drivers: Fix spelling mistake "initalize" -> "initialize"Colin Ian King1-1/+1
Fix spelling mistake in usnic_err error message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-29RDMA/mlx5: Initialize return variable in case pagefault was skippedLeon Romanovsky1-0/+1
Pagefaults occurred in non-ODP MR are completely valid events, so initialize return variable to 0. Fixes: 4d5422a309de ("IB/mlx5: Skip non-ODP MR when handling a page fault") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26RDMA/uverbs: Use uverbs_attr_bundle to pass udata for ioctl()Jason Gunthorpe4-5/+9
Have the core code initialize the driver_udata if the method has a udata description. This is done using the same create_udata the handler was supposed to call. This makes ioctl consistent with the write and write_ex paths. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Use uverbs_attr_bundle to pass udata for writeJason Gunthorpe2-73/+46
Now that we have metadata describing the command format the core code can directly compute the udata pointers and all the really ugly ib_uverbs_init_udata() calls can be removed from the handlers. This means all the write() handlers are no longer sensitive to the layout of the command buffer. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write_exJason Gunthorpe4-91/+71
The core code needs to compute the udata so we may as well pass it in the uverbs_attr_bundle instead of on the stack. This converts the simple case of write_ex() which already has a core calculation. Also change the write() path to use the attrs for ib_uverbs_init_udata() instead of on the stack. This lets the write to write_ex compatibility path continue to follow the lead of the _ex path. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Prohibit write() calls with too small buffersJason Gunthorpe1-4/+28
The size meta-data in the prior patch describes the smallest acceptable buffer for the write() interface. Globally check this in the core code. This is necessary in the case of write() methods that have a driver udata to prevent computing a negative udata buffer length. The return code of -ENOSPC is chosen here as some of the handlers already use this code, however many other handler use EINVAL. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Add structure size info to write commandsJason Gunthorpe3-106/+310
We need the structure sizes to compute the location of the udata in the core code. Annotate the sizes into the new macro language. This is generated largely by script and checked by comparing against the similar list in rdma-core. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Do not pass ib_uverbs_file to ioctl methodsJason Gunthorpe10-39/+32
The uverbs_attr_bundle already contains this pointer, and most methods don't actually need it. Get rid of the redundant function argument. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Make write() handlers return 0 on successJason Gunthorpe5-181/+131
Currently they return the command length, while all other handlers return 0. This makes the write path closer to the write_ex and ioctl path. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26RDMA/uverbs: Replace ib_uverbs_file with uverbs_attr_bundle for writeJason Gunthorpe6-166/+174
Now that we can add meta-data to the description of write() methods we need to pass the uverbs_attr_bundle into all write based handlers so future patches can use it as a container for any new data transferred out of the core. This is the first step to bringing the write() and ioctl() methods to a common interface signature. This is a simple search/replace, and we push the attr down into the uobj and other APIs to keep changes minimal. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-26IB/qib: fix spelling mistake "colescing" -> "coalescing"Colin Ian King1-1/+1
There is a spelling mistake in the module description text, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26IB/mlx5: Fix page fault handling for MWArtemy Kovalyov1-0/+1
Memory windows are implemented with an indirect MKey, when a page fault event comes for a MW Mkey we need to find the MR at the end of the list of the indirect MKeys by iterating on all items from the first to the last. The offset calculated during this process has to be zeroed after the first iteration or the next iteration will start from a wrong address, resulting incorrect ODP faulting behavior. Fixes: db570d7deafb ("IB/mlx5: Add ODP support to MW") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26IB/umem: Set correct address to the invalidation functionArtemy Kovalyov1-14/+6
The invalidate range was using PAGE_SIZE instead of the computed 'end', and had the wrong transformation of page_index due the weird construction. This can trigger during error unwind and would cause malfunction. Inline the code and correct the math. Fixes: 403cd12e2cf7 ("IB/umem: Add contiguous ODP support") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-26IB/mlx5: Skip non-ODP MR when handling a page faultArtemy Kovalyov1-0/+8
It is possible that we call pagefault_single_data_segment() with a MKey that belongs to a memory region which is not on demand (i.e. pinned pages). This can happen if, for instance, a WQE that points to multiple MRs where some of them are ODP MRs and some are not. In this case we don't need to handle this MR in the ODP context besides reporting success. Otherwise the code will call pagefault_mr() which will do to_ib_umem_odp() on a non-ODP MR and thus access out of bounds. Fixes: 7bdf65d411c1 ("IB/mlx5: Handle page faults") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-23RDMA/hns: Bugfix pbl configuration for rereg mrYixian Liu1-68/+60
Current hns driver assigned the first two PBL page addresses from previous registered MR to the hardware when reregister MR changing the memory locations occurred. This will lead to PBL addressing error as the PBL has already been released. This patch fixes this wrong assignment by using the page address from new allocated PBL. Fixes: a2c80b7b4119 ("RDMA/hns: Add rereg mr support for hip08") Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-22RDMA/core: Sync unregistration with netlink commandsParav Pandit3-14/+33
When the rdma device is getting removed, get resource info can race with device removal, as below: CPU-0 CPU-1 -------- -------- rdma_nl_rcv_msg() nldev_res_get_cq_dumpit() mutex_lock(device_lock); get device reference mutex_unlock(device_lock); [..] ib_unregister_device() /* Valid reference to * device->dev exists. */ ib_dealloc_device() [..] provider->fill_res_entry(); Even though device object is not freed, fill_res_entry() can get called on device which doesn't have a driver anymore. Kernel core device reference count is not sufficient, as this only keeps the structure valid, and doesn't guarantee the driver is still loaded. Similar race can occur with device renaming and device removal, where device_rename() tries to rename a unregistered device. While this is fine for devices of a class which are not net namespace aware, but it is incorrect for net namespace aware class coming in subsequent series. If a class is net namespace aware, then the below [1] call trace is observed in above situation. Therefore, to avoid the race, keep a reference count and let device unregistration wait until all netlink users drop the reference. [1] Call trace: kernfs: ns required in 'infiniband' for 'mlx5_0' WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last unloaded: devlink] RIP: 0010:kernfs_find_ns+0x104/0x120 Call Trace: kernfs_find_and_get_ns+0x2e/0x50 sysfs_rename_link_ns+0x40/0xb0 device_rename+0xb2/0xf0 ib_device_rename+0xb3/0x100 [ib_core] nldev_set_doit+0x165/0x190 [ib_core] rdma_nl_rcv_msg+0x249/0x250 [ib_core] ? netlink_deliver_tap+0x8f/0x3e0 rdma_nl_rcv+0xd6/0x120 [ib_core] netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2f0/0x3e0 sock_sendmsg+0x30/0x40 __sys_sendto+0xdc/0x160 Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource tracking") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-22RDMA/cma: Move cma module specific functions to cma_priv.hParav Pandit3-29/+31
Currently several rdma_cm module specific functions are declared in core_priv.h file. Now that we have cma_priv.h file specific to rdma_cm kernel module, move them from core_priv.h to cma_priv.h Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>