aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/device.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-12RDMA: Start use ib_device_opsKamal Heib1-104/+107
Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-11RDMA/core: Introduce ib_device_opsKamal Heib1-0/+98
This change introduces the ib_device_ops structure that defines all the InfiniBand device operations in one place, so the code will be more readable and clean, unlike today when the ops are mixed with ib_device data members. The providers will need to define the supported operations and assign them using ib_set_device_ops(), that will also make the providers code more readable and clean. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-07RDMA/core: Validate port number in query_pkey verbYuval Shaia1-0/+3
Before calling the driver's function let's make sure port is valid. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-22RDMA/core: Sync unregistration with netlink commandsParav Pandit1-4/+22
When the rdma device is getting removed, get resource info can race with device removal, as below: CPU-0 CPU-1 -------- -------- rdma_nl_rcv_msg() nldev_res_get_cq_dumpit() mutex_lock(device_lock); get device reference mutex_unlock(device_lock); [..] ib_unregister_device() /* Valid reference to * device->dev exists. */ ib_dealloc_device() [..] provider->fill_res_entry(); Even though device object is not freed, fill_res_entry() can get called on device which doesn't have a driver anymore. Kernel core device reference count is not sufficient, as this only keeps the structure valid, and doesn't guarantee the driver is still loaded. Similar race can occur with device renaming and device removal, where device_rename() tries to rename a unregistered device. While this is fine for devices of a class which are not net namespace aware, but it is incorrect for net namespace aware class coming in subsequent series. If a class is net namespace aware, then the below [1] call trace is observed in above situation. Therefore, to avoid the race, keep a reference count and let device unregistration wait until all netlink users drop the reference. [1] Call trace: kernfs: ns required in 'infiniband' for 'mlx5_0' WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last unloaded: devlink] RIP: 0010:kernfs_find_ns+0x104/0x120 Call Trace: kernfs_find_and_get_ns+0x2e/0x50 sysfs_rename_link_ns+0x40/0xb0 device_rename+0xb2/0xf0 ib_device_rename+0xb3/0x100 [ib_core] nldev_set_doit+0x165/0x190 [ib_core] rdma_nl_rcv_msg+0x249/0x250 [ib_core] ? netlink_deliver_tap+0x8f/0x3e0 rdma_nl_rcv+0xd6/0x120 [ib_core] netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2f0/0x3e0 sock_sendmsg+0x30/0x40 __sys_sendto+0xdc/0x160 Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource tracking") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-17RDMA/core: Refactor ib_register_device() functionParav Pandit1-51/+75
ib_register_device() does several allocation and initialization steps. Split it into smaller more readable functions for easy review and maintenance. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17RDMA/core: Fix unwinding flow in case of error to register deviceParav Pandit1-2/+4
If port pkey list initialization fails, free the port_immutable memory during cleanup path. Currently it is missed out. If cache setup fails, free the pkey list during cleanup path. Fixes: d291f1a65 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16RDMA/core: Implement IB device rename functionLeon Romanovsky1-0/+25
Generic implementation of IB device rename function. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-26RDMA/core: Use dev_name instead of ibdev->nameJason Gunthorpe1-5/+3
These return the same thing but dev_name is a more conventional use of the kernel API. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
2018-09-26RDMA/core: Use dev_err/dbg/etc instead of pr_* + ibdev->nameJason Gunthorpe1-14/+19
Any messages related to a device should be printed with the dev_* formatters. This provides greater consistency for the user. The core does not set pr_fmt so this has no significant change. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
2018-09-26RDMA: Fully setup the device name in ib_register_deviceJason Gunthorpe1-15/+20
The current code has two copies of the device name, ibdev->dev and dev_name(&ibdev->dev), and they are setup at different times, which is very confusing. Set them both up at the same time and make dev_name() the lead name, which is the proper use of the driver core APIs. To make it very clear that the name is not valid until registration pass it in to the ib_register_device() call rather than messing with ibdev->name directly. Also the reorganization now checks that dev_name is unique even if it does not contain a %. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Devesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-09-06RDMA/core: Assign device ifindex before publishing the deviceParav Pandit1-1/+2
Even though device->ifindex is assigned before adding the device in the list which is read by netlink flow, it is better to assign rdma device index before publishing the device in the system to users and clients. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: Follow correct unregister order between sysfs and cgroupParav Pandit1-1/+1
During register_device() init sequence is, (a) register with rdma cgroup followed by (b) register with sysfs Therefore, unregister_device() sequence should follow the reverse order. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: Define client_data_lock as rwlock instead of spinlockParav Pandit1-15/+15
Even though device registration/unregistration and client registration/unregistration is not a performance path, define the client_data_lock as rwlock for code clarity. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: Use simpler spin lock irq API from blocking contextParav Pandit1-11/+9
add_client_context(), ib_unregister_device() and ib_unregister_client() are designed to call from blocking context. There is no need to save and restore last interrupt state when called from such blocking context. Even though this is not a performance path, using the right spin lock API is desired for code clarity. To avoid checkpatch warning while removing flags, sizeof() is used. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: Remove context entries from list while unregistering deviceParav Pandit1-1/+5
While unregistering a device, remove the context elements from the list to not have any stale entries. With that any errors/bugs can be checked when device is freed. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: Use simplified list_for_eachParav Pandit1-5/+4
While traversing client_data_list in following conditions, linked list is only read, no elements of the list are removed. Therefore, use list_for_each_entry(), instead of list_for_each_safe(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-06RDMA/core: No need to protect kfree with spin lock and semaphoreParav Pandit1-1/+1
While unregistering a client, only context removal should be protected with lock. There is no need to protect a freeing of such context which is already removed from the list. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05IB/core: Add an unbound WQ type to the new CQ APIJack Morgenstein1-1/+14
The upstream kernel commit cited below modified the workqueue in the new CQ API to be bound to a specific CPU (instead of being unbound). This caused ALL users of the new CQ API to use the same bound WQ. Specifically, MAD handling was severely delayed when the CPU bound to the WQ was busy handling (higher priority) interrupts. This caused a delay in the MAD "heartbeat" response handling, which resulted in ports being incorrectly classified as "down". To fix this, add a new "unbound" WQ type to the new CQ API, so that users have the option to choose either a bound WQ or an unbound WQ. For MADs, choose the new "unbound" WQ. Fixes: b7363e67b23e ("IB/device: Convert ib-comp-wq to be CPU-bound") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.m> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Remove {create,destroy}_ah from mandatory verbsKamal Heib1-2/+0
{create,destroy}_ah aren't mandatory verbs, because not all providers are implementing them. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gidParav Pandit1-20/+1
If the gid_attr argument is NULL then the functions behave identically to rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything can be consolidated to one function. Now that all callers either use rdma_query_gid() or ib_get_cached_gid(), ib_query_gid() API is removed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook1-2/+2
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-05-29RDMA/core: Remove indirection through ib_cache_setup()Jason Gunthorpe1-2/+2
This once might have made sense when cache.c was in a different module from device.c, but today it just obfuscation. Get rid of the wrappers and call roge_gid_mgmt_init()/cleanup() directly. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-04-06Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-13/+5
Pull rdma updates from Jason Gunthorpe: "Doug and I are at a conference next week so if another PR is sent I expect it to only be bug fixes. Parav noted yesterday that there are some fringe case behavior changes in his work that he would like to fix, and I see that Intel has a number of rc looking patches for HFI1 they posted yesterday. Parav is again the biggest contributor by patch count with his ongoing work to enable container support in the RDMA stack, followed by Leon doing syzkaller inspired cleanups, though most of the actual fixing went to RC. There is one uncomfortable series here fixing the user ABI to actually work as intended in 32 bit mode. There are lots of notes in the commit messages, but the basic summary is we don't think there is an actual 32 bit kernel user of drivers/infiniband for several good reasons. However we are seeing people want to use a 32 bit user space with 64 bit kernel, which didn't completely work today. So in fixing it we required a 32 bit rxe user to upgrade their userspace. rxe users are still already quite rare and we think a 32 bit one is non-existing. - Fix RDMA uapi headers to actually compile in userspace and be more complete - Three shared with netdev pull requests from Mellanox: * 7 patches, mostly to net with 1 IB related one at the back). This series addresses an IRQ performance issue (patch 1), cleanups related to the fix for the IRQ performance problem (patches 2-6), and then extends the fragmented completion queue support that already exists in the net side of the driver to the ib side of the driver (patch 7). * Mostly IB, with 5 patches to net that are needed to support the remaining 10 patches to the IB subsystem. This series extends the current 'representor' framework when the mlx5 driver is in switchdev mode from being a netdev only construct to being a netdev/IB dev construct. The IB dev is limited to raw Eth queue pairs only, but by having an IB dev of this type attached to the representor for a switchdev port, it enables DPDK to work on the switchdev device. * All net related, but needed as infrastructure for the rdma driver - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers - SRP performance updates - IB uverbs write path cleanup patch series from Leon - Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to set the port for ib_srpt to listen on in configfs in order for it to be enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port) - TSO and Scatter FCS support in mlx4 - Refactor of modify_qp routine to resolve problems seen while working on new code that is forthcoming - More refactoring and updates of RDMA CM for containers support from Parav - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory' user API features - Infrastructure updates for the new IOCTL interface, based on increased usage - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit kernel as was originally intended. See the commit messages for extensive details - Syzkaller bugs and code cleanups motivated by them" * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits) IB/rxe: Fix for oops in rxe_register_device on ppc64le arch IB/mlx5: Device memory mr registration support net/mlx5: Mkey creation command adjustments IB/mlx5: Device memory support in mlx5_ib net/mlx5: Query device memory capabilities IB/uverbs: Add device memory registration ioctl support IB/uverbs: Add alloc/free dm uverbs ioctl support IB/uverbs: Add device memory capabilities reporting IB/uverbs: Expose device memory capabilities to user RDMA/qedr: Fix wmb usage in qedr IB/rxe: Removed GID add/del dummy routines RDMA/qedr: Zero stack memory before copying to user space IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR IB/mlx5: Add information for querying IPsec capabilities IB/mlx5: Add IPsec support for egress and ingress {net,IB}/mlx5: Add ipsec helper IB/mlx5: Add modify_flow_action_esp verb IB/mlx5: Add implementation for create and destroy action_xfrm IB/uverbs: Introduce ESP steering match filter IB/uverbs: Add modify ESP flow_action ...
2018-04-03IB/core: Simplify ib_query_gid to always refer to cacheParav Pandit1-12/+3
Currently following inconsistencies exist. 1. ib_query_gid() returns GID from the software cache for a RoCE port and returns GID from the HCA for an IB port. This is incorrect because software GID cache is maintained regardless of HCA port type. 2. GID is queries from the HCA via ib_query_gid and updated in the software cache for IB link layer. Both of them might not be in sync. ULPs such as SRP initiator, SRP target, IPoIB driver have historically used ib_query_gid() API to query the GID. However CM used cached version during CM processing, When software cache was introduced, this inconsitency remained. In order to simplify, improve readability and avoid link layer specific above inconsistencies, this patch brings following changes. 1. ib_query_gid() always refers to the cache layer regardless of link layer. 2. cache module who reads the GID entry from HCA and builds the cache, directly invokes the HCA provider verb's query_gid() callback function. 3. ib_query_port() is being called in early stage where GID cache is not yet build while reading port immutable property. Therefore it needs to read the default GID from the HCA for IB link layer to publish the subnet prefix. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit1-1/+3
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Search GID only for IB link layerParav Pandit1-1/+1
Even though API is only used by IPoIB driver, its incorrect to refer RoCE GID table property to search for GID. Look for only IB link layer to search for the GID. Fixes: dbb12562f7c2 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22IB/core: Refer to RoCE port property instead of GID table propertyParav Pandit1-1/+1
ib_query_gid() in commit [1] refers to RoCE GID table capability of the HCA using rdma_cap_roce_gid_table(). ib_core maintains the GID table cache regardless of the HCA provider drivers capability to maintain RoCE GID table. Therefore, whether to return a GID table entry from the software cache or from HCA should be done based on whether the port is RoCE or not. [1] commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21RDMA/restrack: Move restrack_clean to be symmetrical to restrack_initLeon Romanovsky1-2/+1
The fact that resource tracking commit 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") was added immediately after commit 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") caused us to miss the fact that PD and CQ are created after ib_register_device, but released after ib_unregister_device() and not before as it is expected from normal flow. Fix introduced in commit 42cea83f9524 ("IB/mlx5: Fix cleanup order on unload") revealed this fact, so this patch is needed to avoid from restrack warnings It fixes resource tracking warnings during shutdown. [ 43.473906] CPU: 5 PID: 3016 Comm: modprobe Not tainted 4.16.0-rc5-for-linust-perf-2018-03-19_07-01-58-14 #1 [ 43.473907] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 [ 43.473919] RIP: 0010:rdma_restrack_clean+0x25/0x30 [ib_core] [ 43.473921] RSP: 0018:ffffc9000267be48 EFLAGS: 00010282 [ 43.473924] RAX: 0000000000000000 RBX: ffff88033c690070 RCX: 0000000180080006 [ 43.473925] RDX: ffff88035ce922e0 RSI: ffffea000cf1a200 RDI: ffff88033c6907c8 [ 43.473926] RBP: ffff88033c690070 R08: ffff88033c689000 R09: 0000000180080006 [ 43.473927] R10: 000000003c68a001 R11: ffff88033c689000 R12: ffff88033c690000 [ 43.473929] R13: ffff88033c69005c R14: 0000000000000000 R15: 0000000000000000 [ 43.473932] FS: 00007f5928359740(0000) GS:ffff88036c540000(0000) knlGS:0000000000000000 [ 43.473933] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.473935] CR2: 00007ffffc760cc8 CR3: 000000035620c000 CR4: 00000000000006e0 [ 43.473940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 43.473941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 43.473942] Call Trace: [ 43.473969] ib_unregister_device+0xf5/0x190 [ib_core] [ 43.474000] __mlx5_ib_remove+0x2e/0x40 [mlx5_ib] [ 43.474098] mlx5_remove_device+0xf5/0x120 [mlx5_core] [ 43.474132] mlx5_unregister_interface+0x37/0x90 [mlx5_core] [ 43.474142] mlx5_ib_cleanup+0xc/0x16a [mlx5_ib] [ 43.474152] SyS_delete_module+0x159/0x260 [ 43.474159] do_syscall_64+0x61/0x110 [ 43.474165] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 43.474168] RIP: 0033:0x7f59278466b7 [ 43.474170] RSP: 002b:00007ffffc763e38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 43.474172] RAX: ffffffffffffffda RBX: 000000000130d590 RCX: 00007f59278466b7 [ 43.474173] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000000000130d5f8 [ 43.474175] RBP: 0000000000000000 R08: 00007f5927b0b060 R09: 00007f59278b6a40 [ 43.474176] R10: 00007ffffc763bc0 R11: 0000000000000202 R12: 0000000000000000 [ 43.474177] R13: 0000000000000001 R14: 000000000130d5f8 R15: 0000000000000000 [ 43.474179] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 83 c7 28 31 c0 eb 0c 48 83 c0 08 48 3d 00 08 00 00 74 0f 48 8d 14 07 48 8b 12 48 85 d2 74 e8 <0f> 0b c3 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 47 28 [ 43.474221] ---[ end trace e89771e2250ffc23 ]--- Fixes: 42cea83f9524 ("IB/mlx5: Fix cleanup order on unload") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/{core, ipoib}: Simplify ib_find_gid() for unused ndevParav Pandit1-2/+1
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table entries are not based on netdevice. Netdevice parameter is unused here. Therefore, it is removed. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28IB/core: Fix missing RDMA cgroups release in case of failure to register deviceParav Pandit1-2/+4
During IB device registration process, if query_device() fails or if ib_core fails to registers sysfs entries, rdma cgroup cleanup is skipped. Cc: <stable@vger.kernel.org> # v4.2+ Fixes: 4be3a4fa51f4 ("IB/core: Fix kernel crash during fail to initialize device") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29RDMA/restrack: Add general infrastructure to track RDMA resourcesLeon Romanovsky1-0/+4
The RDMA subsystem has very strict set of objects to work with, but it completely lacks tracking facilities and has no visibility of resource utilization. The following patch adds such infrastructure to keep track of RDMA resources to help with debugging of user space applications. The primary user of this infrastructure is RDMA nldev netlink (following patches), to be exposed to userspace via rdmatool, but it is not limited too that. At this stage, the main three objects (PD, CQ and QP) are added, and more will be added later. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-nextDoug Ledford1-1/+17
Merging in 12 patch series from Bart that required changes in the current for-rc branch in order to apply cleanly. Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-03IB/core: Fix two kernel warnings triggered by rxe registrationBart Van Assche1-6/+14
Eliminate the WARN_ONs that create following two warnings when registering an rxe device: WARNING: CPU: 2 PID: 1005 at drivers/infiniband/core/device.c:449 ib_register_device+0x591/0x640 [ib_core] CPU: 2 PID: 1005 Comm: run_tests Not tainted 4.15.0-rc4-dbg+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:ib_register_device+0x591/0x640 [ib_core] Call Trace: rxe_register_device+0x3c6/0x470 [rdma_rxe] rxe_add+0x543/0x5e0 [rdma_rxe] rxe_net_add+0x37/0xb0 [rdma_rxe] rxe_param_set_add+0x5a/0x120 [rdma_rxe] param_attr_store+0x5e/0xc0 module_attr_store+0x19/0x30 sysfs_kf_write+0x3d/0x50 kernfs_fop_write+0x116/0x1a0 __vfs_write+0x23/0x120 vfs_write+0xbe/0x1b0 SyS_write+0x44/0xa0 entry_SYSCALL_64_fastpath+0x23/0x9a WARNING: CPU: 2 PID: 1005 at drivers/infiniband/core/sysfs.c:1279 ib_device_register_sysfs+0x11d/0x160 [ib_core] CPU: 2 PID: 1005 Comm: run_tests Tainted: G W 4.15.0-rc4-dbg+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:ib_device_register_sysfs+0x11d/0x160 [ib_core] Call Trace: ib_register_device+0x3f7/0x640 [ib_core] rxe_register_device+0x3c6/0x470 [rdma_rxe] rxe_add+0x543/0x5e0 [rdma_rxe] rxe_net_add+0x37/0xb0 [rdma_rxe] rxe_param_set_add+0x5a/0x120 [rdma_rxe] param_attr_store+0x5e/0xc0 module_attr_store+0x19/0x30 sysfs_kf_write+0x3d/0x50 kernfs_fop_write+0x116/0x1a0 __vfs_write+0x23/0x120 vfs_write+0xbe/0x1b0 SyS_write+0x44/0xa0 entry_SYSCALL_64_fastpath+0x23/0x9a The code should accept either a parent pointer or a fully specified DMA specification without producing warnings. Fixes: 99db9494035f ("IB/core: Remove ib_device.dma_device") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: stable@vger.kernel.org # v4.11 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02RDMA/netlink: Fix locking around __ib_get_device_by_indexLeon Romanovsky1-1/+17
Holding locks is mandatory when calling __ib_device_get_by_index, otherwise there are races during the list iteration with device removal. Since the locks are static to device.c, __ib_device_get_by_index can never be called correctly by any user out side the file. Make the function static and provide a safe function that gets the correct locks and returns a kref'd pointer. Fix all callers. Fixes: e5c9469efcb1 ("RDMA/netlink: Add nldev device doit implementation") Fixes: c3f66f7b0052 ("RDMA/netlink: Implement nldev port doit callback") Fixes: 7d02f605f0dc ("RDMA/netlink: Add nldev port dumpit implementation") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-02RDMA/core: Replace open-coded variant of put_deviceLeon Romanovsky1-1/+1
There is an existing function to decrease reference counter of the device, let's use it. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.gitJason Gunthorpe1-2/+2
Patches for 4.16 that are dependent on patches sent to 4.15-rc. These are small clean ups for the vmw_pvrdma and i40iw drivers. * 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git: RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file i40iw: Change accelerated flag to bool
2017-12-18IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layerParav Pandit1-13/+3
Currently there are no users of ib_find_gid for RoCE transport. It is only used by IPoIB. Therefore its simplified to ignore RoCE ports and GID type check which was previously done for every port. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-07RDMA/netlink: Fix general protection faultLeon Romanovsky1-1/+1
The RDMA netlink core code checks validity of messages by ensuring that type and operand are in range. It works well for almost all clients except NLDEV, which has cb_table less than number of operands. Request to access such operand will trigger the following kernel panic. This patch updates all places where cb_table is declared for the consistency, but only NLDEV is actually need it. general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 0 PID: 522 Comm: syz-executor6 Not tainted 4.13.0+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 task: ffff8800657799c0 task.stack: ffff8800695d000 RIP: 0010:rdma_nl_rcv_msg+0x13a/0x4c0 RSP: 0018:ffff8800695d7838 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: 1ffff1000d2baf0b RCX: 00000000704ff4d7 RDX: 0000000000000000 RSI: ffffffff81ddb03c RDI: 00000003827fa6bc RBP: ffff8800695d7900 R08: ffffffff82ec0578 R09: 0000000000000000 R10: ffff8800695d7900 R11: 0000000000000001 R12: 000000000000001c R13: ffff880069d31e00 R14: 00000000ffffffff R15: ffff880069d357c0 FS: 00007fee6acb8700(0000) GS:ffff88006ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000201a9000 CR3: 0000000059766000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? rdma_nl_multicast+0x80/0x80 rdma_nl_rcv+0x36b/0x4d0 ? ibnl_put_attr+0xc0/0xc0 netlink_unicast+0x4bd/0x6d0 ? netlink_sendskb+0x50/0x50 ? drop_futex_key_refs.isra.4+0x68/0xb0 netlink_sendmsg+0x9ab/0xbd0 ? nlmsg_notify+0x140/0x140 ? wake_up_q+0xa1/0xf0 ? drop_futex_key_refs.isra.4+0x68/0xb0 sock_sendmsg+0x88/0xd0 sock_write_iter+0x228/0x3c0 ? sock_sendmsg+0xd0/0xd0 ? do_futex+0x3e5/0xb20 ? iov_iter_init+0xaf/0x1d0 __vfs_write+0x46e/0x640 ? sched_clock_cpu+0x1b/0x190 ? __vfs_read+0x620/0x620 ? __fget+0x23a/0x390 ? rw_verify_area+0xca/0x290 vfs_write+0x192/0x490 SyS_write+0xde/0x1c0 ? SyS_read+0x1c0/0x1c0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fee6a74a219 RSP: 002b:00007fee6acb7d58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000638000 RCX: 00007fee6a74a219 RDX: 0000000000000078 RSI: 0000000020141000 RDI: 0000000000000006 RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000212 R12: ffff8800695d7f98 R13: 0000000020141000 R14: 0000000000000006 R15: 00000000ffffffff Code: d6 48 b8 00 00 00 00 00 fc ff df 66 41 81 e4 ff 03 44 8d 72 ff 4a 8d 3c b5 c0 a6 7f 82 44 89 b5 4c ff ff ff 48 89 f9 48 c1 e9 03 <0f> b6 0c 01 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RIP: rdma_nl_rcv_msg+0x13a/0x4c0 RSP: ffff8800695d7838 ---[ end trace ba085d123959c8ec ]--- Kernel panic - not syncing: Fatal exception Cc: syzkaller <syzkaller@googlegroups.com> Fixes: b4c598a67ea1 ("RDMA/netlink: Implement nldev device dumpit calback") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-12-01IB/core: Init subsys if compiled to vmlinuz-coreDmitry Monakhov1-1/+1
Once infiniband is compiled as a core component its subsystem must be enabled before device initialization. Otherwise there is a NULL pointer dereference during mlx4_core init, calltrace: ->device_add if (dev->class) { deref dev->class->p =>NULLPTR #Config CONFIG_NET_DEVLINK=y CONFIG_MAY_USE_DEVLINK=y CONFIG_MLX4_EN=y Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-08-24Merge branch 'mellanox' into k.o/for-nextDoug Ledford1-6/+2
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB: Avoid ib_modify_port() failure for RoCE devicesSelvin Xavier1-4/+7
IB CM calls ib_modify_port() irrespective of link layer. If the failure is returned, the mad agent gets unregistered for those devices. Recently, modify_port() hook was removed from some of the low level drivers as it was always returning success. This breaks rdma connection establishment over those devices. For ethernet devices, Qkey violation and port capabilities are not applicable. So returning success for RoCE when modify_port hook is is not implemented. Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/(core, ulp): Convert register/unregister event handler to be voidLeon Romanovsky1-6/+2
The functions ib_register_event_handler() and ib_unregister_event_handler() always returned success and they can't fail. Let's convert those functions to be void, remove redundant checks and cleanup tons of goto statements. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22rdma: Autoload netlink client modulesJason Gunthorpe1-0/+2
If a message comes in and we do not have the client in the table, then try to load the module supplying that client using MODULE_ALIAS to find it. This duplicates the scheme seen in other netlink muxes (eg nfnetlink). Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18Merge branch 'k.o/for-4.13-rc' into k.o/for-nextDoug Ledford1-2/+3
Merging our (hopefully) final -rc pull branch into our for-next branch because some of our pending patches won't apply cleanly without having the -rc patches in our tree. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18Merge branch 'misc' into k.o/for-nextDoug Ledford1-2/+2
Conflicts: drivers/infiniband/core/iwcm.c - The rdma_netlink patches in HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM, we aren't safe for that context) touched the same code. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18RDMA/core: make ib_device.add method optionalSagi Grimberg1-2/+2
ib_clients can indeed fill .add to NULL, but then they will not see any device removal notifications. The reason is that that ib_register_client and ib_register_device checked existence of .add before adding the creating a corresponding client_data and adding it to the list. Simple condition reverse fixes the issue. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16IB/core: Protect sysfs entry on ib_unregister_deviceShiraz Saleem1-2/+3
ib_unregister_device is not protecting removal of sysfs entries. A call to ib_register_device in that window can result in duplicate sysfs entry warning. Move mutex_unlock to after ib_device_unregister_sysfs to protect against sysfs entry creation. This issue is exposed during driver load/unload stress test. WARNING: CPU: 5 PID: 4445 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5f/0x70 sysfs: cannot create duplicate filename '/class/infiniband/i40iw0' Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H BIOS F7 01/17/2014 Workqueue: i40e i40e_service_task [i40e] Call Trace: dump_stack+0x67/0x98 __warn+0xcc/0xf0 warn_slowpath_fmt+0x4a/0x50 ? kernfs_path_from_node+0x4b/0x60 sysfs_warn_dup+0x5f/0x70 sysfs_do_create_link_sd.isra.2+0xb7/0xc0 sysfs_create_link+0x20/0x40 device_add+0x28c/0x600 ib_device_register_sysfs+0x58/0x170 [ib_core] ib_register_device+0x325/0x570 [ib_core] ? i40iw_register_rdma_device+0x1f4/0x400 [i40iw] ? kmem_cache_alloc_trace+0x143/0x330 ? __raw_spin_lock_init+0x2d/0x50 i40iw_register_rdma_device+0x2dc/0x400 [i40iw] i40iw_open+0x10a6/0x1950 [i40iw] ? i40iw_open+0xeab/0x1950 [i40iw] ? i40iw_make_cm_node+0x9c0/0x9c0 [i40iw] i40e_client_subtask+0xa4/0x110 [i40e] i40e_service_task+0xc2d/0x1320 [i40e] process_one_work+0x203/0x710 ? process_one_work+0x16f/0x710 worker_thread+0x126/0x4a0 ? trace_hardirqs_on+0xd/0x10 kthread+0x112/0x150 ? process_one_work+0x710/0x710 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x2e/0x40 ---[ end trace fd11b69e21ea7653 ]--- Couldn't register device i40iw0 with driver model Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sindhu Devale <sindhu.devale@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10RDMA: Simplify get firmware interfaceLeon Romanovsky1-2/+2
There is a need to forward FW version to user space application through RDMA netlink. In order to make it safe, there is need to declare nla_policy and limit the size of FW string. The new define IB_FW_VERSION_NAME_MAX will limit the size of FW version string. That define was chosen to be equal to ETHTOOL_FWVERS_LEN, because many drivers anyway are limited by that value indirectly. The introduction of this define allows us to remove the string size from get_fw_str function signature. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10RDMA/netlink: Add nldev initialization flowsLeon Romanovsky1-0/+2
Add nldev init and exit flows to the RDMA/core. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10RDMA/netlink: Convert LS to doit callbackLeon Romanovsky1-3/+3
RDMA_NL_LS protocol is actually does not dump anything, but sets data and it should be handled by doit callback. This patch actually converts RDMA_NL_LS to doit callback, while preserving IWCM and RDMA_CM flows through netlink_dump_start(). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com>