aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-09-06RDMA/cma: Protect cma dev list with lockParav Pandit1-5/+7
When AF_IB addresses are used during rdma_resolve_addr() a lock is not held. A cma device can get removed while list traversal is in progress which may lead to crash. ie CPU0 CPU1 ==== ==== rdma_resolve_addr() cma_resolve_ib_dev() list_for_each() cma_remove_one() cur_dev->device mutex_lock(&lock) list_del(); mutex_unlock(&lock); cma_process_remove(); Therefore, hold a lock while traversing the list which avoids such situation. Cc: <stable@vger.kernel.org> # 3.10 Fixes: f17df3b0dede ("RDMA/cma: Add support for AF_IB to rdma_resolve_addr()") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-05RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()Parav Pandit1-3/+2
If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap. Fixes: 7d96c9b17636 ("IB/uverbs: Have the core code create the uverbs_root_spec") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-04IB/core: Release object lock if destroy failedArtemy Kovalyov1-0/+2
The object lock was supposed to always be released during destroy, but when the destruction retry series was integrated with the destroy series it created a failure path that missed the unlock. Keep with convention, if destroy fails the caller must undo all locking. Fixes: 87ad80abc70d ("IB/uverbs: Consolidate uobject destruction") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-04RDMA/ucma: check fd type in ucma_migrate_id()Jann Horn1-0/+6
The current code grabs the private_data of whatever file descriptor userspace has supplied and implicitly casts it to a `struct ucma_file *`, potentially causing a type confusion. This is probably fine in practice because the pointer is only used for comparisons, it is never actually dereferenced; and even in the comparisons, it is unlikely that a file from another filesystem would have a ->private_data pointer that happens to also be valid in this context. But ->private_data is not always guaranteed to be a valid pointer to an object owned by the file's filesystem; for example, some filesystems just cram numbers in there. Check the type of the supplied file descriptor to be safe, analogous to how other places in the kernel do it. Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-22mm, oom: distinguish blockable mode for mmu notifiersMichal Hocko1-7/+26
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: David Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-16Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe1-1/+1
rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe2-21/+79
Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15IB/core: Change filter function return type from int to boolParav Pandit2-31/+43
Filter functions returns either 0 or 1, therefore better change their return type from int to bool to reflect the same. Additionally some filter functions have suffix of _filter some doesn't. Make all filter function consistent to have __filter suffix to improve code readability. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15IB/core: Update GID entries for netdevice whose mac address changesParav Pandit1-6/+7
Update all GID table entries of the netdevice whose MAC address changed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15IB/core: Add default GIDs of the bond master netdevParav Pandit1-29/+59
Currently following issues exist: 1. Default GIDs of the lower (slave) netdevice if the bond netdevice is added. Rather default GID should be of bond master netdevice. 2. Due to this, when failover event occurs FAILOVER event handler attempts to delete the GID of the upper device and tries to add the default GID of the lower device. This is incorrect behavior. To have simple and correct code: (a) Split default GIDs addition out of add_netdev_ips(). This allows easier removal in future if RoCE default GIDs are removed. (b) Add default GIDs of the bond master device by using right filter and callback function. (c) Remove unused function enum_netdev_default_gids(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15IB/core: Consider adding default GIDs of bond deviceParav Pandit1-1/+23
Now that we correctly delete the default GIDs of lower devices during CHANGEUPPER event, add default GIDs of the bonding master device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-15IB/core: Delete lower netdevice default GID entries in bonding scenarioParav Pandit1-9/+62
When NETDEV_CHANGEUPPER event occurs, lower device is not yet established as slave of the master, and when upper device is bond device, default GID entries not deleted. Due to this, when bond device is fully configured, default GID entries of bond device cannot be added as default GID entries are occupied by the lower netdevice. This is incorrect. Default GID entries should really be of bond netdevice because in all RoCE GIDs (default or IP), MAC address of the bond device will be used. It is confusing to have default GID of netdevice which is not really used for any purpose. Therefore, as first step, implement (a) filter function which filters if a CHANGEUPPER event netdevice and associated upper device is master device or not. (b) callback function which deletes the default GIDs of lower (event netdevice). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14IB/core: Avoid confusing del_netdev_default_ipsParav Pandit1-13/+10
Currently bond_delete_netdev_default_gids() is called by two callers. (a) del_netdev_default_ips_join() (b) del_netdev_default_ips() Both above functions changes the argument order while calling bond_delete_netdev_default_gids(). This required silly del_netdev_default_ips() wrapper. Additionally, del_netdev_default_ips() deletes default GIDs not IP based GIDs. del_netdev_default_ips() having _ips suffix is confusing. Therefore, get rid of confusing del_netdev_default_ips() and simplify bond_delete_netdev_default_gids() to follow same argument order as its caller. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-14IB/core: Add comment for change upper netevent handlingParav Pandit1-16/+39
Add comment for handling CHANGEUPPER netevent handling. To improve code readability, (a) move cmd definitions to its respective if-else branches, (b) avoid single line structure definitions. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13IB/ucm: Fix compiling ucm.cJason Gunthorpe1-5/+5
Even though this interface is marked CONFIG_BROKEN we still expect it to compile, at least until we delete it completely. Also mark INFINIBAND_USER_ACCESS_UCM with COMPILE_TEST so these situations can be detected. Fixes: e7ff98aefc9e ("RDMA/cma: Constify path record, ib_cm_event, listen_id pointers") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull locking/atomics update from Thomas Gleixner: "The locking, atomics and memory model brains delivered: - A larger update to the atomics code which reworks the ordering barriers, consolidates the atomic primitives, provides the new atomic64_fetch_add_unless() primitive and cleans up the include hell. - Simplify cmpxchg() instrumentation and add instrumentation for xchg() and cmpxchg_double(). - Updates to the memory model and documentation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) locking/atomics: Rework ordering barriers locking/atomics: Instrument cmpxchg_double*() locking/atomics: Instrument xchg() locking/atomics: Simplify cmpxchg() instrumentation locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation tools/memory-model: Rename litmus tests to comply to norm7 tools/memory-model/Documentation: Fix typo, smb->smp sched/Documentation: Update wake_up() & co. memory-barrier guarantees locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock() sched/core: Use smp_mb() in wake_woken_function() tools/memory-model: Add informal LKMM documentation to MAINTAINERS locking/atomics/Documentation: Describe atomic_set() as a write operation tools/memory-model: Make scripts executable tools/memory-model: Remove ACCESS_ONCE() from model tools/memory-model: Remove ACCESS_ONCE() from recipes locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example MAINTAINERS: Add Daniel Lustig as an LKMM reviewer tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name tools/memory-model: Add litmus test for full multicopy atomicity locking/refcount: Always allow checked forms ...
2018-08-13IB/uverbs: Do not check for device disassociation during ioctlJason Gunthorpe1-28/+13
Now that the ioctl path and uobjects are converted to use uverbs_api, it is now safe to remove the disassociation protection from the common ioctl code. This completes the work to make destroy functions continue to work even after device disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13IB/uverbs: Remove struct uverbs_root_spec and all supporting codeJason Gunthorpe6-742/+2
Everything now uses the uverbs_uapi data structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13IB/uverbs: Use uverbs_api to unmarshal ioctl commandsJason Gunthorpe3-269/+210
Convert the ioctl method syscall path to use the uverbs_api data structures. The new uapi structure includes all the same information, just in a different and more optimal way. - Use attr_bkey instead of 2 level radix trees for everything related to attributes. This includes the attribute storage, presence, and detection of missing mandatory attributes. - Avoid iterating over all attribute storage at finish, instead use find_first_bit with the attr_bkey to locate only those attrs that need cleanup. - Organize things to always run, and always rely on, cleanup. This avoids a bunch of tricky error unwind cases. - Locate the method using the radix tree, and locate the attributes using a very efficient incremental radix tree lookup - Use the precomputed destroy_bkey to handle uobject destruction - Use the precomputed allocation sizes and precomputed 'need_stack' to avoid maths in the fast path. This is optimal if userspace does not pass (many) unsupported attributes. Overall this results in much better codegen for the attribute accessors, everything is now stored in bitmaps or linear arrays indexed by attr_bkey. The compiler can compute attr_bkey values at compile time for all method attributes, meaning things like uverbs_attr_is_valid() now compile into single instruction bit tests. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-13IB/uverbs: Use uverbs_alloc for allocationsJason Gunthorpe1-12/+8
Several handlers need temporary allocations for the life of the method, switch them to use the uverbs_alloc allocator. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-13IB/uverbs: Add a simple allocator to uverbs_attr_bundleJason Gunthorpe1-20/+89
This is similar in spirit to devm, it keeps track of any allocations linked to this method call and ensures they are all freed when the method exits. Further, if there is space in the internal/onstack buffer then the allocator will hand out that memory and avoid an expensive call to kalloc/kfree in the syscall path. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10IB/uverbs: Remove the ib_uverbs_attr pointer from each attrJason Gunthorpe2-35/+64
Memory in the bundle is valuable, do not waste it holding an 8 byte pointer for the rare case of writing to a PTR_OUT. We can compute the pointer by storing a small 1 byte array offset and the base address of the uattr memory in the bundle private memory. This also means we can access the kernel's copy of the ib_uverbs_attr, so drop the copy of flags as well. Since the uattr base should be private bundle information this also de-inlines the already too big uverbs_copy_to inline and moves create_udata into uverbs_ioctl.c so they can see the private struct definition. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10IB/uverbs: Provide implementation private memory for the uverbs_attr_bundleJason Gunthorpe1-55/+57
This already existed as the anonymous 'ctx' structure, but this was not really a useful form. Hoist this struct into bundle_priv and rework the internal things to use it instead. Move a bunch of the processing internal state into the priv and reduce the excessive use of function arguments. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-10IB/uverbs: Use uverbs_api to manage the object type inside the uobjectJason Gunthorpe3-51/+57
Currently the struct uverbs_obj_type stored in the ib_uobject is part of the .rodata segment of the module that defines the object. This is a problem if drivers define new uapi objects as we will be left with a dangling pointer after device disassociation. Switch the uverbs_obj_type for struct uverbs_api_object, which is allocated memory that is part of the uverbs_api and is guaranteed to always exist. Further this moves the 'type_class' into this memory which means access to the IDR/FD function pointers is also guaranteed. Drivers cannot define new types. This makes it safe to continue to use all uobjects, including driver defined ones, after disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-10IB/uverbs: Build the specs into a radix tree at runtimeJason Gunthorpe5-3/+408
This radix tree datastructure is intended to replace the 'hash' structure used today for parsing ioctl methods during system calls. This first commit introduces the structure and builds it from the existing .rodata descriptions. The so-called hash arrangement is actually a 5 level open coded radix tree. This new version uses a 3 level radix tree built using the radix tree library. Overall this is much less code and much easier to build as the radix tree API allows for dynamic modification during the building. There is a small memory penalty to pay for this, but since the radix tree is allocated on a per device basis, a few kb of RAM seems immaterial considering the gained simplicity. The radix tree is similar to the existing tree, but also has a 'attr_bkey' concept, which is a small value'd index for each method attribute. This is used to simplify and improve performance of everything in the next patches. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-08-10IB/uverbs: Have the core code create the uverbs_root_specJason Gunthorpe3-20/+33
There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-09IB/uverbs: Fix reading of 32 bit flagsJason Gunthorpe1-1/+1
This is missing a zeroing of the high bits of flags, and is also not correct for big endian machines. Properly zero extend the 32 bit flags into the 64 bit stack variable. Reported-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Fixes: bccd06223f21 ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
2018-08-07IB/ucm: Initialize sgid request GID attribute pointerParav Pandit1-4/+1
sgid_attr is uninitialized on the stack, initialize it to NULL. Fixes: 398391071f25 ("IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Allow all DESTROY commands to succeed after disassociateJason Gunthorpe3-14/+62
The disassociate function was broken by design because it failed all commands. This prevents userspace from calling destroy on a uobject after it has detected a device fatal error and thus reclaiming the resources in userspace is prevented. This fix is now straightforward, when anything destroys a uobject that is not the user the object remains on the IDR with a NULL context and object pointer. All lookup locking modes other than DESTROY will fail. When the user ultimately calls the destroy function it is simply dropped from the IDR while any related information is returned. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Do not block disassociate during write()Jason Gunthorpe2-12/+11
Now that all the callbacks are safe to run concurrently with disassociation this test can be eliminated. The ufile core infrastructure becomes entirely self contained and is not sensitive to disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Do not pass struct ib_device to the ioctl methodsJason Gunthorpe7-63/+51
This does the same as the patch before, except for ioctl. The rules are the same, but for the ioctl methods the core code handles setting up the uobject. - Retrieve the ib_dev from the uobject->context->device. This is safe under ioctl as the core has already done rdma_alloc_begin_uobject and so CREATE calls are entirely protected by the rwsem. - Retrieve the ib_dev from uobject->object - Call ib_uverbs_get_ucontext() Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Do not pass struct ib_device to the write based methodsJason Gunthorpe3-83/+80
This is a step to get rid of the global check for disassociation. In this model, the ib_dev is not proven to be valid by the core code and cannot be provided to the method. Instead, every method decides if it is able to run after disassociation and obtains the ib_dev using one of three different approaches: - Call srcu_dereference on the udevice's ib_dev. As before, this means the method cannot be called after disassociation begins. (eg alloc ucontext) - Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext() - Retrieve the ib_dev from the uobject->object after checking under SRCU if disassociation has started (eg uobj_get) Largely, the code is all ready for this, the main work is to provide a ib_dev after calling uobj_alloc(). The few other places simply use ib_uverbs_get_ucontext() to get the ib_dev. This flexibility will let the next patches allow destroy to operate after disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Lower the test for ongoing disassociationJason Gunthorpe1-0/+11
Commands that are reading/writing to objects can test for an ongoing disassociation during their initial call to rdma_lookup_get_uobject. This directly prevents all of these commands from conflicting with an ongoing disassociation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Allow uobject allocation to work concurrently with disassociateJason Gunthorpe1-11/+26
After all the recent structural changes this is now straightforward, hold the hw_destroy_rwsem across the entire uobject creation. We already take this semaphore on the success path, so holding it a bit longer is not going to change the performance. After this change none of the create callbacks require the disassociate_srcu lock to be correct. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Allow RDMA_REMOVE_DESTROY to work concurrently with disassociateJason Gunthorpe3-23/+57
After all the recent structural changes this is now straightfoward, hoist the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around the uobject write lock as well as the destroy. This is necessary as obtaining a write lock concurrently with uverbs_destroy_ufile_hw() will cause malfunction. After this change none of the destroy callbacks require the disassociate_srcu lock to be correct. This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the IOCTL interface needs to hold an unlocked kref until all command verification is completed. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Convert 'bool exclusive' into an enumJason Gunthorpe1-37/+57
This is more readable, and future patches will need a 3rd lookup type. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Consolidate uobject destructionJason Gunthorpe1-129/+122
There are several flows that can destroy a uobject and each one is minimized and sprinkled throughout the code base, making it difficult to understand and very hard to modify the destroy path. Consolidate all of these into uverbs_destroy_uobject() and call it in all cases where a uobject has to be destroyed. This makes one change to the lifecycle, during any abort (eg when alloc_commit is not called) we always call out to alloc_abort, even if remove_commit needs to be called to delete a HW object. This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to clarify its actual usage and revises some of the comments to reflect what the life cycle is for the type implementation. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Make the write path destroy methods use the same flow as ioctlJason Gunthorpe2-86/+44
The ridiculous dance with uobj_remove_commit() is not needed, the write path can follow the same flow as ioctl - lock and destroy the HW object then use the data left over in the uobject to form the response to userspace. Two helpers are introduced to make this flow straightforward for the caller. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-01IB/uverbs: Remove rdma_explicit_destroy() from the ioctl methodsJason Gunthorpe3-22/+32
The core code will destroy the HW object on behalf of the method, if the method provides an implementation it must simply copy data from the stub uobj into the response. Destroy methods cannot touch the HW object. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Prefix _ib to IB/RoCE specific functionsParav Pandit1-18/+19
In rdma cm module, functions which are common between IB and iWarp are named with cma_. iWarp specific functions are prefixed with cma_iw. IB specific functions are perfixed with cma_ib. However some functions in request processing path didn't follow cma_ib notion. Prefix them with _ib for better code clarity. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Simplify gid type check in cma_acquire_dev()Parav Pandit1-9/+3
cma_add_one() initializes the default GID regardless of device type. listen_id is bound to a device and an IP address, its GID type is initialized by cma_acquire_dev(). Therefore a valid default GID type is always available, it is not needed to check port type during cma_acquire_dev(). Initialize gid type of a cm id when the cm_id is created instead of doing conditional checks during cma_acquire_dev() and trying to initialize to 0 during _cma_attach_to_dev(). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Avoid holding lock while initializing fields on stackParav Pandit1-22/+14
In various functions rdma_cm_event is zero initialized on stack using memset() while holding lock which is not necessary. Therefore, don't hold the lock while initializing on stack. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Return bool instead of intParav Pandit1-10/+13
Return bool for following internal and inline functions as their underlying APIs return bool too. 1. cma_zero_addr() 2. cma_loopback_addr() 3. cma_any_addr() 4. ib_addr_any() 5. ib_addr_loopback() While we are touching cma_loopback_addr(), remove extra white spaces in it. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/cma: Get rid of 1 bit booleanParav Pandit1-2/+2
Arrange fields of cma_req_info structure for efficiency on stack and get rid of one bit boolean field. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/cma: Constify path record, ib_cm_event, listen_id pointersParav Pandit1-24/+31
Constify several pointers such as path_rec, ib_cm_event and listen_id pointers in several functions. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Constify dst_addr argumentParav Pandit2-9/+9
Following APIs are not supposed to modify addr or dest_addr contents. Therefore make those function argument const for better code readability. 1. rdma_resolve_ip() 2. rdma_addr_size() 3. rdma_resolve_addr() Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/cma: Simplify rdma_resolve_addr() error flowParav Pandit1-10/+4
Currently dst address is first set and later on cleared on either of the 3 error conditions are met. However none of the APIs or checks are supposed to refer to the destination address of the cm_id. Therefore, set the destination address after necessary checks pass which simplifies the error flow. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/cma: Initialize resource type in __rdma_create_id()Parav Pandit1-1/+1
Currently rdma_cm_id's resource tracking fields such as owner task and kern_name and other non resource tracking fields are initialized in in single function __rdma_create_id(). Therefore, initialize rdma_cm_id's resource type also in same init function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Check for verbs callbacks before using themKamal Heib2-1/+5
Make sure the providers implement the verbs callbacks before calling them, otherwise return -EOPNOTSUPP. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA/core: Remove {create,destroy}_ah from mandatory verbsKamal Heib1-2/+0
{create,destroy}_ah aren't mandatory verbs, because not all providers are implementing them. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>