aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/uverbs_main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-01-10IB/core: Increase number of char device minorsHuy Nguyen1-56/+34
There is a need to increase number of possible char devices to support large number of SR-IOV instances. The current limit is in the range of 64-128 devices/ports. Increase it to support up to 1024. The patch performs the following steps to refactor the code: 1. Removes the split bitmap for fixed and overflow dev numbers. 2. Pre-allocates the non-legacy major number range during driver initialization, choosen for simplicity. 3. Add new define (RDMA_MAX_PORTS) that is shared between all drivers. This is the maximum total number of ports on all struct ib_devices. 4. Set RDMA_MAX_PORTS to 1024. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10IB/core: Remove the locking for character device bitmapsHuy Nguyen1-5/+0
Remove the locks that protect character device bitmaps of uverbs, umad and issm. The character device bitmaps are accessed in "client->add" and "client->remove" calls from ib_register_device and ib_unregister_device respectively. These calls are already protected by the "device_mutex" mutex. Thus, the spinlocks are not needed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-11-13IB/uverbs: Allow CQ moderation with modify CQYonatan Cohen1-0/+1
Uverbs support in modify_cq for CQ moderation only. Gives ability to change cq_max_count and cq_period. CQ moderation enhance performance by moderating the number of CQEs needed to create an event instead of application having to suffer from event per-CQE. To achieve CQ moderation the application needs to set cq_max_count and cq_period. cq_max_count - defines the number of CQEs needed to create an event. cq_period - defines the timeout (micro seconds) between last event and a new one that will occur even if cq_max_count was not satisfied Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/uverbs: clean up INIT_UDATA_BUF_OR_NULL usageArnd Bergmann1-12/+10
We get a harmless warning about the fact that we use the result of a multiplication as a condition: drivers/infiniband/core/uverbs_main.c: In function 'ib_uverbs_write': drivers/infiniband/core/uverbs_main.c:787:40: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:787:117: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:790:50: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:790:151: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] This avoids the problem by using an inline function in place of the macro. Fixes: a96e4e2ffe43 ("IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()") Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://patchwork.kernel.org/patch/9940777/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Expose ioctl interface through experimental KconfigMatan Barak1-0/+6
Add CONFIG_INFINIBAND_EXP_USER_ACCESS that enables the ioctl interface. This interface is experimental and is subject to change. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Assign root to all driversMatan Barak1-0/+18
In order to use the parsing tree, we need to assign the root to all drivers. Currently, we just assign the default parsing tree via ib_uverbs_add_one. The driver could override this by assigning a parsing tree prior to registering the device. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/(core, ulp): Convert register/unregister event handler to be voidLeon Romanovsky1-12/+1
The functions ib_register_event_handler() and ib_unregister_event_handler() always returned success and they can't fail. Let's convert those functions to be void, remove redundant checks and cleanup tons of goto statements. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16IB/uverbs: Fix NULL pointer dereference during device removalMaor Gottlieb1-1/+1
As part of ib_uverbs_remove_one which might be triggered upon reset flow, we trigger IB_EVENT_DEVICE_FATAL event to userspace application. If device was removed after uverbs fd was opened but before ib_uverbs_get_context was called, the event file will be accessed before it was allocated, result in NULL pointer dereference: [ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null) ... [ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40 [ 72.327123] Call Trace: [ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs] [ 72.327216] ? synchronize_srcu_expedited+0x27/0x30 [ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs] [ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core] [ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib] [ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core] [ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core] [ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib] [ 72.327546] SyS_delete_module+0x155/0x230 [ 72.328472] ? exit_to_usermode_loop+0x70/0xa6 [ 72.329370] do_syscall_64+0x54/0xc0 [ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25 Fix it by checking that user context was allocated before trigger the event. Fixes: 036b10635739 ('IB/uverbs: Enable device removal when there are active user space applications') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04IB/uverbs: Fix device cleanupYishai Hadas1-2/+1
Uverbs device should be cleaned up only when there is no potential usage of. As part of ib_uverbs_remove_one which might be triggered upon reset flow the device reference count is decreased as expected and leave the final cleanup to the FDs that were opened. Current code increases reference count upon opening a new command FD and decreases it upon closing the file. The event FD is opened internally and rely on the command FD by taking on it a reference count. In case that the command FD was closed and just later the event FD we may ensure that the device resources as of srcu are still alive as they are still in use. Fixing the above by moving the reference count decreasing to the place where the command FD is really freed instead of doing that when it was just closed. fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds1-1/+1
Pull char/misc driver updates from Greg KH: "Here is the big set of new char/misc driver drivers and features for 4.12-rc1. There's lots of new drivers added this time around, new firmware drivers from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and a bunch of other driver updates. Nothing major, except if you happen to have the hardware for these drivers, and then you will be happy :) All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) firmware: google memconsole: Fix return value check in platform_memconsole_init() firmware: Google VPD: Fix return value check in vpd_platform_init() goldfish_pipe: fix build warning about using too much stack. goldfish_pipe: An implementation of more parallel pipe fpga fr br: update supported version numbers fpga: region: release FPGA region reference in error path fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe() mei: drop the TODO from samples firmware: Google VPD sysfs driver firmware: Google VPD: import lib_vpd source files misc: lkdtm: Add volatile to intentional NULL pointer reference eeprom: idt_89hpesx: Add OF device ID table misc: ds1682: Add OF device ID table misc: tsl2550: Add OF device ID table w1: Remove unneeded use of assert() and remove w1_log.h w1: Use kernel common min() implementation uio_mf624: Align memory regions to page size and set correct offsets uio_mf624: Refactor memory info initialization uio: Allow handling of non page-aligned memory regions hangcheck-timer: Fix typo in comment ...
2017-04-20IB/core: Rename uverbs event file structureMatan Barak1-66/+66
Previously, ib_uverbs_event_file was suffixed by _file as it contained the actual file information. Since it's now only used as base struct for ib_uverbs_async_event_file and ib_uverbs_completion_event_file, we change its name to ib_uverbs_event_queue. This represents its logical role better. Fixes: 1e7710f3f656 ('IB/core: Change completion channel to use the reworked objects schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: Don't use is_async in event files to infer events sizeMatan Barak1-9/+5
Previously, we inferred the events size in ib_uverbs_event_read by using the is_async flag. Instead of that, we pass the event size directly. Fixes: 1e7710f3f656 ('IB/core: Change completion channel to use the reworked objects schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Change completion channel to use the reworked objects schemaMatan Barak1-118/+161
This patch adds the standard fd based type - completion_channel. The completion_channel is now prefixed with ib_uobject, similarly to the rest of the uobjects. This requires a few changes: (1) We define a new completion channel fd based object type. (2) completion_event and async_event are now two different types. This means they use different fops. (3) We release the completion_channel exactly as we release other idr based objects. (4) Since ib_uobjects are already kref-ed, we only add the kref to the async event. A fd object requires filling out several parameters. Its op pointer should point to uverbs_fd_ops and its size should be at least the size if ib_uobject. We use a macro to make the type declaration easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Add support for fd objectsMatan Barak1-1/+3
The completion channel we use in verbs infrastructure is FD based. Previously, we had a separate way to manage this object. Since we strive for a single way to manage any kind of object in this infrastructure, we conceptually treat all objects as subclasses of ib_uobject. This commit adds the necessary mechanism to support FD based objects like their IDR counterparts. FD objects release need to be synchronized with context release. We use the cleanup_mutex on the uverbs_file for that. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Change idr objects to use the new schemaMatan Barak1-135/+7
This changes only the handlers which deals with idr based objects to use the new idr allocation, fetching and destruction schema. This patch consists of the following changes: (1) Allocation, fetching and destruction is done via idr ops. (2) Context initializing and release is done through uverbs_initialize_ucontext and uverbs_cleanup_ucontext. (3) Ditching the live flag. Mostly, this is pretty straight forward. The only place that is a bit trickier is in ib_uverbs_open_qp. Commit [1] added code to check whether the uobject is already live and initialized. This mostly happens because of a race between open_qp and events. We delayed assigning the uobject's pointer in order to eliminate this race without using the live variable. [1] commit a040f95dc819 ("IB/core: Fix XRC race condition in ib_uverbs_open_qp") Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Add idr based standard typesMatan Barak1-3/+5
This patch adds the standard idr based types. These types are used in downstream patches in order to initialize, destroy and lookup IB standard objects which are based on idr objects. An idr object requires filling out several parameters. Its op pointer should point to uverbs_idr_ops and its size should be at least the size of ib_uobject. We add a macro to make the type declaration easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Refactor idr to be per uverbs_fileMatan Barak1-31/+14
The current code creates an idr per type. Since types are currently common for all drivers and known in advance, this was good enough. However, the proposed ioctl based infrastructure allows each driver to declare only some of the common types and declare its own specific types. Thus, we decided to implement idr to be per uverbs_file. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-21infiniband: utilize the new cdev_set_parent functionLogan Gunthorpe1-1/+1
This replaces the suspect looking cdev.kobj.parent lines with the equivalent cdev_set_parent function. This is a straightforward change that's largely cosmetic but it does push the kobj.parent ownership into char_dev.c where it belongs. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-27Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-0/+20
Pull cgroup updates from Tejun Heo: "Several noteworthy changes. - Parav's rdma controller is finally merged. It is very straight forward and can limit the abosolute numbers of common rdma constructs used by different cgroups. - kernel/cgroup.c got too chubby and disorganized. Created kernel/cgroup/ subdirectory and moved all cgroup related files under kernel/ there and reorganized the core code. This hurts for backporting patches but was long overdue. - cgroup v2 process listing reimplemented so that it no longer depends on allocating a buffer large enough to cache the entire result to sort and uniq the output. v2 has always mangled the sort order to ensure that users don't depend on the sorted output, so this shouldn't surprise anybody. This makes the pid listing functions use the same iterators that are used internally, which have to have the same iterating capabilities anyway. - perf cgroup filtering now works automatically on cgroup v2. This patch was posted a long time ago but somehow fell through the cracks. - misc fixes asnd documentation updates" * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits) kernfs: fix locking around kernfs_ops->release() callback cgroup: drop the matching uid requirement on migration for cgroup v2 cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy cgroup: misc cleanups cgroup: call subsys->*attach() only for subsystems which are actually affected by migration cgroup: track migration context in cgroup_mgctx cgroup: cosmetic update to cgroup_taskset_add() rdmacg: Fixed uninitialized current resource usage cgroup: Add missing cgroup-v2 PID controller documentation. rdmacg: Added documentation for rdmacg IB/core: added support to use rdma cgroup controller rdmacg: Added rdma cgroup controller cgroup: fix a comment typo cgroup: fix RCU related sparse warnings cgroup: move namespace code to kernel/cgroup/namespace.c cgroup: rename functions for consistency cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c cgroup: separate out cgroup1_kf_syscall_ops cgroup: refactor mount path and clearly distinguish v1 and v2 paths cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c ...
2017-01-24IB/core: Use dev.parent instead of dma_deviceBart Van Assche1-1/+1
Prepare for removal of ib_device.dma_device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/core: added support to use rdma cgroup controllerParav Pandit1-0/+20
Added support APIs for IB core to register/unregister every IB/RDMA device with rdma cgroup for tracking rdma resources. IB core registers with rdma cgroup controller. Added support APIs for uverbs layer to make use of rdma controller. Added uverbs layer to perform resource charge/uncharge functionality. Added support during query_device uverb operation to ensure it returns resource limits by honoring rdma cgroup configured limits. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-1/+5
Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
2016-12-14Merge branch 'mlx' into merge-testDoug Ledford1-0/+1
2016-12-13IB/uverbs: Extend modify_qp and support packet pacingBodong Wang1-0/+1
An new uverbs command ib_uverbs_ex_modify_qp is added to support more QP attributes. User driver should choose to call the legacy/extended API based on input mask. IB_USER_LAST_QP_ATTR_MASK is added to indicated the maximum bit position which supports legacy ib_uverbs_modify_qp. IB_USER_LEGACY_LAST_QP_ATTR_MASK indicates the maximum bit position which supports ib_uverbs_ex_modify_qp, the value of this mask should be updated if new mask is added later. Along with this change, rate_limit is supported by the extended command, user driver could use it to control packet packing. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03infiniband: remove WARN that is not kernel bugLeon Romanovsky1-1/+4
On Mon, Nov 21, 2016 at 09:52:53AM -0700, Jason Gunthorpe wrote: > On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote: > > > > > > In ib_ucm_write function there is a wrong prefix: > > > > > > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n", > > > > I did it intentionally to have the same errors for all flows. > > Lets actually use a good message too please? > > pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n", > > Jason >From 70f95b2d35aea42e5b97e7d27ab2f4e8effcbe67 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky <leonro@mellanox.com> Date: Mon, 21 Nov 2016 13:30:59 +0200 Subject: [PATCH rdma-next V2] IB/{core, qib}: Remove WARN that is not kernel bug WARNINGs mean kernel bugs, in this case, they are placed to mark programming errors and/or malicious attempts. BUG/WARNs that are not kernel bugs hinder automated testing efforts. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/uverbs: Fix leak of XRC target QPsTariq Toukan1-5/+2
The real QP is destroyed in case of the ref count reaches zero, but for XRC target QPs this call was missed and caused to QP leaks. Let's call to destroy for all flows. Fixes: 0e0ec7e0638e ('RDMA/core: Export ib_open_qp() to share XRC...') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford1-13/+24
2016-08-03IB/uverbs: Fix race between uverbs_close and remove_oneJason Gunthorpe1-13/+24
Fixes an oops that might happen if uverbs_close races with remove_one. Both contexts may run ib_uverbs_cleanup_ucontext, it depends on the flow. Currently, there is no protection for a case that remove_one didn't make the cleanup it runs to its end, the underlying ib_device was freed then uverbs_close will call ib_uverbs_cleanup_ucontext and OOPs. Above might happen if uverbs_close deleted the file from the list then remove_one didn't find it and runs to its end. Fixes to protect against that case by a new cleanup lock so that ib_uverbs_cleanup_ucontext will be called always before that remove_one is ended. Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one") Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Introduce RWQ Indirection tableYishai Hadas1-0/+13
User applications that want to spread traffic on several WQs, need to create an indirection table, by using already created WQs. Adding uverbs API in order to create and destroy this table. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Add WQ supportYishai Hadas1-0/+25
User space applications which use RSS functionality need to create a work queue object (WQ). The lifetime of such an object is: * Create a WQ * Modify the WQ from reset to init state. * Use the WQ (by downstream patches). * Destroy the WQ. These commands are added to the uverbs API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/security: Restrict use of the write() interfaceJason Gunthorpe1-0/+5
The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03IB/core: trivial prink cleanup.Parav Pandit1-5/+5
1. Replaced printk with appropriate pr_warn, pr_err, pr_info. 2. Removed unnecessary prints around memory allocation failure which are not required, as reported by the checkpatch script. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03IB/core: Modify conditional on ucontext existenceEli Cohen1-6/+6
Since we allow to call legacy verbs using their extended counterpart, the check on ucontext has to move up to a common area in case this verb is ever extended. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03IB/core: IB/core: Allow legacy verbs through extended interfacesEli Cohen1-10/+19
When an extended verb is an extension to a legacy verb, the original functionality is preserved. Hence we do not require each hardware driver to set the extended capability. This will allow the use of the extended verb in its simple form with drivers that do not support the extended capability. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-03IB/core: Avoid duplicate codeEli Cohen1-20/+9
Move the check on the validity of the command to a common area. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB: remove in-kernel support for memory windowsChristoph Hellwig1-1/+12
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/core: Extend ib_uverbs_create_qpEran Ben Elisha1-0/+1
ib_uverbs_ex_create_qp follows the extension verbs mechanism. New features (for example, QP creation flags field which is added in a downstream patch) could used via user-space libraries without breaking the ABI. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/uverbs: Enable device removal when there are active user space applicationsYishai Hadas1-67/+293
Enables the uverbs_remove_one to succeed despite the fact that there are running IB applications working with the given ib device. This functionality enables a HW device to be unbind/reset despite the fact that there are running user space applications using it. It exposes a new IB kernel API named 'disassociate_ucontext' which lets a driver detaching its HW resources from a given user context without crashing/terminating the application. In case a driver implemented the above API and registered with ib_uverb there will be no dependency between its device to its uverbs_device. Upon calling remove_one of ib_uverbs the call should return after disassociating the open HW resources without waiting to clients disconnecting. In case driver didn't implement this API there will be no change to current behaviour and uverbs_remove_one will return only when last client has disconnected and reference count on uverbs device became 0. In case the lower driver device was removed any application will continue working over some zombie HCA, further calls will ended with an immediate error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/uverbs: Explicitly pass ib_dev to uverbs commandsYishai Hadas1-6/+15
Done in preparation for deploying RCU for the device removal flow. Allows isolating the RCU handling to the uverb_main layer and keeping the uverbs_cmd code as is. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/uverbs: Fix race between ib_uverbs_open and remove_oneYishai Hadas1-13/+30
Fixes: 2a72f212263701b927559f6850446421d5906c41 ("IB/uverbs: Remove dev_table") Before this commit there was a device look-up table that was protected by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When it was dropped and container_of was used instead, it enabled the race with remove_one as dev might be freed just after: dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but before the kref_get. In addition, this buggy patch added some dead code as container_of(x,y,z) can never be NULL and so dev can never be NULL. As a result the comment above ib_uverbs_open saying "the open method will either immediately run -ENXIO" is wrong as it can never happen. The solution follows Jason Gunthorpe suggestion from below URL: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html cdev will hold a kref on the parent (the containing structure, ib_uverbs_device) and only when that kref is released it is guaranteed that open will never be called again. In addition, fixes the active count scheme to use an atomic not a kref to prevent WARN_ON as pointed by above comment from Jason. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/uverbs: Fix reference counting usage of event filesYishai Hadas1-6/+38
Fix the reference counting usage to be handled in the event file creation/destruction function, instead of being done by the caller. This is done for both async/non-async event files. Based on Jason Gunthorpe report at https://www.mail-archive.com/ linux-rdma@vger.kernel.org/msg24680.html: "The existing code for this is broken, in ib_uverbs_get_context all the error paths between ib_uverbs_alloc_event_file and the kref_get(file->ref) are wrong - this will result in fput() which will call ib_uverbs_event_close, which will try to do kref_put and ib_unregister_event_handler - which are no longer paired." Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: lock client data with lists_rwsemHaggai Eran1-3/+3
An ib_client callback that is called with the lists_rwsem locked only for read is protected from changes to the IB client lists, but not from ib_unregister_device() freeing its client data. This is because ib_unregister_device() will remove the device from the device list with lists_rwsem locked for write, but perform the rest of the cleanup, including the call to remove() without that lock. Mark client data that is undergoing de-registration with a new going_down flag in the client data context. Lock the client data list with lists_rwsem for write in addition to using the spinlock, so that functions calling the callback would be able to lock only lists_rwsem for read and let callbacks sleep. Since ib_unregister_client() now marks the client data context, no need for remove() to search the context again, so pass the client data directly to remove() callbacks. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Extend ib_uverbs_create_cqMatan Barak1-0/+1
ib_uverbs_ex_create_cq follows the extension verbs mechanism. New features (for example, CQ creation flags field which is added in a downstream patch) could used via user-space libraries without breaking the ABI. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15ib_uverbs: Fix pages leak when using XRC SRQsSébastien Dugué1-11/+11
Hello, When an application using XRCs abruptly terminates, the mmaped pages of the CQ buffers are leaked. This comes from the fact that when resources are released in ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their refcount is not 0. When creating an XRC SRQ, we increment the associated CQ refcount. This refcount is only decremented when the SRQ is released. Therefore we need to release the SRQs prior to the CQs to make sure that all references to the CQs are gone before trying to release these. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-02-18IB/core: Add support for extended query device capsEli Cohen1-0/+1
Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Following the discussion about this patch [1], the code now validates the command's comp_mask is zero, returning -EINVAL for unknown values, in order to allow extending the verb in the future. The verb also checks the user-space provided response buffer size and only fills in capabilities that will fit in the buffer. In attempt to follow the spirit of presentation [2] by Tzahi Oved that was presented during OpenFabrics Alliance International Developer Workshop 2013, the comp_mask bits will only describe which fields are valid. Furthermore, fields that can simply be cleared when they are not supported, do not require a comp_mask bit at all. The verb returns a response_length field containing the actual number of bytes written by the kernel, so that a newer version running on an older kernel can tell which fields were actually returned. [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19 http://thread.gmane.org/gmane.linux.kernel.api/7889/ [2] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-03IB/core: Temporarily disable ex_query_device uverbHaggai Eran1-1/+0
Commit 5a77abf9a97a ("IB/core: Add support for extended query device caps") added a new extended verb to query the capabilities of RDMA devices, but the semantics of this verb are still under debate [1]. Don't expose this verb to userspace until the ABI is nailed down. [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19 http://www.spinics.net/lists/linux-rdma/msg22904.html Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Add support for on demand paging regionsShachar Raindel1-0/+2
* Extend the umem struct to keep the ODP related data. * Allocate and initialize the ODP related information in the umem (page_list, dma_list) and freeing as needed in the end of the run. * Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed. * Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them). * Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Add support for extended query device capsEli Cohen1-1/+2
Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-14Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-nextRoland Dreier1-0/+1