aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/uverbs_main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-02-22RDMA/uverbs: Reduce number of command header flags checksLeon Romanovsky1-9/+2
Simplify the code by directly checking the availability of extended command flog instead of doing multiple shift operations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Replace user's types with kernel's typesLeon Romanovsky1-5/+5
The internal to kernel variable declarations don't need to be declared with user types. This patch converts such occurrences appeared in ib_uverbs_write(). Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Refactor the header validation logicLeon Romanovsky1-43/+47
Move all header validation logic to be performed before SRCU read lock. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMa/uverbs: Copy ex_hdr outside of SRCU read lockLeon Romanovsky1-7/+6
The SRCU read lock protects the IB device pointer and doesn't need to be called before copying user provided header. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Move uncontext check before SRCU read lockLeon Romanovsky1-11/+4
There is no need to take SRCU lock before checking file->ucontext, so move it do it before it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Properly check command supported maskLeon Romanovsky1-12/+6
The check based on index is not sufficient because IB_USER_VERBS_EX_CMD_CREATE_CQ = IB_USER_VERBS_CMD_CREATE_CQ and IB_USER_VERBS_CMD_CREATE_CQ <= IB_USER_VERBS_CMD_OPEN_QP, so if we execute IB_USER_VERBS_EX_CMD_CREATE_CQ this code checks ib_dev->uverbs_cmd_mask not ib_dev->uverbs_ex_cmd_mask. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Refactor command header processingLeon Romanovsky1-30/+32
Move all command header processing into separate function and perform those checks before acquiring SRCU read lock. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Unify return values of not supported commandLeon Romanovsky1-12/+4
The non-existing command is supposed to return -EOPNOTSUPP, but the current code returns different errors for different flows for the same failure. This patch unifies those flows. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Return not supported error code for unsupported commandsLeon Romanovsky1-1/+1
Command that doesn't exist means that it is not supported, so update code to return -EOPNOTSUPP in case of failure. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Fail as early as possible if not enough header data was providedLeon Romanovsky1-6/+7
Fail as early as possible if not enough header data was provided. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Refactor flags checks and update return valueLeon Romanovsky1-4/+6
Since commit f21519b23c1b ("IB/core: extended command: an improved infrastructure for uverbs commands"), the uverbs supports extra flags as an input to the command interface. However actually, there is only one flag available and used, so it is better to refactor the code, so the resolution and report to the users is done as early as possible. As part of this change, we changed the return value of failure case from ENOSYS to be EINVAL to be consistent with the rest flags checks. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Update sizeof usersLeon Romanovsky1-5/+5
Update sizeof() users to be consistent with coding style. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-22RDMA/uverbs: Convert command mask validity check function to be boolLeon Romanovsky1-4/+4
The function validate_command_mask() returns only two results: success or failure, so convert it to return bool instead of 0 and -1. Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-15RDMA/uverbs: Protect from command mask overflowLeon Romanovsky1-7/+20
The command number is not bounds checked against the command mask before it is shifted, resulting in an ubsan hit. This does not cause malfunction since the command number is eventually bounds checked, but we can make this ubsan clean by moving the bounds check to before the mask check. ================================================================================ UBSAN: Undefined behaviour in drivers/infiniband/core/uverbs_main.c:647:21 shift exponent 207 is too large for 64-bit type 'long long unsigned int' CPU: 0 PID: 446 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #61 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ubsan_epilogue+0xe/0x81 __ubsan_handle_shift_out_of_bounds+0x293/0x2f7 ? debug_check_no_locks_freed+0x340/0x340 ? __ubsan_handle_load_invalid_value+0x19b/0x19b ? lock_acquire+0x440/0x440 ? lock_acquire+0x19d/0x440 ? __might_fault+0xf4/0x240 ? ib_uverbs_write+0x68d/0xe20 ib_uverbs_write+0x68d/0xe20 ? __lock_acquire+0xcf7/0x3940 ? uverbs_devnode+0x110/0x110 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x18/0x200 ? sched_clock_cpu+0x18/0x200 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? __fget+0x35b/0x5d0 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x448e29 RSP: 002b:00007f033f567c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007f033f5686bc RCX: 0000000000448e29 RDX: 0000000000000060 RSI: 0000000020001000 RDI: 0000000000000012 RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000056a0 R14: 00000000006e8740 R15: 0000000000000000 ================================================================================ Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 4.5 Fixes: 2dbd5186a39c ("IB/core: IB/core: Allow legacy verbs through extended interfaces") Reported-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-15IB/uverbs: Add ioctl support for 32bit processesMatan Barak1-0/+2
32 bit processes running on a 64 bit kernel call compat_ioctl so that implementations can revise any structure layout issues. Point compat_ioctl at our normal ioctl because: - All our structures are designed to be the same on 32 and 64 bit, ie we use __aligned_u64 when required and are careful to manage padding. - Any pointers are stored in u64's and userspace is expected to prepare them properly. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-11vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds1-1/+1
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-61/+34
Pull RDMA subsystem updates from Jason Gunthorpe: "Overall this cycle did not have any major excitement, and did not require any shared branch with netdev. Lots of driver updates, particularly of the scale-up and performance variety. The largest body of core work was Parav's patches fixing and restructing some of the core code to make way for future RDMA containerization. Summary: - misc small driver fixups to bnxt_re/hfi1/qib/hns/ocrdma/rdmavt/vmw_pvrdma/nes - several major feature adds to bnxt_re driver: SRIOV VF RoCE support, HugePages support, extended hardware stats support, and SRQ support - a notable number of fixes to the i40iw driver from debugging scale up testing - more work to enable the new hip08 chip in the hns driver - misc small ULP fixups to srp/srpt//ipoib - preparation for srp initiator and target to support the RDMA-CM protocol for connections - add RDMA-CM support to srp initiator, srp target is still a WIP - fixes for a couple of places where ipoib could spam the dmesg log - fix encode/decode of FDR/EDR data rates in the core - many patches from Parav with ongoing work to clean up inconsistencies and bugs in RoCE support around the rdma_cm - mlx5 driver support for the userspace features 'thread domain', 'wallclock timestamps' and 'DV Direct Connected transport'. Support for the firmware dual port rocee capability - core support for more than 32 rdma devices in the char dev allocation - kernel doc updates from Randy Dunlap - new netlink uAPI for inspecting RDMA objects similar in spirit to 'ss' - one minor change to the kobject code acked by Greg KH" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (259 commits) RDMA/nldev: Provide detailed QP information RDMA/nldev: Provide global resource utilization RDMA/core: Add resource tracking for create and destroy PDs RDMA/core: Add resource tracking for create and destroy CQs RDMA/core: Add resource tracking for create and destroy QPs RDMA/restrack: Add general infrastructure to track RDMA resources RDMA/core: Save kernel caller name when creating PD and CQ objects RDMA/core: Use the MODNAME instead of the function name for pd callers RDMA: Move enum ib_cq_creation_flags to uapi headers IB/rxe: Change RDMA_RXE kconfig to use select IB/qib: remove qib_keys.c IB/mthca: remove mthca_user.h RDMA/cm: Fix access to uninitialized variable RDMA/cma: Use existing netif_is_bond_master function IB/core: Avoid SGID attributes query while converting GID from OPA to IB RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure IB/umad: Fix use of unprotected device pointer IB/iser: Combine substrings for three messages IB/iser: Delete an unnecessary variable initialisation in iser_send_data_out() IB/iser: Delete an error message for a failed memory allocation in iser_send_data_out() ...
2018-01-10IB/core: Increase number of char device minorsHuy Nguyen1-56/+34
There is a need to increase number of possible char devices to support large number of SR-IOV instances. The current limit is in the range of 64-128 devices/ports. Increase it to support up to 1024. The patch performs the following steps to refactor the code: 1. Removes the split bitmap for fixed and overflow dev numbers. 2. Pre-allocates the non-legacy major number range during driver initialization, choosen for simplicity. 3. Add new define (RDMA_MAX_PORTS) that is shared between all drivers. This is the maximum total number of ports on all struct ib_devices. 4. Set RDMA_MAX_PORTS to 1024. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-10IB/core: Remove the locking for character device bitmapsHuy Nguyen1-5/+0
Remove the locks that protect character device bitmaps of uverbs, umad and issm. The character device bitmaps are accessed in "client->add" and "client->remove" calls from ib_register_device and ib_unregister_device respectively. These calls are already protected by the "device_mutex" mutex. Thus, the spinlocks are not needed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-11-28the rest of drivers/*: annotate ->poll() instancesAl Viro1-4/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-13IB/uverbs: Allow CQ moderation with modify CQYonatan Cohen1-0/+1
Uverbs support in modify_cq for CQ moderation only. Gives ability to change cq_max_count and cq_period. CQ moderation enhance performance by moderating the number of CQEs needed to create an event instead of application having to suffer from event per-CQE. To achieve CQ moderation the application needs to set cq_max_count and cq_period. cq_max_count - defines the number of CQEs needed to create an event. cq_period - defines the timeout (micro seconds) between last event and a new one that will occur even if cq_max_count was not satisfied Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/uverbs: clean up INIT_UDATA_BUF_OR_NULL usageArnd Bergmann1-12/+10
We get a harmless warning about the fact that we use the result of a multiplication as a condition: drivers/infiniband/core/uverbs_main.c: In function 'ib_uverbs_write': drivers/infiniband/core/uverbs_main.c:787:40: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:787:117: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:790:50: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:790:151: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] This avoids the problem by using an inline function in place of the macro. Fixes: a96e4e2ffe43 ("IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()") Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://patchwork.kernel.org/patch/9940777/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Expose ioctl interface through experimental KconfigMatan Barak1-0/+6
Add CONFIG_INFINIBAND_EXP_USER_ACCESS that enables the ioctl interface. This interface is experimental and is subject to change. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Assign root to all driversMatan Barak1-0/+18
In order to use the parsing tree, we need to assign the root to all drivers. Currently, we just assign the default parsing tree via ib_uverbs_add_one. The driver could override this by assigning a parsing tree prior to registering the device. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/(core, ulp): Convert register/unregister event handler to be voidLeon Romanovsky1-12/+1
The functions ib_register_event_handler() and ib_unregister_event_handler() always returned success and they can't fail. Let's convert those functions to be void, remove redundant checks and cleanup tons of goto statements. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16IB/uverbs: Fix NULL pointer dereference during device removalMaor Gottlieb1-1/+1
As part of ib_uverbs_remove_one which might be triggered upon reset flow, we trigger IB_EVENT_DEVICE_FATAL event to userspace application. If device was removed after uverbs fd was opened but before ib_uverbs_get_context was called, the event file will be accessed before it was allocated, result in NULL pointer dereference: [ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null) ... [ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40 [ 72.327123] Call Trace: [ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs] [ 72.327216] ? synchronize_srcu_expedited+0x27/0x30 [ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs] [ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core] [ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib] [ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core] [ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core] [ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib] [ 72.327546] SyS_delete_module+0x155/0x230 [ 72.328472] ? exit_to_usermode_loop+0x70/0xa6 [ 72.329370] do_syscall_64+0x54/0xc0 [ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25 Fix it by checking that user context was allocated before trigger the event. Fixes: 036b10635739 ('IB/uverbs: Enable device removal when there are active user space applications') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04IB/uverbs: Fix device cleanupYishai Hadas1-2/+1
Uverbs device should be cleaned up only when there is no potential usage of. As part of ib_uverbs_remove_one which might be triggered upon reset flow the device reference count is decreased as expected and leave the final cleanup to the FDs that were opened. Current code increases reference count upon opening a new command FD and decreases it upon closing the file. The event FD is opened internally and rely on the command FD by taking on it a reference count. In case that the command FD was closed and just later the event FD we may ensure that the device resources as of srcu are still alive as they are still in use. Fixing the above by moving the reference count decreasing to the place where the command FD is really freed instead of doing that when it was just closed. fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04Merge tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds1-1/+1
Pull char/misc driver updates from Greg KH: "Here is the big set of new char/misc driver drivers and features for 4.12-rc1. There's lots of new drivers added this time around, new firmware drivers from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and a bunch of other driver updates. Nothing major, except if you happen to have the hardware for these drivers, and then you will be happy :) All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) firmware: google memconsole: Fix return value check in platform_memconsole_init() firmware: Google VPD: Fix return value check in vpd_platform_init() goldfish_pipe: fix build warning about using too much stack. goldfish_pipe: An implementation of more parallel pipe fpga fr br: update supported version numbers fpga: region: release FPGA region reference in error path fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe() mei: drop the TODO from samples firmware: Google VPD sysfs driver firmware: Google VPD: import lib_vpd source files misc: lkdtm: Add volatile to intentional NULL pointer reference eeprom: idt_89hpesx: Add OF device ID table misc: ds1682: Add OF device ID table misc: tsl2550: Add OF device ID table w1: Remove unneeded use of assert() and remove w1_log.h w1: Use kernel common min() implementation uio_mf624: Align memory regions to page size and set correct offsets uio_mf624: Refactor memory info initialization uio: Allow handling of non page-aligned memory regions hangcheck-timer: Fix typo in comment ...
2017-04-20IB/core: Rename uverbs event file structureMatan Barak1-66/+66
Previously, ib_uverbs_event_file was suffixed by _file as it contained the actual file information. Since it's now only used as base struct for ib_uverbs_async_event_file and ib_uverbs_completion_event_file, we change its name to ib_uverbs_event_queue. This represents its logical role better. Fixes: 1e7710f3f656 ('IB/core: Change completion channel to use the reworked objects schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-20IB/core: Don't use is_async in event files to infer events sizeMatan Barak1-9/+5
Previously, we inferred the events size in ib_uverbs_event_read by using the is_async flag. Instead of that, we pass the event size directly. Fixes: 1e7710f3f656 ('IB/core: Change completion channel to use the reworked objects schema') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Change completion channel to use the reworked objects schemaMatan Barak1-118/+161
This patch adds the standard fd based type - completion_channel. The completion_channel is now prefixed with ib_uobject, similarly to the rest of the uobjects. This requires a few changes: (1) We define a new completion channel fd based object type. (2) completion_event and async_event are now two different types. This means they use different fops. (3) We release the completion_channel exactly as we release other idr based objects. (4) Since ib_uobjects are already kref-ed, we only add the kref to the async event. A fd object requires filling out several parameters. Its op pointer should point to uverbs_fd_ops and its size should be at least the size if ib_uobject. We use a macro to make the type declaration easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Add support for fd objectsMatan Barak1-1/+3
The completion channel we use in verbs infrastructure is FD based. Previously, we had a separate way to manage this object. Since we strive for a single way to manage any kind of object in this infrastructure, we conceptually treat all objects as subclasses of ib_uobject. This commit adds the necessary mechanism to support FD based objects like their IDR counterparts. FD objects release need to be synchronized with context release. We use the cleanup_mutex on the uverbs_file for that. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Change idr objects to use the new schemaMatan Barak1-135/+7
This changes only the handlers which deals with idr based objects to use the new idr allocation, fetching and destruction schema. This patch consists of the following changes: (1) Allocation, fetching and destruction is done via idr ops. (2) Context initializing and release is done through uverbs_initialize_ucontext and uverbs_cleanup_ucontext. (3) Ditching the live flag. Mostly, this is pretty straight forward. The only place that is a bit trickier is in ib_uverbs_open_qp. Commit [1] added code to check whether the uobject is already live and initialized. This mostly happens because of a race between open_qp and events. We delayed assigning the uobject's pointer in order to eliminate this race without using the live variable. [1] commit a040f95dc819 ("IB/core: Fix XRC race condition in ib_uverbs_open_qp") Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Add idr based standard typesMatan Barak1-3/+5
This patch adds the standard idr based types. These types are used in downstream patches in order to initialize, destroy and lookup IB standard objects which are based on idr objects. An idr object requires filling out several parameters. Its op pointer should point to uverbs_idr_ops and its size should be at least the size of ib_uobject. We add a macro to make the type declaration easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Refactor idr to be per uverbs_fileMatan Barak1-31/+14
The current code creates an idr per type. Since types are currently common for all drivers and known in advance, this was good enough. However, the proposed ioctl based infrastructure allows each driver to declare only some of the common types and declare its own specific types. Thus, we decided to implement idr to be per uverbs_file. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-03-21infiniband: utilize the new cdev_set_parent functionLogan Gunthorpe1-1/+1
This replaces the suspect looking cdev.kobj.parent lines with the equivalent cdev_set_parent function. This is a straightforward change that's largely cosmetic but it does push the kobj.parent ownership into char_dev.c where it belongs. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-27Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-0/+20
Pull cgroup updates from Tejun Heo: "Several noteworthy changes. - Parav's rdma controller is finally merged. It is very straight forward and can limit the abosolute numbers of common rdma constructs used by different cgroups. - kernel/cgroup.c got too chubby and disorganized. Created kernel/cgroup/ subdirectory and moved all cgroup related files under kernel/ there and reorganized the core code. This hurts for backporting patches but was long overdue. - cgroup v2 process listing reimplemented so that it no longer depends on allocating a buffer large enough to cache the entire result to sort and uniq the output. v2 has always mangled the sort order to ensure that users don't depend on the sorted output, so this shouldn't surprise anybody. This makes the pid listing functions use the same iterators that are used internally, which have to have the same iterating capabilities anyway. - perf cgroup filtering now works automatically on cgroup v2. This patch was posted a long time ago but somehow fell through the cracks. - misc fixes asnd documentation updates" * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits) kernfs: fix locking around kernfs_ops->release() callback cgroup: drop the matching uid requirement on migration for cgroup v2 cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy cgroup: misc cleanups cgroup: call subsys->*attach() only for subsystems which are actually affected by migration cgroup: track migration context in cgroup_mgctx cgroup: cosmetic update to cgroup_taskset_add() rdmacg: Fixed uninitialized current resource usage cgroup: Add missing cgroup-v2 PID controller documentation. rdmacg: Added documentation for rdmacg IB/core: added support to use rdma cgroup controller rdmacg: Added rdma cgroup controller cgroup: fix a comment typo cgroup: fix RCU related sparse warnings cgroup: move namespace code to kernel/cgroup/namespace.c cgroup: rename functions for consistency cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c cgroup: separate out cgroup1_kf_syscall_ops cgroup: refactor mount path and clearly distinguish v1 and v2 paths cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c ...
2017-01-24IB/core: Use dev.parent instead of dma_deviceBart Van Assche1-1/+1
Prepare for removal of ib_device.dma_device. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/core: added support to use rdma cgroup controllerParav Pandit1-0/+20
Added support APIs for IB core to register/unregister every IB/RDMA device with rdma cgroup for tracking rdma resources. IB core registers with rdma cgroup controller. Added support APIs for uverbs layer to make use of rdma controller. Added uverbs layer to perform resource charge/uncharge functionality. Added support during query_device uverb operation to ensure it returns resource limits by honoring rdma cgroup configured limits. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-1/+5
Pull rdma updates from Doug Ledford: "This is the complete update for the rdma stack for this release cycle. Most of it is typical driver and core updates, but there is the entirely new VMWare pvrdma driver. You may have noticed that there were changes in DaveM's pull request to the bnxt Ethernet driver to support a RoCE RDMA driver. The bnxt_re driver was tentatively set to be pulled in this release cycle, but it simply wasn't ready in time and was dropped (a few review comments still to address, and some multi-arch build issues like prefetch() not working across all arches). Summary: - shared mlx5 updates with net stack (will drop out on merge if Dave's tree has already been merged) - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe - debug cleanups - new connection rejection helpers - SRP updates - various misc fixes - new paravirt driver from vmware" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits) IB: Add vmw_pvrdma driver IB/mlx4: fix improper return value IB/ocrdma: fix bad initialization infiniband: nes: return value of skb_linearize should be handled MAINTAINERS: Update Intel RDMA RNIC driver maintainers MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers IB/core: fix unmap_sg argument qede: fix general protection fault may occur on probe IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc mlx5, calc_sq_size(): Make a debug message more informative mlx5: Remove a set-but-not-used variable mlx5: Use { } instead of { 0 } to init struct IB/srp: Make writing the add_target sysfs attr interruptible IB/srp: Make mapping failures easier to debug IB/srp: Make login failures easier to debug IB/srp: Introduce a local variable in srp_add_one() IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build IB/multicast: Check ib_find_pkey() return value IPoIB: Avoid reading an uninitialized member variable IB/mad: Fix an array index check ...
2016-12-14Merge branch 'mlx' into merge-testDoug Ledford1-0/+1
2016-12-13IB/uverbs: Extend modify_qp and support packet pacingBodong Wang1-0/+1
An new uverbs command ib_uverbs_ex_modify_qp is added to support more QP attributes. User driver should choose to call the legacy/extended API based on input mask. IB_USER_LAST_QP_ATTR_MASK is added to indicated the maximum bit position which supports legacy ib_uverbs_modify_qp. IB_USER_LEGACY_LAST_QP_ATTR_MASK indicates the maximum bit position which supports ib_uverbs_ex_modify_qp, the value of this mask should be updated if new mask is added later. Along with this change, rate_limit is supported by the extended command, user driver could use it to control packet packing. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03infiniband: remove WARN that is not kernel bugLeon Romanovsky1-1/+4
On Mon, Nov 21, 2016 at 09:52:53AM -0700, Jason Gunthorpe wrote: > On Mon, Nov 21, 2016 at 02:14:08PM +0200, Leon Romanovsky wrote: > > > > > > In ib_ucm_write function there is a wrong prefix: > > > > > > + pr_err_once("ucm_write: process %d (%s) tried to do something hinky\n", > > > > I did it intentionally to have the same errors for all flows. > > Lets actually use a good message too please? > > pr_err_once("ucm_write: process %d (%s) changed security contexts after opening FD, this is not allowed.\n", > > Jason >From 70f95b2d35aea42e5b97e7d27ab2f4e8effcbe67 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky <leonro@mellanox.com> Date: Mon, 21 Nov 2016 13:30:59 +0200 Subject: [PATCH rdma-next V2] IB/{core, qib}: Remove WARN that is not kernel bug WARNINGs mean kernel bugs, in this case, they are placed to mark programming errors and/or malicious attempts. BUG/WARNs that are not kernel bugs hinder automated testing efforts. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/uverbs: Fix leak of XRC target QPsTariq Toukan1-5/+2
The real QP is destroyed in case of the ref count reaches zero, but for XRC target QPs this call was missed and caused to QP leaks. Let's call to destroy for all flows. Fixes: 0e0ec7e0638e ('RDMA/core: Export ib_open_qp() to share XRC...') Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge branches 'misc' and 'rxe' into k.o/for-4.8-1Doug Ledford1-13/+24
2016-08-03IB/uverbs: Fix race between uverbs_close and remove_oneJason Gunthorpe1-13/+24
Fixes an oops that might happen if uverbs_close races with remove_one. Both contexts may run ib_uverbs_cleanup_ucontext, it depends on the flow. Currently, there is no protection for a case that remove_one didn't make the cleanup it runs to its end, the underlying ib_device was freed then uverbs_close will call ib_uverbs_cleanup_ucontext and OOPs. Above might happen if uverbs_close deleted the file from the list then remove_one didn't find it and runs to its end. Fixes to protect against that case by a new cleanup lock so that ib_uverbs_cleanup_ucontext will be called always before that remove_one is ended. Fixes: 35d4a0b63dc0 ("IB/uverbs: Fix race between ib_uverbs_open and remove_one") Reported-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Introduce RWQ Indirection tableYishai Hadas1-0/+13
User applications that want to spread traffic on several WQs, need to create an indirection table, by using already created WQs. Adding uverbs API in order to create and destroy this table. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Add WQ supportYishai Hadas1-0/+25
User space applications which use RSS functionality need to create a work queue object (WQ). The lifetime of such an object is: * Create a WQ * Modify the WQ from reset to init state. * Use the WQ (by downstream patches). * Destroy the WQ. These commands are added to the uverbs API. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28IB/security: Restrict use of the write() interfaceJason Gunthorpe1-0/+5
The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>