aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-10-10IB/core: Simplify sa_path_set_[sd]lid() callsBart Van Assche2-19/+17
Instead of making every caller convert the second argument of sa_path_set_slid() and sa_path_set_dlid() to big endian format, make these two functions accept LIDs in CPU endian format. This patch does not change any functionality. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Cc: Don Hiatt <don.hiatt@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04IB/core: Do not warn on lid conversions for OPADon Hiatt1-1/+10
On OPA devices the user_mad recv_handler can receive 32Bit LIDs (e.g. OPA_PERMISSIVE_LID) and it is okay to lose the upper 16 bits of the LID as this information is obtained elsewhere. Do not issue a warning when calling ib_lid_be16() in this case by masking out the upper 16Bits. [75667.310846] ------------[ cut here ]------------ [75667.316447] WARNING: CPU: 0 PID: 1718 at ./include/rdma/ib_verbs.h:3799 recv_handler+0x15a/0x170 [ib_umad] [75667.327640] Modules linked in: ib_ipoib hfi1(E) rdmavt(E) rdma_ucm(E) ib_ucm(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_umad(E) ib_uverbs(E) ib_core(E) libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod dax x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mei_me ipmi_si iTCO_wdt iTCO_vendor_support crypto_simd ipmi_devintf pcspkr mei sg i2c_i801 glue_helper lpc_ich shpchp ioatdma mfd_core wmi ipmi_msghandler cryptd acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ptp ahci libahci pps_core crc32c_intel libata dca i2c_algo_bit i2c_core [last unloaded: ib_core] [75667.407704] CPU: 0 PID: 1718 Comm: kworker/0:1H Tainted: G W I E 4.13.0-rc7+ #1 [75667.417310] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [75667.429555] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [75667.436360] task: ffff88084a718000 task.stack: ffffc9000a424000 [75667.443549] RIP: 0010:recv_handler+0x15a/0x170 [ib_umad] [75667.450090] RSP: 0018:ffffc9000a427ce8 EFLAGS: 00010286 [75667.456508] RAX: 00000000ffffffff RBX: ffff88085159ce80 RCX: 0000000000000000 [75667.465094] RDX: ffff88085a47b068 RSI: 0000000000000000 RDI: ffff88085159cf00 [75667.473668] RBP: ffffc9000a427d38 R08: 000000000001efc0 R09: ffff88085159ce80 [75667.482228] R10: ffff88085f007480 R11: ffff88084acf20e8 R12: ffff88085a47b020 [75667.490824] R13: ffff881056842e10 R14: ffff881056840200 R15: ffff88104c8d0800 [75667.499390] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [75667.509028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [75667.516080] CR2: 00007f9e4b3d9000 CR3: 0000000001c09000 CR4: 00000000001406f0 [75667.524664] Call Trace: [75667.528044] ? find_mad_agent+0x7c/0x1b0 [ib_core] [75667.534031] ? ib_mark_mad_done+0x73/0xa0 [ib_core] [75667.540142] ib_mad_recv_done+0x423/0x9b0 [ib_core] [75667.546215] __ib_process_cq+0x5d/0xb0 [ib_core] [75667.552007] ib_cq_poll_work+0x20/0x60 [ib_core] [75667.557766] process_one_work+0x149/0x360 [75667.562844] worker_thread+0x4d/0x3c0 [75667.567529] kthread+0x109/0x140 [75667.571713] ? rescuer_thread+0x380/0x380 [75667.576775] ? kthread_park+0x60/0x60 [75667.581447] ret_from_fork+0x25/0x30 [75667.586014] Code: 43 4a 0f b6 45 c6 88 43 4b 48 8b 45 b0 48 89 43 4c 48 8b 45 b8 48 89 43 54 8b 45 c0 0f c8 89 43 5c e9 79 ff ff ff e8 16 4e fa e0 <0f> ff e9 42 ff ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 [75667.608323] ---[ end trace cf26df27c9597264 ]--- Fixes: 62ede7779904 ("Add OPA extended LID support") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29RDMA/iwpm: Properly mark end of NL messagesShiraz Saleem2-0/+13
Commit 1a1c116f3dcf removes nlmsg_len calculation in ibnl_put_attr causing netlink messages to be rejected due to incorrect length. Add nlmsg_end after all attributes are appended to calculate the nlmsg_len. Fixes: 1a1c116f3dcf ("RDMA/netlink: Simplify the put_msg and put_attr") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/uverbs: clean up INIT_UDATA() macro usageArnd Bergmann2-80/+67
After changing INIT_UDATA_BUF_OR_NULL() to an inline function, this does the same change to INIT_UDATA for consistency. I'm keeping it separate as this part is much larger and we wouldn't want to backport this to stable kernels if we ever want to address the gcc warnings by backporting the first patch. Again, using an inline function gives us better type safety here among other issues with macros. I'm using u64_to_user_ptr() to convert the user pointer to simplify the logic rather than adding lots of new type casts. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/uverbs: clean up INIT_UDATA_BUF_OR_NULL usageArnd Bergmann3-20/+20
We get a harmless warning about the fact that we use the result of a multiplication as a condition: drivers/infiniband/core/uverbs_main.c: In function 'ib_uverbs_write': drivers/infiniband/core/uverbs_main.c:787:40: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:787:117: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:790:50: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] drivers/infiniband/core/uverbs_main.c:790:151: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context] This avoids the problem by using an inline function in place of the macro. Fixes: a96e4e2ffe43 ("IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()") Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://patchwork.kernel.org/patch/9940777/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/core: fix spelling mistake: "aceess" -> "access"Colin Ian King1-1/+1
Trivial fix to spelling mistake in WARN message Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/core: Fix qp_sec use after free accessParav Pandit1-1/+3
When security_ib_alloc_security fails, qp->qp_sec memory is freed. However ib_destroy_qp still tries to access this memory which result in kernel crash. So its initialized to NULL to avoid such access. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/core: Fix typo in the name of the tag-matching cap structLeon Romanovsky1-7/+7
The tag matching functionality is implemented by mlx5 driver by extending XRQ, however this internal kernel information was exposed to user space applications with *xrq* name instead of *tm*. This patch renames *xrq* to *tm* to handle that. Fixes: 8d50505ada72 ("IB/uverbs: Expose XRQ capabilities") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-2/+2
Pull rdma fixes from Doug Ledford: - Smattering of miscellanous fixes - A five patch series for i40iw that had a patch (5/5) that was larger than I would like, but I took it because it's needed for large scale users - An 8 patch series for bnxt_re that landed right as I was leaving on PTO and so had to wait until now...they are all appropriate fixes for -rc IMO * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (22 commits) bnxt_re: Don't issue cmd to delete GID for QP1 GID entry before the QP is destroyed bnxt_re: Fix memory leak in FRMR path bnxt_re: Remove RTNL lock dependency in bnxt_re_query_port bnxt_re: Fix race between the netdev register and unregister events bnxt_re: Free up devices in module_exit path bnxt_re: Fix compare and swap atomic operands bnxt_re: Stop issuing further cmds to FW once a cmd times out bnxt_re: Fix update of qplib_qp.mtu when modified i40iw: Add support for port reuse on active side connections i40iw: Add missing VLAN priority i40iw: Call i40iw_cm_disconn on modify QP to disconnect i40iw: Prevent multiple netdev event notifier registrations i40iw: Fail open if there are no available MSI-X vectors RDMA/vmw_pvrdma: Fix reporting correct opcodes for completion IB/bnxt_re: Fix frame stack compilation warning IB/mlx5: fix debugfs cleanup IB/ocrdma: fix incorrect fall-through on switch statement IB/ipoib: Suppress the retry related completion errors iw_cxgb4: remove the stid on listen create failure iw_cxgb4: drop listen destroy replies if no ep found ...
2017-09-22IB/core: Fix for core panicAlex Estrin1-2/+2
Build with the latest patches resulted in panic: 11384.486289] BUG: unable to handle kernel NULL pointer dereference at (null) [11384.486293] IP: (null) [11384.486295] PGD 0 [11384.486295] P4D 0 [11384.486296] [11384.486299] Oops: 0010 [#1] SMP ......... snip ...... [11384.486401] CPU: 0 PID: 968 Comm: kworker/0:1H Tainted: G W O 4.13.0-a-stream-20170825 #1 [11384.486402] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015 [11384.486418] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [11384.486419] task: ffff880850579680 task.stack: ffffc90007fec000 [11384.486420] RIP: 0010: (null) [11384.486420] RSP: 0018:ffffc90007fef970 EFLAGS: 00010206 [11384.486421] RAX: ffff88084cfe8000 RBX: ffff88084dce4000 RCX: ffffc90007fef978 [11384.486422] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88084cfe8000 [11384.486422] RBP: ffffc90007fefab0 R08: 0000000000000000 R09: ffff88084dce4080 [11384.486423] R10: ffffffffa02d7f60 R11: 0000000000000000 R12: ffff88105af65a00 [11384.486423] R13: ffff88084dce4000 R14: 000000000000c000 R15: 000000000000c000 [11384.486424] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [11384.486425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11384.486425] CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000001406f0 [11384.486426] Call Trace: [11384.486431] ? is_valid_mcast_lid.isra.21+0xfb/0x110 [ib_core] [11384.486436] ib_attach_mcast+0x6f/0xa0 [ib_core] [11384.486441] ipoib_mcast_attach+0x81/0x190 [ib_ipoib] [11384.486443] ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib] [11384.486448] mcast_work_handler+0x330/0x6c0 [ib_core] [11384.486452] join_handler+0x101/0x220 [ib_core] [11384.486455] ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core] [11384.486459] recv_handler+0x3a/0x60 [ib_core] [11384.486462] ib_mad_recv_done+0x423/0x9b0 [ib_core] [11384.486466] __ib_process_cq+0x5d/0xb0 [ib_core] [11384.486469] ib_cq_poll_work+0x20/0x60 [ib_core] [11384.486472] process_one_work+0x149/0x360 [11384.486474] worker_thread+0x4d/0x3c0 [11384.486487] kthread+0x109/0x140 [11384.486488] ? rescuer_thread+0x380/0x380 [11384.486489] ? kthread_park+0x60/0x60 [11384.486490] ? kthread_park+0x60/0x60 [11384.486493] ret_from_fork+0x25/0x30 [11384.486493] Code: Bad RIP value. [11384.486493] Code: Bad RIP value. [11384.486496] RIP: (null) RSP: ffffc90007fef970 [11384.486497] CR2: 0000000000000000 [11384.486531] ---[ end trace b1acec6fb4ff6e75 ]--- [11384.532133] Kernel panic - not syncing: Fatal exception [11384.536541] Kernel Offset: disabled [11384.969491] ---[ end Kernel panic - not syncing: Fatal exception [11384.976875] sched: Unexpected reschedule of offline CPU#1! [11384.983646] ------------[ cut here ]------------ Rdma device driver may not have implemented (*get_link_layer)() so it can not be called directly. Should use appropriate helper function. Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Fixes: 523633359224 ("IB/core: Fix the validations of a multicast LID in attach or detach operations") Cc: stable@kernel.org # 4.13 Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-09Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-0/+24
Pull nfsd updates from Bruce Fields: "More RDMA work and some op-structure constification from Chuck Lever, and a small cleanup to our xdr encoding" * tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux: svcrdma: Estimate Send Queue depth properly rdma core: Add rdma_rw_mr_payload() svcrdma: Limit RQ depth svcrdma: Populate tail iovec when receiving nfsd: Incoming xdr_bufs may have content in tail buffer svcrdma: Clean up svc_rdma_build_read_chunk() sunrpc: Const-ify struct sv_serv_ops nfsd: Const-ify NFSv4 encoding and decoding ops arrays sunrpc: Const-ify instances of struct svc_xprt_ops nfsd4: individual encoders no longer see error cases nfsd4: skip encoder in trivial error cases nfsd4: define ->op_release for compound ops nfsd4: opdesc will be useful outside nfs4proc.c nfsd4: move some nfsd4 op definitions to xdr4.h
2017-09-08lib/interval_tree: fast overlap detectionDavidlohr Bueso2-3/+3
Allow interval trees to quickly check for overlaps to avoid unnecesary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using a 'rb_root_cached' such that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option as they can do funky things with insertions -- for example, vma_interval_tree_insert_after(). [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()] Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Doug Ledford <dledford@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jason Wang <jasowang@redhat.com> Cc: Christian Benvenuti <benve@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08RDMA/netlink: clean up message validity array initializerLinus Torvalds1-6/+5
The fix in the parent made me look at that function, and react to how illogical and illegible the array initializer was. Use named array indexes to make it clearer what is going on, and make the initializer not depend silently on the exact index numbers. [ The initializer now also shows an odd inconsistency in the naming: note the IWCM vs IWPM.. - Linus ] Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Doug Ledford <dledford@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08RDAM/netlink: Fix out-of-bound access while checking message validityLeon Romanovsky1-3/+4
The netlink message sent with type == 0, which doesn't have any client behind it, caused to the overflow in max_num_ops array. Fix it by declaring zero number of ops for the first client. Fixes: c9901724a2f1 ("RDMA/netlink: Remove netlink clients infrastructure") Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-05rdma core: Add rdma_rw_mr_payload()Chuck Lever1-0/+24
The amount of payload per MR depends on device capabilities and the memory registration mode in use. The new rdma_rw API hides both, making it difficult for ULPs to determine how large their transport send queues need to be. Expose the MR payload information via a new API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-03Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds29-578/+2718
Pull rdma updates from Doug Ledford: "This is a big pull request. Of note is that I'm sending you the new ioctl API for the rdma subsystem. We put it up on linux-api@, but didn't get much response. The API is complex, but it solves two different problems in one go: 1) The bi-directional nature of the RDMA file write calls, which created the security hole we had to handle (and for which the fix is now causing problems for systems in production, we were a bit over zealous in the fix and the ability to open a device, then fork, then create new queue pairs on the device and use them is broken). 2) The bloat caused by different vendors implementing extensions to the base verbs API. Each vendor's hardware is slightly different, and the hardware might be suitable for one extension but not another. By the time we add generic extensions for all the different ways that the different hardware can offload things, the API becomes bloated. Things like our completion structs have started to exceed a cache line in size because of all the elements needed to support this. That in turn shows up heavily in the performance graphs with a noticable drop in performance on 100Gigabit links as our completion structs go from occupying one cache line to 1+. This API makes things like the completion structs modular in a very similar way to netlink so that your structs can only include the items needed for the offloads/features you are actually using on a given queue pair. In that way we support everything, but only use what we need, and our structs stay smaller. The ioctl API is better explained by the posting on linux-api@ than I can explain it here, so I'll just leave it at that. The rest of the pull request is typical stuff. Updates for 4.14 kernel merge window - Lots of hfi1 driver updates (mixed with a few qib and core updates as well) - rxe updates - various mlx updates - Set default roce type to RoCEv2 - Several larger fixes for bnxt_re that were too big for -rc - Several larger fixes for qedr that, likewise, were too big for -rc - Misc core changes - Make the hns_roce driver compilable on arches other than aarch64 so we can more easily debug build issues related to it - Add rdma-netlink infrastructure updates - Add automatic IRQ affinity infrastructure - Add 32bit lid support - Lots of misc fixes across the subsystem from random people - Autoloading of RDMA netlink modules - PCI pool cleanups from Romain Perier - mlx5 driver feature additions and fixes - Hardware tag matchine feature - Fix sleeping in atomic when resolving roce ah - Add experimental ioctl interface as posted to linux-api@" * tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits) IB/core: Expose ioctl interface through experimental Kconfig IB/core: Assign root to all drivers IB/core: Add completion queue (cq) object actions IB/core: Add legacy driver's user-data IB/core: Export ioctl enum types to user-space IB/core: Explicitly destroy an object while keeping uobject IB/core: Add macros for declaring methods and attributes IB/core: Add uverbs merge trees functionality IB/core: Add DEVICE object and root tree structure IB/core: Declare an object instead of declaring only type attributes IB/core: Add new ioctl interface RDMA/vmw_pvrdma: Fix a signedness RDMA/vmw_pvrdma: Report network header type in WC IB/core: Add might_sleep() annotation to ib_init_ah_from_wc() IB/cm: Fix sleeping in atomic when RoCE is used IB/core: Add support to finalize objects in one transaction IB/core: Add a generic way to execute an operation on a uobject Documentation: Hardware tag matching IB/mlx5: Support IB_SRQT_TM net/mlx5: Add XRQ support ...
2017-08-31IB/umem: update to new mmu_notifier semanticJérôme Glisse1-19/+0
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Leon Romanovsky <leonro@mellanox.com> Cc: linux-rdma@vger.kernel.org Cc: Artemy Kovalyov <artemyko@mellanox.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31IB/core: Expose ioctl interface through experimental KconfigMatan Barak2-0/+8
Add CONFIG_INFINIBAND_EXP_USER_ACCESS that enables the ioctl interface. This interface is experimental and is subject to change. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Assign root to all driversMatan Barak2-0/+19
In order to use the parsing tree, we need to assign the root to all drivers. Currently, we just assign the default parsing tree via ib_uverbs_add_one. The driver could override this by assigning a parsing tree prior to registering the device. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add completion queue (cq) object actionsMatan Barak1-1/+137
Adding CQ ioctl actions: 1. create_cq 2. destroy_cq This requires adding the following: 1. A specification describing the method a. Handler b. Attributes specification Each attribute is one of the following: a. PTR_IN - input data Note: This could be encoded inlined for data < 64bit b. PTR_OUT - response data c. IDR - idr based object d. FD - fd based object Blobs attributes (clauses a and b) contain their type, while objects specifications (clauses c and d) contains the expected object type (for example, the given id should be UVERBS_TYPE_PD) and the required access (READ, WRITE, NEW or DESTROY). If a NEW is required, the new object's id will be assigned to this attribute. All attributes could get UA_FLAGS attribute. Currently we support stating that an attribute is mandatory or that the specification size corresponds to a lower bound (and that this attribute could be extended). We currently add both default attributes and the two generic UHW_IN and UHW_OUT driver specific attributes. 2. Handler A handler gets a uverbs_attr_bundle. The handler developer uses uverbs_attr_get to fetch an attribute of a given id. Each of these attribute groups correspond to the specification group defined in the action (clauses 1.b and 1.c respectively). The indices of these arrays corresponds to the attribute ids declared in the specifications (clause 2). The handler is quite simple. It assumes the infrastructure fetched all objects and locked, created or destroyed them as required by the specification. Pointer (or blob) attributes were validated to match their required sizes. After the handler finished, the infrastructure commits or rollbacks the objects. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add legacy driver's user-dataMatan Barak1-0/+40
In this phase, we don't want to change all the drivers to use flexible driver's specific attributes. Therefore, we add two default attributes: UHW_IN and UHW_OUT. These attributes are optional in some methods and they encode the driver specific command data. We add a function that extract this data and creates the legacy udata over it. Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This turns on the first bit of the namespace, indicating this attribute belongs to the driver's namespace. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Explicitly destroy an object while keeping uobjectMatan Barak1-0/+35
When some objects are destroyed, we need to extract their status at destruction. After object's destruction, this status (e.g. events_reported) relies in the uobject. In order to have the latest and correct status, the underlying object should be destroyed, but we should keep the uobject alive and read this information off the uobject. We introduce a rdma_explicit_destroy function. This function destroys the class type object (for example, the IDR class type which destroys the underlying object as well) and then convert the uobject to be of a null class type. This uobject will then be destroyed as any other uobject once uverbs_finalize_object[s] is called. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add uverbs merge trees functionalityMatan Barak2-1/+667
Different drivers support different features and even subset of the common uverbs implementation. Currently, this is handled as bitmask in every driver that represents which kind of methods it supports, but doesn't go down to attributes granularity. Moreover, drivers might want to add their specific types, methods and attributes to let their user-space counter-parts be exposed to some more efficient abstractions. It means that existence of different features is validated syntactically via the parsing infrastructure rather than using a complex in-handler logic. In order to do that, we allow defining features and abstractions as parsing trees. These per-feature parsing tree could be merged to an efficient (perfect-hash based) parsing tree, which is later used by the parsing infrastructure. To sum it up, this makes a parse tree unique for a device and represents only the features this particular device supports. This is done by having a root specification tree per feature. Before a device registers itself as an IB device, it merges all these trees into one parsing tree. This parsing tree is used to parse all user-space commands. A future user-space application could read this parse tree. This tree represents which objects, methods and attributes are supported by this device. This is based on the idea of Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add DEVICE object and root tree structureMatan Barak1-0/+17
This adds the DEVICE object. This object supports creating the context that all objects are created from. Moreover, it supports executing methods which are related to the device itself, such as QUERY_DEVICE. This is a singleton object (per file instance). All standard objects are put in the root structure. This root will later on be used in drivers as the source for their whole parsing tree. Later on, when new features are added, these drivers could mix this root with other customized objects. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Declare an object instead of declaring only type attributesMatan Barak1-64/+48
Switch all uverbs_type_attrs_xxxx with DECLARE_UVERBS_OBJECT macros. This will be later used in order to embed the object specific methods in the objects as well. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add new ioctl interfaceMatan Barak4-1/+416
In this ioctl interface, processing the command starts from properties of the command and fetching the appropriate user objects before calling the handler. Parsing and validation is done according to a specifier declared by the driver's code. In the driver, all supported objects are declared. These objects are separated to different object namepsaces. Dividing objects to namespaces is done at initialization by using the higher bits of the object ids. This initialization can mix objects declared in different places to one parsing tree using in this ioctl interface. For each object we list all supported methods. Similarly to objects, methods are separated to method namespaces too. Namespacing is done similarly to the objects case. This could be used in order to add methods to an existing object. Each method has a specific handler, which could be either a default handler or a driver specific handler. Along with the handler, a bunch of attributes are specified as well. Similarly to objects and method, attributes are namespaced and hashed by their ids at initialization too. All supported attributes are subject to automatic fetching and validation. These attributes include the command, response and the method's related objects' ids. When these entities (objects, methods and attributes) are used, the high bits of the entities ids are used in order to calculate the hash bucket index. Then, these high bits are masked out in order to have a zero based index. Since we use these high bits for both bucketing and namespacing, we get a compact representation and O(1) array access. This is mandatory for efficient dispatching. Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length. Attributes could be validated through some attributes, like: (*) Minimum size / Exact size (*) Fops for FD (*) Object type for IDR If an IDR/fd attribute is specified, the kernel also states the object type and the required access (NEW, WRITE, READ or DESTROY). All uobject/fd management is done automatically by the infrastructure, meaning - the infrastructure will fail concurrent commands that at least one of them requires concurrent access (WRITE/DESTROY), synchronize actions with device removals (dissociate context events) and take care of reference counting (increase/decrease) for concurrent actions invocation. The reference counts on the actual kernel objects shall be handled by the handlers. objects +--------+ | | | | methods +--------+ | | ns method method_spec +-----+ |len | +--------+ +------+[d]+-------+ +----------------+[d]+------------+ |attr1+-> |type | | object +> |method+-> | spec +-> + attr_buckets +-> |default_chain+--> +-----+ |idr_type| +--------+ +------+ |handler| | | +------------+ |attr2| |access | | | | | +-------+ +----------------+ |driver chain| +-----+ +--------+ | | | | +------------+ | | +------+ | | | | | | | | | | | | | | | | | | | | +--------+ [d] = Hash ids to groups using the high order bits The right types table is also chosen by using the high bits from the ids. Currently we have either default or driver specific groups. Once validation and object fetching (or creation) completed, we call the handler: int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile, struct uverbs_attr_bundle *ctx); ctx bundles attributes of different namespaces. Each element there is an array of attributes which corresponds to one namespaces of attributes. For example, in the usually used case: ctx core +----------------------------+ +------------+ | core: +---> | valid | +----------------------------+ | cmd_attr | | driver: | +------------+ |----------------------------+--+ | valid | | | cmd_attr | | +------------+ | | valid | | | obj_attr | | +------------+ | | drivers | +------------+ +> | valid | | cmd_attr | +------------+ | valid | | cmd_attr | +------------+ | valid | | obj_attr | +------------+ Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()Roland Dreier1-0/+2
For RoCE, ib_init_ah_from_wc() can follow the path ib_init_ah_from_wc() -> rdma_addr_find_l2_eth_by_grh() -> rdma_resolve_ip() and rdma_resolve_ip() will sleep in kzalloc() and wait_for_completion(). However, developers will not see any warnings if they use ib_init_ah_from_wc() in an atomic context and test only on IB, because the function doesn't sleep in that case. Add a might_sleep() so that lockdep will catch bugs no matter what hardware is used to test. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31IB/cm: Fix sleeping in atomic when RoCE is usedRoland Dreier1-19/+44
A couple of places in the CM do spin_lock_irq(&cm_id_priv->lock); ... if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg)) However when the underlying transport is RoCE, this leads to a sleeping function being called with the lock held - the callchain is cm_alloc_response_msg() -> ib_create_ah_from_wc() -> ib_init_ah_from_wc() -> rdma_addr_find_l2_eth_by_grh() -> rdma_resolve_ip() and rdma_resolve_ip() starts out by doing req = kzalloc(sizeof *req, GFP_KERNEL); not to mention rdma_addr_find_l2_eth_by_grh() doing wait_for_completion(&ctx.comp); to wait for the task that rdma_resolve_ip() queues up. Fix this by moving the AH creation out of the lock. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-30IB/core: Add support to finalize objects in one transactionMatan Barak2-1/+61
The new ioctl based infrastructure either commits or rollbacks all objects of the method as one transaction. In order to do that, we introduce a notion of dealing with a collection of objects that are related to a specific method. This also requires adding a notion of a method and attribute. A method contains a hash of attributes, where each bucket contains several attributes. The attributes are hashed according to their namespace which resides in the four upper bits of the id. For example, an object could be a CQ, which has an action of CREATE_CQ. This action has multiple attributes. For example, the CQ's new handle and the comp_channel. Each layer in this hierarchy - objects, methods and attributes is split into namespaces. The basic example for that is one namespace representing the default entities and another one representing the driver specific entities. When declaring these methods and attributes, we actually declare their specifications. When a method is executed, we actually allocates some space to hold auxiliary information. This auxiliary information contains meta-data about the required objects, such as pointers to their type information, pointers to the uobjects themselves (if exist), etc. The specification, along with the auxiliary information we allocated and filled is given to the finalize_objects function. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-30IB/core: Add a generic way to execute an operation on a uobjectMatan Barak2-0/+75
The ioctl infrastructure treats all user-objects in the same manner. It gets objects ids from the user-space and by using the object type and type attributes mentioned in the object specification, it executes this required method. Passing an object id from the user-space as an attribute is carried out in three stages. The first is carried out before the actual handler and the last is carried out afterwards. The different supported operations are read, write, destroy and create. In the first stage, the former three actions just fetches the object from the repository (by using its id) and locks it. The last action allocates a new uobject. Afterwards, the second stage is carried out when the handler itself carries out the required modification of the object. The last stage is carried out after the handler finishes and commits the result. The former two operations just unlock the object. Destroy calls the "free object" operation, taking into account the object's type and releases the uobject as well. Creation just adds the new uobject to the repository, making the object visible to the application. In order to abstract these details from the ioctl infrastructure layer, we add uverbs_get_uobject_from_context and uverbs_finalize_object functions which corresponds to the first and last stages respectively. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Expose XRQ capabilitiesArtemy Kovalyov1-0/+10
Make XRQ capabilities available via ibv_query_device() verb. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/uverbs: Add new SRQ type IB_SRQT_TMArtemy Kovalyov1-1/+5
Add new SRQ type capable of new tag matching feature. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29IB/core: Separate CQ handle in SRQ contextArtemy Kovalyov2-17/+26
Before this change CQ attached to SRQ was part of XRC specific extension. Moving CQ handle out makes it available to other types extending SRQ functionality. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24Merge branch 'mellanox' into k.o/for-nextDoug Ledford7-135/+77
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB: Avoid ib_modify_port() failure for RoCE devicesSelvin Xavier1-4/+7
IB CM calls ib_modify_port() irrespective of link layer. If the failure is returned, the mad agent gets unregistered for those devices. Recently, modify_port() hook was removed from some of the low level drivers as it was always returning success. This breaks rdma connection establishment over those devices. For ethernet devices, Qkey violation and port capabilities are not applicable. So returning success for RoCE when modify_port hook is is not implemented. Cc: Leon Romanovsky <leon@kernel.org> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/core: Refactor get link layer wrapperLeon Romanovsky1-9/+5
The return values from rdma_node_get_transport() are strict and IB_LINK_LAYER_UNSPECIFIED is unreachable in this flow. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/core: Delete BUG() from unreachable flowLeon Romanovsky1-13/+7
Remove call to BUG() in case wrong node_type was provided. This flow is unreachable, because node_types are supplied from specific enum. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/(core, ulp): Convert register/unregister event handler to be voidLeon Romanovsky4-35/+12
The functions ib_register_event_handler() and ib_unregister_event_handler() always returned success and they can't fail. Let's convert those functions to be void, remove redundant checks and cleanup tons of goto statements. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/uverbs: Introduce and use helper functions to copy ah attributesParav Pandit1-76/+51
This patch introduces two helper functions to copy ah attributes from uverbs to internal ib_ah_attr structure and the other way during modify qp and query qp respectively. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/cma: Fix erroneous validation of supported default GID typeLeon Romanovsky1-2/+2
When rdma_cm is initializing a cma_device it checks if this device supports the preferred default GID type. This check was done in a wrong way and therefore sometimes rdma_cm is coming up with default GID type that is not supported by the device. Fix that by checking for supported GID type properly. Fixes: 3c7f67d1880d ("IB/cma: Fix default RoCE type setting") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24Merge branch 'k.o/for-4.13-rc' into k.o/for-nextDoug Ledford2-5/+13
Pick up -rc fixes. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/core: Avoid accessing non-allocated memory when inferring port typeNoa Osherovich2-5/+13
Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") introduced the concept of type in ah_attr: * During ib_register_device, each port is checked for its type which is stored in ib_device's port_immutable array. * During uverbs' modify_qp, the type is inferred using the port number in ib_uverbs_qp_dest struct (address vector) by accessing the relevant port_immutable array and the type is passed on to providers. IB spec (version 1.3) enforces a valid port value only in Reset to Init. During Init to RTR, the address vector must be valid but port number is not mentioned as a field in the address vector, so its value is not validated, which leads to accesses to a non-allocated memory when inferring the port type. Save the real port number in ib_qp during modify to Init (when the comp_mask indicates that the port number is valid) and use this value to infer the port type. Avoid copying the address vector fields if the matching bit is not set in the attr_mask. Address vector can't be modified before the port, so no valid flow is affected. Fixes: 44c58487d51a ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types') Signed-off-by: Noa Osherovich <noaos@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22rdma: Autoload netlink client modulesJason Gunthorpe5-0/+18
If a message comes in and we do not have the client in the table, then try to load the module supplying that client using MODULE_ALIAS to find it. This duplicates the scheme seen in other netlink muxes (eg nfnetlink). Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22rdma: Allow demand loading of NETLINK_RDMAJason Gunthorpe1-0/+3
Provide a module alias so that if userspace opens a netlink socket for RDMA the kernel support is loaded automatically. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Determine 9B/16B L2 header type based on Address handleDon Hiatt2-6/+18
When address handle attributes are initialized, the LIDs are transformed to be in the 32 bit LID space. When constructing the header, hfi1 driver will look at the LID to determine the packet header to be created. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/core: Fix input len in multiple user verbsAmrani, Ram1-30/+40
Most user verbs pass user data to the kernel with the inclusion of the ib_uverbs_cmd_hdr structure. This is problematic because the vendor has no ideas if the verb was called by a legacy verb or an extended verb. Also, the incosistency between the verbs is confusing. Fixes: 565197dd8fb1 ("IB/core: Extend ib_uverbs_create_cq") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22RDMA/uverbs: Initialize cq_context appropriatelyBharat Potnuri1-1/+1
Initializing cq_context with ev_queue in create_cq(), leads to NULL pointer dereference in ib_uverbs_comp_handler(), if application doesnot use completion channel. This patch fixes the cq_context initialization. Fixes: 1e7710f3f65 ("IB/core: Change completion channel to use the reworked") Cc: stable@vger.kernel.org # 4.12 Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com> (cherry picked from commit 699a2d5b1b880b4e4e1c7d55fa25659322cf5b51)
2017-08-18Add OPA extended LID supportHiatt, Don3-8/+9
This patch series primarily increases sizes of variables that hold lid values from 16 to 32 bits. Additionally, it adds a check in the IB mad stack to verify a properly formatted MAD when OPA extended LIDs are used. Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18Merge branch 'k.o/for-4.13-rc' into k.o/for-nextDoug Ledford5-21/+54
Merging our (hopefully) final -rc pull branch into our for-next branch because some of our pending patches won't apply cleanly without having the -rc patches in our tree. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18Merge branch 'misc' into k.o/for-nextDoug Ledford4-5/+5
Conflicts: drivers/infiniband/core/iwcm.c - The rdma_netlink patches in HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM, we aren't safe for that context) touched the same code. Signed-off-by: Doug Ledford <dledford@redhat.com>