aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-03-04IB/mlx5: Fix implicit ODP raceArtemy Kovalyov2-10/+8
Following race may occur because of the call_srcu and the placement of the synchronize_srcu vs the xa_erase. CPU0 CPU1 mlx5_ib_free_implicit_mr: destroy_unused_implicit_child_mr: xa_erase(odp_mkeys) synchronize_srcu() xa_lock(implicit_children) if (still in xarray) atomic_inc() call_srcu() xa_unlock(implicit_children) xa_erase(implicit_children): xa_lock(implicit_children) __xa_erase() xa_unlock(implicit_children) flush_workqueue() [..] free_implicit_child_mr_rcu: (via call_srcu) queue_work() WARN_ON(atomic_read()) [..] free_implicit_child_mr_work: (via wq) free_implicit_child_mr() mlx5_mr_cache_invalidate() mlx5_ib_update_xlt() <-- UMR QP fail atomic_dec() The wait_event() solves the race because it blocks until free_implicit_child_mr_work() completes. Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") Link: https://lore.kernel.org/r/20200227113918.94432-1-leon@kernel.org Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-02IB/hfi1, qib: Ensure RCU is locked when accessing listDennis Dalessandro2-1/+5
The packet handling function, specifically the iteration of the qp list for mad packet processing misses locking RCU before running through the list. Not only is this incorrect, but the list_for_each_entry_rcu() call can not be called with a conditional check for lock dependency. Remedy this by invoking the rcu lock and unlock around the critical section. This brings MAD packet processing in line with what is done for non-MAD packets. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20200225195445.140896.41873.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-14IB/mlx5: Use div64_u64 for num_var_hw_entries calculationJason Gunthorpe1-1/+1
On i386: ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! Fixes: f164be8c0366 ("IB/mlx5: Extend caps stage to handle VAR capabilities") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reported-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/mlx5: Prevent overflow in mmap offset calculationsLeon Romanovsky1-2/+2
The cmd and index variables declared as u16 and the result is supposed to be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) << 16" to be u64 and leaves the end result to be u16. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/mlx5: Fix async events cleanup flowsYishai Hadas1-23/+28
As in the prior patch, the devx code is not fully cleaning up its event_lists before finishing driver_destroy allowing a later read to trigger user after free conditions. Re-arrange things so that the event_list is always empty after destroy and ensure it remains empty until the file is closed. Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/iw_cxgb4: initiate CLOSE when entering TERMKrishnamraju Eraparaju2-2/+6
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE when entering into TERM state. In c4iw_modify_qp(), disconnect operation should only be performed when the modify_qp call is invoked from ib_core. And all other internal modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks like below can occur: Call Trace: schedule+0x2f/0xa0 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.5+0x2d0/0x4a0 c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again c4iw_modify_qp+0x468/0x10d0 rx_data+0x218/0x570 => acquires ep lock process_work+0x5f/0x70 process_one_work+0x1a7/0x3b0 worker_thread+0x30/0x390 kthread+0x112/0x130 ret_from_fork+0x35/0x40 Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state") Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supportedMark Zhang1-3/+6
When binding a QP with a counter and the QP state is not RESET, return failure if the rts2rts_qp_counters_set_id is not supported by the device. This is to prevent cases like manual bind for Connect-IB devices from returning success when the feature is not supported. Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/hfi1: Close window for pq and request colidingMike Marciniszyn4-29/+48
Cleaning up a pq can result in the following warning and panic: WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0 list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100) Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 Call Trace: [<ffffffff90365ac0>] dump_stack+0x19/0x1b [<ffffffff8fc98b78>] __warn+0xd8/0x100 [<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0 [<ffffffff8ff9713d>] list_del+0xd/0x30 [<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110 [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1] [<ffffffff8fe4519c>] __fput+0xec/0x260 [<ffffffff8fe453fe>] ____fput+0xe/0x10 [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0 [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0 [<ffffffff90379134>] int_signal+0x12/0x17 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 PGD 2cdab19067 PUD 2f7bfdb067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000 RIP: 0010:[<ffffffff8fe1f93e>] [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003 RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800 RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19 R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000 R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000 FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: [<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80 [<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110 [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1] [<ffffffff8fe4519c>] __fput+0xec/0x260 [<ffffffff8fe453fe>] ____fput+0xe/0x10 [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0 [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0 [<ffffffff90379134>] int_signal+0x12/0x17 Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39 RIP [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 RSP <ffff88b5393abd60> CR2: 0000000000000010 The panic is the result of slab entries being freed during the destruction of the pq slab. The code attempts to quiesce the pq, but looking for n_req == 0 doesn't account for new requests. Fix the issue by using SRCU to get a pq pointer and adjust the pq free logic to NULL the fd pq pointer prior to the quiesce. Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/hfi1: Acquire lock to release TID entries when user file is closedKaike Wan1-0/+2
Each user context is allocated a certain number of RcvArray (TID) entries and these entries are managed through TID groups. These groups are put into one of three lists in each user context: tid_group_list, tid_used_list, and tid_full_list, depending on the number of used TID entries within each group. When TID packets are expected, one or more TID groups will be allocated. After the packets are received, the TID groups will be freed. Since multiple user threads may access the TID groups simultaneously, a mutex exp_mutex is used to synchronize the access. However, when the user file is closed, it tries to release all TID groups without acquiring the mutex first, which risks a race condition with another thread that may be releasing its TID groups, leading to data corruption. This patch addresses the issue by acquiring the mutex first before releasing the TID groups when the file is closed. Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs") Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_createKamal Heib1-0/+2
Make sure to free the allocated cpumask_var_t's to avoid the following reported memory leak by kmemleak: $ cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8897f812d6a8 (size 8): comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000bff49664>] alloc_cpumask_var_node+0x4c/0xb0 [<0000000075d3ca81>] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1] [<0000000098d420df>] hfi1_init_dd+0x3311/0x4960 [hfi1] [<0000000071be7e52>] init_one+0x25e/0xf10 [hfi1] [<000000005483d4c2>] local_pci_probe+0xd4/0x180 [<000000007c3cbc6e>] work_for_cpu_fn+0x51/0xa0 [<000000001d626905>] process_one_work+0x8f0/0x17b0 [<000000007e569e7e>] worker_thread+0x536/0xb50 [<00000000fd39a4a5>] kthread+0x30c/0x3d0 [<0000000056f2edb3>] ret_from_fork+0x3a/0x50 Fixes: 5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support") Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds41-972/+1977
Pull rdma updates from Jason Gunthorpe: "A very quiet cycle with few notable changes. Mostly the usual list of one or two patches to drivers changing something that isn't quite rc worthy. The subsystem seems to be seeing a larger number of rework and cleanup style patches right now, I feel that several vendors are prepping their drivers for new silicon. Summary: - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe, i40iw - Larger series doing cleanup and rework for hns and hfi1. - Some general reworking of the CM code to make it a little more understandable - Unify the different code paths connected to the uverbs FD scheme - New UAPI ioctls conversions for get context and get async fd - Trace points for CQ and CM portions of the RDMA stack - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic connected to a MR - A couple of bug fixes that came too late to make rc7" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits) RDMA/core: Make the entire API tree static RDMA/efa: Mask access flags with the correct optional range RDMA/cma: Fix unbalanced cm_id reference count during address resolve RDMA/umem: Fix ib_umem_find_best_pgsz() IB/mlx4: Fix leak in id_map_find_del IB/opa_vnic: Spelling correction of 'erorr' to 'error' IB/hfi1: Fix logical condition in msix_request_irq RDMA/cm: Remove CM message structs RDMA/cm: Use IBA functions for complex structure members RDMA/cm: Use IBA functions for simple structure members RDMA/cm: Use IBA functions for swapping get/set acessors RDMA/cm: Use IBA functions for simple get/set acessors RDMA/cm: Add SET/GET implementations to hide IBA wire format RDMA/cm: Add accessors for CM_REQ transport_type IB/mlx5: Return the administrative GUID if exists RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence IB/mlx4: Fix memory leak in add_gid error flow IB/mlx5: Expose RoCE accelerator counters RDMA/mlx5: Set relaxed ordering when requested RDMA/core: Add the core support field to METHOD_GET_CONTEXT ...
2020-01-31mm, tree-wide: rename put_user_page*() to unpin_user_page*()John Hubbard5-9/+9
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODPJohn Hubbard5-5/+5
Convert infiniband to use the new pin_user_pages*() calls. Also, revert earlier changes to Infiniband ODP that had it using put_user_page(). ODP is "Case 3" in Documentation/core-api/pin_user_pages.rst, which is to say, normal get_user_pages() and put_page() is the API to use there. The new pin_user_pages*() calls replace corresponding get_user_pages*() calls, and set the FOLL_PIN flag. The FOLL_PIN flag requires that the caller must return the pages via put_user_page*() calls, but infiniband was already doing that as part of an earlier commit. Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-30RDMA/core: Make the entire API tree staticJason Gunthorpe2-6/+3
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates the following error. on x86_64: ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC': main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler' This is happening because some parts of the UAPI description are not static. This is a hold over from earlier code that relied on struct pointers to refer to object types, now object types are referenced by number. Remove the unused globals and add statics to the remaining UAPI description elements. Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete mlx5_ib_get_devx_tree(). The compiler now trims alot more unused code, including the above problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-29RDMA/efa: Mask access flags with the correct optional rangeGal Pressman1-1/+1
The uapi value IB_UVERBS_ACCESS_OPTIONAL_RANGE shouldn't be used inside the driver, use IB_ACCESS_OPTIONAL instead. Fixes: 86dd738cf20c ("RDMA/efa: Allow passing of optional access flags for MR registration") Link: https://lore.kernel.org/r/20200129071803.40117-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds30-155/+237
Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ...
2020-01-28Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-5/+5
Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-28Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Cleanup of the GOP [graphics output] handling code in the EFI stub - Complete refactoring of the mixed mode handling in the x86 EFI stub - Overhaul of the x86 EFI boot/runtime code - Increase robustness for mixed mode code - Add the ability to disable DMA at the root port level in the EFI stub - Get rid of RWX mappings in the EFI memory map and page tables, where possible - Move the support code for the old EFI memory mapping style into its only user, the SGI UV1+ support code. - plus misc fixes, updates, smaller cleanups. ... and due to interactions with the RWX changes, another round of PAT cleanups make a guest appearance via the EFI tree - with no side effects intended" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) efi/x86: Disable instrumentation in the EFI runtime handling code efi/libstub/x86: Fix EFI server boot failure efi/x86: Disallow efi=old_map in mixed mode x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping efi: Fix handling of multiple efi_fake_mem= entries efi: Fix efi_memmap_alloc() leaks efi: Add tracking for dynamically allocated memmaps efi: Add a flags parameter to efi_memory_map efi: Fix comment for efi_mem_type() wrt absent physical addresses efi/arm: Defer probe of PCIe backed efifb on DT systems efi/x86: Limit EFI old memory map to SGI UV machines efi/x86: Avoid RWX mappings for all of DRAM efi/x86: Don't map the entire kernel text RW for mixed mode x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd efi/libstub/x86: Fix unused-variable warning efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode efi/libstub/x86: Use const attribute for efi_is_64bit() efi: Allow disabling PCI busmastering on bridges during boot efi/x86: Allow translating 64-bit arguments for mixed mode calls ...
2020-01-27Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremapLinus Torvalds7-10/+10
Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap
2020-01-27IB/mlx4: Fix leak in id_map_find_delHåkon Bugge1-26/+3
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted from the cache. When a DREP is sent from mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar when a REJ is received by the mlx4_ib_demux_cm_handler(), id_map_find_del() is called. This function wipes out the TID in use from the IDR or XArray and removes the id_map_entry from the table. In short, it does everything except the topping of the cake, which is to remove the entry from the list and free it. In other words, for the REJ case enumerated above, one id_map_entry will be leaked. For the other case above, a DREQ has been received first. The reception of the DREQ will trigger queuing of a delayed work to delete the id_map_entry, for the case where the VM doesn't send back a DREP. In the normal case, the VM _will_ send back a DREP, and id_map_find_del() will be called. But this scenario introduces a secondary leak. First, when the DREQ is received, a delayed work is queued. The VM will then return a DREP, which will call id_map_find_del(). As stated above, this will free the TID used from the XArray or IDR. Now, there is window where that particular TID can be re-allocated, lets say by an outgoing REQ. This TID will later be wiped out by the delayed work, when the function id_map_ent_timeout() is called. But the id_map_entry allocated by the outgoing REQ will not be de-allocated, and we have a leak. Both leaks are fixed by removing the id_map_find_del() function and only using schedule_delayed(). Of course, a check in schedule_delayed() to see if the work already has been queued, has been added. Another benefit of always using the delayed version for deleting entries, is that we do get a TimeWait effect; a TID no longer in use, will occupy the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability of being re-used for that time period. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25IB/hfi1: Fix logical condition in msix_request_irqNathan Chancellor1-1/+1
Clang warns: drivers/infiniband/hw/hfi1/msix.c:136:22: warning: overlapping comparisons always evaluate to false [-Wtautological-overlap-compare] if (type < IRQ_SDMA && type >= IRQ_OTHER) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ 1 warning generated. It is impossible for something to be less than 0 (IRQ_SDMA) and greater than or equal to 3 (IRQ_OTHER) at the same time. A logical OR should have been used to keep the same logic as before. Link: https://lore.kernel.org/r/20200116222658.5285-1-natechancellor@gmail.com Link: https://github.com/ClangBuiltLinux/linux/issues/841 Fixes: 13d2a8384bd9 ("IB/hfi1: Decouple IRQ name from type") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25IB/mlx5: Return the administrative GUID if existsDanit Goldberg1-16/+12
A user can change the operational GUID (a.k.a affective GUID) through link/infiniband. Therefore it is preferred to return the currently set GUID if it exists instead of the operational. This way the PF can query which VF GUID will be set in the next bind. In order to align with MAC address, zero is returned if administrative GUID is not set. For example, before setting administrative GUID: $ ip link show ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off Then: $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00 After setting administrative GUID: $ ip link show ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off Fixes: 9c0015ef0928 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes") Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe33-167/+258
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdmaDavid S. Miller30-151/+231
Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-20Merge tag 'v5.5-rc7' into perf/core, to pick up fixesIngo Molnar5-16/+27
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-19Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller5-16/+27
2020-01-16Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxSaeed Mahameed1-4/+6
This merge syncs with mlx5-next latest HW bits and layout updates for next features, in addition one patch that improves mlx5_create_auto_grouped_flow_table() API across all mlx5 users. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Refactor mlx5_create_auto_grouped_flow_table net/mlx5e: Add discard counters per priority net/mlx5e: Expose FEC feilds and related capability bit net/mlx5: Add mlx5_ifc definitions for connection tracking support net/mlx5: Add copy header action struct layout net/mlx5: Expose resource dump register mapping net/mlx5: Add structures and defines for MIRC register net/mlx5: Read MCAM register groups 1 and 2 net/mlx5: Add structures layout for new MCAM access reg groups net/mlx5: Expose vDPA emulation device capabilities net/mlx5: Add Virtio Emulation related device capabilities Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16net/mlx5: Refactor mlx5_create_auto_grouped_flow_tablePaul Blakey1-4/+6
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param which already carries the max_fte, prio and flags memebers, and is used the same in similar mlx5_create_flow_table() function. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16IB/mlx4: Fix memory leak in add_gid error flowJack Morgenstein1-4/+16
In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW gid table, there is a memory leak in the driver's copy of the gid table: the gid entry's context buffer is not freed. If such an error occurs, free the entry's context buffer, and mark the entry as available (by setting its context pointer to NULL). Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks") Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB/mlx5: Expose RoCE accelerator countersAvihai Horon1-0/+18
Introduce the following RoCE accelerator counters: * roce_adp_retrans - number of adaptive retransmission for RoCE traffic. * roce_adp_retrans_to - number of times RoCE traffic reached time out due to adaptive retransmission. * roce_slow_restart - number of times RoCE slow restart was used. * roce_slow_restart_cnps - number of times RoCE slow restart generate CNP packets. * roce_slow_restart_trans - number of times RoCE slow restart change state to slow restart. Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/mlx5: Set relaxed ordering when requestedMichael Guralnik4-5/+23
Enable relaxed ordering in the mkey context when requested. As relaxed ordering is not currently supported in UMR, disable UMR usage for relaxed ordering MRs. Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/efa: Allow passing of optional access flags for MR registrationMichael Guralnik1-0/+1
As part of adding a range of optional access flags that drivers need to be able to accept, mask this range inside efa driver. This will prevent the driver from failing when an access flag from that range is passed. Link: https://lore.kernel.org/r/1578506740-22188-8-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/mlx5: Fix handling of IOVA != user_va in ODP pathsJason Gunthorpe2-6/+15
Till recently it was not possible for userspace to specify a different IOVA, but with the new ibv_reg_mr_iova() library call this can be done. To compute the user_va we must compute: user_va = (iova - iova_start) + user_va_start while being cautious of overflow and other math problems. The iova is not reliably stored in the mmkey when the MR is created. Only the cached creation path (the common one) set it, so it must also be set when creating uncached MRs. Fix the weird use of iova when computing the starting page index in the MR. In the normal case, when iova == umem.address: iova & (~(BIT(page_shift) - 1)) == ALIGN_DOWN(umem.address, odp->page_size) == ib_umem_start(odp) And when iova is different using it in math with a user_va is wrong. Finally, do not allow an implicit ODP to be created with a non-zero IOVA as we have no support for that. Fixes: 7bdf65d411c1 ("IB/mlx5: Handle page faults") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB/mlx5: Mask out unsupported ODP capabilities for kernel QPsMoni Shoua1-0/+17
The ODP handler for WQEs in RQ or SRQ is not implented for kernel QPs. Therefore don't report support in these if query comes from a kernel user. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16RDMA/mlx5: Don't fake udata for kernel pathLeon Romanovsky1-18/+16
Kernel paths must not set udata and provide NULL pointer, instead of faking zeroed udata struct. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB/mlx5: Add ODP WQE handlers for kernel QPsMoni Shoua3-70/+117
One of the steps in ODP page fault handler for WQEs is to read a WQE from a QP send queue or receive queue buffer at a specific index. Since the implementation of this buffer is different between kernel and user QP the implementation of the handler needs to be aware of that and handle it in a different way. ODP for kernel MRs is currently supported only for RDMA_READ and RDMA_WRITE operations so change the handler to - read a WQE from a kernel QP send queue - fail if access to receive queue or shared receive queue is required for a kernel QP Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB/core: Introduce ib_reg_user_mrMoni Shoua2-1/+4
Add ib_reg_user_mr() for kernel ULPs to register user MRs. The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP that acts as an agent for the userspace application. This function is intended to be used without a user context so vendor drivers need to be aware of calling reg_user_mr() device operation with udata equal to NULL. Among all drivers, i40iw is the only driver which relies on presence of udata, so check udata existence for that driver. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16IB: Allow calls to ib_umem_get from kernel ULPsMoni Shoua28-56/+62
So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-15RDMA/efa: Remove unused ucontext parameter from efa_qp_user_mmap_entries_removeGal Pressman1-7/+4
The ucontext parameter is unused, remove it. Link: https://lore.kernel.org/r/20200114085706.82229-6-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15RDMA/efa: Remove {} brackets from single statement ifGal Pressman1-2/+1
The {} brackets are not needed according to the Linux coding style. Link: https://lore.kernel.org/r/20200114085706.82229-5-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15RDMA/efa: Device definitions documentation updatesGal Pressman1-13/+24
Various clarifications and updates to the documentation of the device definitions. No functional changes in this patch. Link: https://lore.kernel.org/r/20200114085706.82229-4-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15RDMA/hns: Add support for extended atomic in userspaceJiaran Zhang2-2/+17
To support extended atomic operations including cmp & swap and fetch & add of 8 bytes, 16 bytes, 32 bytes, 64 bytes in userspace, some field in qpc should be configured. Link: https://lore.kernel.org/r/1579052546-11746-1-git-send-email-liweihang@huawei.com Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15RDMA/hns: Get pf capabilities from firmwareLijun Ou2-109/+6
Get pf capabilities from firmware according to different hardwares, if it fails, all capabilities will be set with a default value. Link: https://lore.kernel.org/r/1578738761-3176-4-git-send-email-liweihang@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15RDMA/hns: Add interfaces to get pf capabilities from firmwareLijun Ou3-0/+527
pf capabilities are set by default for hip08 previously which should depends on different types of hardware. So add new interfaces to get them from firmware. Link: https://lore.kernel.org/r/1578738761-3176-3-git-send-email-liweihang@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15RDMA/hns: Remove some redundant variables related to capabilitiesWeihang Li3-7/+1
In struct hns_roce_caps, max_srq_sg and max_srqwqes is unused, and max_srqs has the same effect with num_srqs. So remove them from this structrue. Link: https://lore.kernel.org/r/1578738761-3176-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/mlx5: Simplify devx async commandsJason Gunthorpe1-14/+10
With the new FD structure the async commands do not need to hold any references while running. The existing mlx5_cmd_exec_cb() and mlx5_cmd_cleanup_async_ctx() provide enough synchronization to ensure that all outstanding commands are completed before the uobject can be destructed. Remove the now confusing get_file() and the type erasure of the devx_async_cmd_event_file. Link: https://lore.kernel.org/r/1578504126-9400-4-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/core: Simplify destruction of FD uobjectsJason Gunthorpe1-63/+48
FD uobjects have a weird split between the struct file and uobject world. Simplify this to make them pure uobjects and use a generic release method for all struct file operations. This fixes the control flow so that mlx5_cmd_cleanup_async_ctx() is always called before erasing the linked list contents to make the concurrancy simpler to understand. For this to work the uobject destruction must fence anything that it is cleaning up - the design must not rely on struct file lifetime. Only deliver_event() relies on the struct file to when adding new events to the queue, add a is_destroyed check under lock to block it. Link: https://lore.kernel.org/r/1578504126-9400-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13RDMA/mlx5: Use RCU and direct refcounts to keep memory aliveJason Gunthorpe1-17/+17
dispatch_event_fd() runs from a notifier with minimal locking, and relies on RCU and a file refcount to keep the uobject and eventfd alive. As the next patch wants to remove the file_operations release function from the drivers, re-organize things so that the devx_event_notifier() path uses the existing RCU to manage the lifetime of the uobject and eventfd. Move the refcount puts to a call_rcu so that the objects are guaranteed to exist and remove the indirect file refcount. Link: https://lore.kernel.org/r/1578504126-9400-2-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12IB/mlx5: Add mmap support for VARYishai Hadas1-1/+4
Add mmap support for VAR, it uses the 'offset' command mode with involvement of IB core APIs to find the previously allocated mmap entry. Link: https://lore.kernel.org/r/20191212110928.334995-6-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12IB/mlx5: Introduce VAR object and its alloc/destroy methodsYishai Hadas2-0/+164
Introduce VAR object and its alloc/destroy KABI methods. The internal implementation uses the IB core API to manage mmap/munamp calls. Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>