aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-26Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-12/+0
Pull SCSI fixes from James Bottomley: "Two last minute fixes, both in drivers. The fnic one is a highly unlikely condition, but the RDMA one is a recently introduced regression that causes a kernel warning to trigger in every RDMA logon, which would be unsightly if it got into the final release" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: RDMA/isert: Fix a recently introduced regression related to logout scsi: fnic: do not queue commands during fwreset
2020-01-21scsi: RDMA/isert: Fix a recently introduced regression related to logoutBart Van Assche1-12/+0
iscsit_close_connection() calls isert_wait_conn(). Due to commit e9d3009cb936 both functions call target_wait_for_sess_cmds() although that last function should be called only once. Fix this by removing the target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling isert_wait_conn() after target_wait_for_sess_cmds(). Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session"). Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org Reported-by: Rahul Kundu <rahul.kundu@chelsio.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-07i40iw: Remove setting of VMA private data and use rdma_user_mmap_ioShiraz Saleem1-8/+6
vm_ops is now initialized in ib_uverbs_mmap() with the recent rdma mmap API changes. Earlier it was done in rdma_umap_priv_init() which would not be called unless a driver called rdma_user_mmap_io() in its mmap. i40iw does not use the rdma_user_mmap_io API but sets the vma's vm_private_data to a driver object. This now conflicts with the vm_op rdma_umap_close as priv pointer points to the i40iw driver object instead of the private data setup by core when rdma_user_mmap_io is called. This leads to a crash in rdma_umap_close with a mmap put being called when it should not have. Remove the redundant setting of the vma private_data in i40iw as it is not used. Also move i40iw over to use the rdma_user_mmap_io API. This gives the extra protection of having the mappings zapped when the context is detsroyed. BUG: unable to handle page fault for address: 0000000100000001 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 6 PID: 9528 Comm: rping Kdump: loaded Not tainted 5.5.0-rc4+ #117 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H, BIOS F7 01/17/2014 RIP: 0010:rdma_user_mmap_entry_put+0xa/0x30 [ib_core] RSP: 0018:ffffb340c04c7c38 EFLAGS: 00010202 RAX: 00000000ffffffff RBX: ffff9308e7be2a00 RCX: 000000000000cec0 RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000100000001 RBP: ffff9308dc7641f0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff8d4414d8 R12: ffff93075182c780 R13: 0000000000000001 R14: ffff93075182d2a8 R15: ffff9308e2ddc840 FS: 0000000000000000(0000) GS:ffff9308fdc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000100000001 CR3: 00000002e0412004 CR4: 00000000001606e0 Call Trace: rdma_umap_close+0x40/0x90 [ib_uverbs] remove_vma+0x43/0x80 exit_mmap+0xfd/0x1b0 mmput+0x6e/0x130 do_exit+0x290/0xcc0 ? get_signal+0x152/0xc40 do_group_exit+0x46/0xc0 get_signal+0x1bd/0xc40 ? prepare_to_wait_event+0x97/0x190 do_signal+0x36/0x630 ? remove_wait_queue+0x60/0x60 ? __audit_syscall_exit+0x1d9/0x290 ? rcu_read_lock_sched_held+0x52/0x90 ? kfree+0x21c/0x2e0 exit_to_usermode_loop+0x4f/0xc3 do_syscall_64+0x1ed/0x270 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fae715a81fd Code: Bad RIP value. RSP: 002b:00007fae6e163cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: fffffffffffffe00 RBX: 00007fae6e163d30 RCX: 00007fae715a81fd RDX: 0000000000000010 RSI: 00007fae6e163cf0 RDI: 0000000000000003 RBP: 00000000013413a0 R08: 00007fae68000000 R09: 0000000000000017 R10: 0000000000000001 R11: 0000000000000293 R12: 00007fae680008c0 R13: 00007fae6e163cf0 R14: 00007fae717c9804 R15: 00007fae6e163ed0 CR2: 0000000100000001 ---[ end trace b33d58d3a06782cb ]--- RIP: 0010:rdma_user_mmap_entry_put+0xa/0x30 [ib_core] Fixes: b86deba977a9 ("RDMA/core: Move core content from ib_uverbs to ib_core") Link: https://lore.kernel.org/r/20200107162223.1745-1-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03IB/hfi1: Adjust flow PSN with the correct resync_psnKaike Wan1-0/+9
When a TID RDMA ACK to RESYNC request is received, the flow PSNs for pending TID RDMA WRITE segments will be adjusted with the next flow generation number, based on the resync_psn value extracted from the flow PSN of the TID RDMA ACK packet. The resync_psn value indicates the last flow PSN for which a TID RDMA WRITE DATA packet has been received by the responder and the requester should resend TID RDMA WRITE DATA packets, starting from the next flow PSN. However, if resync_psn points to the last flow PSN for a segment and the next segment flow PSN starts with a new generation number, use of the old resync_psn to adjust the flow PSN for the next segment will lead to miscalculation, resulting in WARN_ON and sge rewinding errors: WARNING: CPU: 4 PID: 146961 at /nfs/site/home/phcvs2/gitrepo/ifs-all/components/Drivers/tmp/rpmbuild/BUILD/ifs-kernel-updates-3.10.0_957.el7.x86_64/hfi1/tid_rdma.c:4764 hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1] Modules linked in: ib_ipoib(OE) hfi1(OE) rdmavt(OE) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel ib_isert iscsi_target_mod target_core_mod aesni_intel lrw gf128mul glue_helper ablk_helper cryptd rpcrdma sunrpc opa_vnic ast ttm ib_iser libiscsi drm_kms_helper scsi_transport_iscsi ipmi_ssif syscopyarea sysfillrect sysimgblt fb_sys_fops drm joydev ipmi_si pcspkr sg drm_panel_orientation_quirks ipmi_devintf lpc_ich i2c_i801 ipmi_msghandler wmi rdma_ucm ib_ucm ib_uverbs acpi_cpufreq acpi_power_meter ib_umad rdma_cm ib_cm iw_cm ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul i2c_algo_bit crct10dif_common crc32c_intel e1000e ib_core ahci libahci ptp libata pps_core nfit libnvdimm [last unloaded: rdmavt] CPU: 4 PID: 146961 Comm: kworker/4:0H Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.0X.02.0117.040420182310 04/04/2018 Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1] Call Trace: <IRQ> [<ffffffff9e361dc1>] dump_stack+0x19/0x1b [<ffffffff9dc97648>] __warn+0xd8/0x100 [<ffffffff9dc9778d>] warn_slowpath_null+0x1d/0x20 [<ffffffffc05d28c6>] hfi1_rc_rcv_tid_rdma_ack+0x8f6/0xa90 [hfi1] [<ffffffffc05c21cc>] hfi1_kdeth_eager_rcv+0x1dc/0x210 [hfi1] [<ffffffffc05c23ef>] ? hfi1_kdeth_expected_rcv+0x1ef/0x210 [hfi1] [<ffffffffc0574f15>] kdeth_process_eager+0x35/0x90 [hfi1] [<ffffffffc0575b5a>] handle_receive_interrupt_nodma_rtail+0x17a/0x2b0 [hfi1] [<ffffffffc056a623>] receive_context_interrupt+0x23/0x40 [hfi1] [<ffffffff9dd4a294>] __handle_irq_event_percpu+0x44/0x1c0 [<ffffffff9dd4a442>] handle_irq_event_percpu+0x32/0x80 [<ffffffff9dd4a4cc>] handle_irq_event+0x3c/0x60 [<ffffffff9dd4d27f>] handle_edge_irq+0x7f/0x150 [<ffffffff9dc2e554>] handle_irq+0xe4/0x1a0 [<ffffffff9e3795dd>] do_IRQ+0x4d/0xf0 [<ffffffff9e36b362>] common_interrupt+0x162/0x162 <EOI> [<ffffffff9dfa0f79>] ? swiotlb_map_page+0x49/0x150 [<ffffffffc05c2ed1>] hfi1_verbs_send_dma+0x291/0xb70 [hfi1] [<ffffffffc05c2c40>] ? hfi1_wait_kmem+0xf0/0xf0 [hfi1] [<ffffffffc05c3f26>] hfi1_verbs_send+0x126/0x2b0 [hfi1] [<ffffffffc05ce683>] _hfi1_do_tid_send+0x1d3/0x320 [hfi1] [<ffffffff9dcb9d4f>] process_one_work+0x17f/0x440 [<ffffffff9dcbade6>] worker_thread+0x126/0x3c0 [<ffffffff9dcbacc0>] ? manage_workers.isra.25+0x2a0/0x2a0 [<ffffffff9dcc1c31>] kthread+0xd1/0xe0 [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40 [<ffffffff9e374c1d>] ret_from_fork_nospec_begin+0x7/0x21 [<ffffffff9dcc1b60>] ? insert_kthread_work+0x40/0x40 This patch fixes the issue by adjusting the resync_psn first if the flow generation has been advanced for a pending segment. Fixes: 9e93e967f7b4 ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Link: https://lore.kernel.org/r/20191219231920.51069.37147.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03IB/hfi1: Don't cancel unused work itemKaike Wan1-1/+3
In the iowait structure, two iowait_work entries were included to queue a given object: one for normal IB operations, and the other for TID RDMA operations. For non-TID RDMA operations, the iowait_work structure for TID RDMA is initialized to contain a NULL function (not used). When the QP is reset, the function iowait_cancel_work will be called to cancel any pending work. The problem is that this function will call cancel_work_sync() for both iowait_work entries, even though the one for TID RDMA is not used at all. Eventually, the call cascades to __flush_work(), wherein a WARN_ON will be triggered due to the fact that work->func is NULL. The WARN_ON was introduced in commit 4d43d395fed1 ("workqueue: Try to catch flush_work() without INIT_WORK().") This patch fixes the issue by making sure that a work function is present for TID RDMA before calling cancel_work_sync in iowait_cancel_work. Fixes: 4d43d395fed1 ("workqueue: Try to catch flush_work() without INIT_WORK().") Fixes: 5da0fc9dbf89 ("IB/hfi1: Prepare resource waits for dual leg") Link: https://lore.kernel.org/r/20191219211941.58387.39883.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/bnxt_re: Fix Send Work Entry state check while polling completionsSelvin Xavier1-6/+6
Some adapters need a fence Work Entry to handle retransmission. Currently the driver checks for this condition, only if the Send queue entry is signalled. Implement the condition check, irrespective of the signalled state of the Work queue entries Failure to add the fence can result in access to memory that is already marked as completed, triggering data corruption, transmission failure, IOMMU failures, etc. Fixes: 9152e0b722b2 ("RDMA/bnxt_re: HW workarounds for handling specific conditions") Link: https://lore.kernel.org/r/1574671174-5064-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/bnxt_re: Avoid freeing MR resources if dereg failsSelvin Xavier1-1/+3
The driver returns an error code for MR dereg, but frees the MR structure. When the MR dereg is retried due to previous error, the system crashes as the structure is already freed. BUG: unable to handle kernel NULL pointer dereference at 00000000000001b8 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 12178 Comm: ib_send_bw Kdump: loaded Not tainted 4.18.0-124.el8.x86_64 #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015 RIP: 0010:__dev_printk+0x2a/0x70 Code: 0f 1f 44 00 00 49 89 d1 48 85 f6 0f 84 f6 2b 00 00 4c 8b 46 70 4d 85 c0 75 04 4c 8b 46 10 48 8b 86 a8 00 00 00 48 85 c0 74 16 <48> 8b 08 0f be 7f 01 48 c7 c2 13 ac ac 83 83 ef 30 e9 10 fe ff ff RSP: 0018:ffffaf7c04607a60 EFLAGS: 00010006 RAX: 00000000000001b8 RBX: ffffa0010c91c488 RCX: 0000000000000246 RDX: ffffaf7c04607a68 RSI: ffffa0010c91caa8 RDI: ffffffff83a788eb RBP: ffffaf7c04607ac8 R08: 0000000000000000 R09: ffffaf7c04607a68 R10: 0000000000000000 R11: 0000000000000001 R12: ffffaf7c04607b90 R13: 000000000000000e R14: 0000000000000000 R15: 00000000ffffa001 FS: 0000146fa1f1cdc0(0000) GS:ffffa0012fac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000001b8 CR3: 000000007680a003 CR4: 00000000001606e0 Call Trace: dev_err+0x6c/0x90 ? dev_printk_emit+0x4e/0x70 bnxt_qplib_rcfw_send_message+0x594/0x660 [bnxt_re] ? dev_err+0x6c/0x90 bnxt_qplib_free_mrw+0x80/0xe0 [bnxt_re] bnxt_re_dereg_mr+0x2e/0xd0 [bnxt_re] ib_dereg_mr+0x2f/0x50 [ib_core] destroy_hw_idr_uobject+0x20/0x70 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x170 [ib_uverbs] __uverbs_cleanup_ufile+0x6e/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x61/0x130 [ib_uverbs] ib_uverbs_close+0x1f/0x80 [ib_uverbs] __fput+0xb7/0x230 task_work_run+0x8a/0xb0 do_exit+0x2da/0xb40 ... RIP: 0033:0x146fa113a387 Code: Bad RIP value. RSP: 002b:00007fff945d1478 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02 RAX: 0000000000000000 RBX: 000055a248908d70 RCX: 0000000000000000 RDX: 0000146fa1f2b000 RSI: 0000000000000001 RDI: 000055a248906488 RBP: 000055a248909630 R08: 0000000000010000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 000055a248906488 R13: 0000000000000001 R14: 0000000000000000 R15: 000055a2489095f0 Do not free the MR structures, when driver returns error to the stack. Fixes: 872f3578241d ("RDMA/bnxt_re: Add support for MRs with Huge pages") Link: https://lore.kernel.org/r/1574671174-5064-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds11-73/+173
Pull rdma fixes from Doug Ledford: "A small collection of -rc fixes. Mostly. One API addition, but that's because we wanted to use it in a fix. There's also a bug fix that is going to render the 5.5 kernel's soft-RoCE driver incompatible with all soft-RoCE versions prior, but it's required to actually implement the protocol according to the RoCE spec and required in order for the soft-RoCE driver to be able to successfully work with actual RoCE hardware. Summary: - Update Steve Wise info - Fix for soft-RoCE crc calculations (will break back compatibility, but only with the soft-RoCE driver, which has had this bug since it was introduced and it is an on-the-wire bug, but will make soft-RoCE fully compatible with real RoCE hardware) - cma init fixup - counters oops fix - fix for mlx4 init/teardown sequence - fix for mkx5 steering rules - introduce a cleanup API, which isn't a fix, but we want to use it in the next fix - fix for mlx5 memory management that uses API in previous patch" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/mlx5: Fix device memory flows IB/core: Introduce rdma_user_mmap_entry_insert_range() API IB/mlx5: Fix steering rule of drop and count IB/mlx4: Follow mirror sequence of device add during device removal RDMA/counter: Prevent auto-binding a QP which are not tracked with res rxe: correctly calculate iCRC for unaligned payloads Update mailmap info for Steve Wise RDMA/cma: add missed unregister_pernet_subsys in init failure
2019-12-12IB/mlx5: Fix device memory flowsYishai Hadas4-52/+105
Fix device memory flows so that only once there will be no live mmaped VA to a given allocation the matching object will be destroyed. This prevents a potential scenario that existing VA that was mmaped by one process might still be used post its deallocation despite that it's owned now by other process. The above is achieved by integrating with IB core APIs to manage mmap/munmap. Only once the refcount will become 0 the DM object and its underlay area will be freed. Fixes: 3b113a1ec3d4 ("IB/mlx5: Support device memory type attribute") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12IB/core: Introduce rdma_user_mmap_entry_insert_range() APIYishai Hadas1-9/+39
Introduce rdma_user_mmap_entry_insert_range() API to be used once the required key for the given entry should be in a given range. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212100237.330654-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12IB/mlx5: Fix steering rule of drop and countMaor Gottlieb1-7/+6
There are two flow rule destinations: QP and packet. While users are setting DROP packet rule, the QP should not be set as a destination. Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12IB/mlx4: Follow mirror sequence of device add during device removalParav Pandit1-4/+5
Current code device add sequence is: ib_register_device() ib_mad_init() init_sriov_init() register_netdev_notifier() Therefore, the remove sequence should be, unregister_netdev_notifier() close_sriov() mad_cleanup() ib_unregister_device() However it is not above. Hence, make do above remove sequence. Fixes: fa417f7b520ee ("IB/mlx4: Add support for IBoE") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12RDMA/counter: Prevent auto-binding a QP which are not tracked with resMark Zhang1-0/+3
Some QPs (e.g. XRC QP) are not tracked in kernel, in this case they have an invalid res and should not be bound to any dynamically-allocated counter in auto mode. This fixes below call trace: BUG: kernel NULL pointer dereference, address: 0000000000000390 PGD 80000001a7233067 P4D 80000001a7233067 PUD 1a7215067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 24822 Comm: ibv_xsrq_pingpo Not tainted 5.4.0-rc5+ #21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 RIP: 0010:rdma_counter_bind_qp_auto+0x142/0x270 [ib_core] Code: e1 48 85 c0 48 89 c2 0f 84 bc 00 00 00 49 8b 06 48 39 42 48 75 d6 40 3a aa 90 00 00 00 75 cd 49 8b 86 00 01 00 00 48 8b 4a 28 <8b> 80 90 03 00 00 39 81 90 03 00 00 75 b4 85 c0 74 b0 48 8b 04 24 RSP: 0018:ffffc900003f39c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff88820020ec00 RSI: 0000000000000004 RDI: ffffffffffffffc0 RBP: 0000000000000001 R08: ffff888224149ff0 R09: ffffc900003f3968 R10: ffffffffffffffff R11: ffff8882249c5848 R12: ffffffffffffffff R13: ffff88821d5aca50 R14: ffff8881f7690800 R15: ffff8881ff890000 FS: 00007fe53a3e1740(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000390 CR3: 00000001a7292006 CR4: 00000000003606a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _ib_modify_qp+0x3a4/0x3f0 [ib_core] ? lookup_get_idr_uobject.part.8+0x23/0x40 [ib_uverbs] modify_qp+0x322/0x3e0 [ib_uverbs] ib_uverbs_modify_qp+0x43/0x70 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs] ib_uverbs_run_method+0x6be/0x760 [ib_uverbs] ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs] ib_uverbs_cmd_verbs+0x18d/0x3a0 [ib_uverbs] ? get_acl+0x1a/0x120 ? __alloc_pages_nodemask+0x15d/0x2c0 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] do_vfs_ioctl+0xa5/0x610 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x48/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 99fa331dc862 ("RDMA/counter: Add "auto" configuration mode support") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-09rxe: correctly calculate iCRC for unaligned payloadsSteve Wise3-1/+14
If RoCE PDUs being sent or received contain pad bytes, then the iCRC is miscalculated, resulting in PDUs being emitted by RXE with an incorrect iCRC, as well as ingress PDUs being dropped due to erroneously detecting a bad iCRC in the PDU. The fix is to include the pad bytes, if any, in iCRC computations. Note: This bug has caused broken on-the-wire compatibility with actual hardware RoCE devices since the soft-RoCE driver was first put into the mainstream kernel. Fixing it will create an incompatibility with the original soft-RoCE devices, but is necessary to be compatible with real hardware devices. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Steve Wise <larrystevenwise@gmail.com> Link: https://lore.kernel.org/r/20191203020319.15036-2-larrystevenwise@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-09treewide: Use sizeof_field() macroPankaj Bharadiya4-5/+5
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09RDMA/cma: add missed unregister_pernet_subsys in init failureChuhong Yuan1-0/+1
The driver forgets to call unregister_pernet_subsys() in the error path of cma_init(). Add the missed call to fix it. Fixes: 4be74b42a6d0 ("IB/cma: Separate port allocation to network namespaces") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Link: https://lore.kernel.org/r/20191206012426.12744-1-hslester96@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-04net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca2-7/+8
ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playgroundLinus Torvalds1-2/+2
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-30Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds9-389/+128
Pull hmm updates from Jason Gunthorpe: "This is another round of bug fixing and cleanup. This time the focus is on the driver pattern to use mmu notifiers to monitor a VA range. This code is lifted out of many drivers and hmm_mirror directly into the mmu_notifier core and written using the best ideas from all the driver implementations. This removes many bugs from the drivers and has a very pleasing diffstat. More drivers can still be converted, but that is for another cycle. - A shared branch with RDMA reworking the RDMA ODP implementation - New mmu_interval_notifier API. This is focused on the use case of monitoring a VA and simplifies the process for drivers - A common seq-count locking scheme built into the mmu_interval_notifier API usable by drivers that call get_user_pages() or hmm_range_fault() with the VA range - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev drivers to the new API. This deletes a lot of wonky driver code. - Two improvements for hmm_range_fault(), from testing done by Ralph" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap mm/hmm: make full use of walk_page_range() xen/gntdev: use mmu_interval_notifier_insert mm/hmm: remove hmm_mirror and related drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror drm/amdgpu: Call find_vma under mmap_sem nouveau: use mmu_interval_notifier instead of hmm_mirror nouveau: use mmu_notifier directly for invalidate_range_start drm/radeon: use mmu_interval_notifier_insert RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv RDMA/odp: Use mmu_interval_notifier_insert() mm/hmm: define the pre-processor related parts of hmm.h even if disabled mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror mm/mmu_notifier: add an interval tree notifier mm/mmu_notifier: define the header pre-processor parts even if disabled mm/hmm: allow snapshot of the special zero page
2019-11-27Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds2-55/+16
Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 5.5-rc1 There's a few minor cleanups and fixes in here, but the majority of the patches in here fall into two buckets: - debugfs api cleanups and fixes - driver core device link support for boot dependancy issues The debugfs api cleanups are working to slowly refactor the debugfs apis so that it is even harder to use incorrectly. That work has been happening for the past few kernel releases and will continue over time, it's a long-term project/goal The driver core device link support missed 5.4 by just a bit, so it's been sitting and baking for many months now. It's from Saravana Kannan to help resolve the problems that DT-based systems have at boot time with dependancy graphs and kernel modules. Turns out that no one has actually tried to build a generic arm64 kernel with loads of modules and have it "just work" for a variety of platforms (like a distro kernel). The big problem turned out to be a lack of dependency information between different areas of DT entries, and the work here resolves that problem and now allows devices to boot properly, and quicker than a monolith kernel. All of these patches have been in linux-next for a long time with no reported issues" * tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits) tracing: Remove unnecessary DEBUG_FS dependency of: property: Add device link support for interrupt-parent, dmas and -gpio(s) debugfs: Fix !DEBUG_FS debugfs_create_automount of: property: Add device link support for "iommu-map" of: property: Fix the semantics of of_is_ancestor_of() i2c: of: Populate fwnode in of_i2c_get_board_info() drivers: base: Fix Kconfig indentation firmware_loader: Fix labels with comma for builtin firmware driver core: Allow device link operations inside sync_state() driver core: platform: Declare ret variable only once cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h> crypto: hisilicon: no need to check return value of debugfs_create functions driver core: platform: use the correct callback type for bus_find_device firmware_class: make firmware caching configurable driver core: Clarify documentation for fwnode_operations.add_links() mailbox: tegra: Fix superfluous IRQ error message net: caif: Fix debugfs on 64-bit platforms mac80211: Use debugfs_create_xul() helper media: c8sectpfe: no need to check return value of debugfs_create functions of: property: Add device link support for iommus, mboxes and io-channels ...
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds158-12467/+3934
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-11-25RDMA/hns: Delete unnecessary callback functions for cqYixian Liu2-58/+36
Currently, when cq event occurred, we first call our own callback functions in the event process function, then call ib callback functions. Actually, we can directly call ib callback functions. Link: https://lore.kernel.org/r/1574044493-46984-5-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/hns: Rename the functions used inside creating cqYixian Liu5-56/+35
Current names of functions are not proper, such as hns_roce_free_cq, actually it means free cqc, thus we rename them. Furthermore, functions used inside one file can be named without the prefix hns_roce_ which will make the functions for verbs symbols more eye-catching. Link: https://lore.kernel.org/r/1574044493-46984-4-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/hns: Redefine the member of hns_roce_cq structYixian Liu4-56/+38
There is no need to package buf and mtt into hns_roce_cq_buf, which will make code more complex, just delete this struct and move buf and mtt into hns_roce_cq. Furthermore, we add size member for hns_roce_buf to avoid repeatly calculating where needed it. Link: https://lore.kernel.org/r/1574044493-46984-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/hns: Redefine interfaces used in creating cqYixian Liu4-55/+53
Some interfaces defined with unnecessary input parameters, such as "nent" and "vector". This patch redefined these interfaces to make the code more readable and simple. Link: https://lore.kernel.org/r/1574044493-46984-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/efa: Expose RDMA read related attributesDaniel Kranzdorf4-5/+40
Query the device attributes for RDMA operations, including maximum transfer size and maximum number of SGEs per RDMA WR, and report them back to the userspace library. Link: https://lore.kernel.org/r/20191121141509.59297-4-galpress@amazon.com Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/efa: Support remote read access in MR registrationDaniel Kranzdorf4-13/+14
Enable remote read access for memory regions in order to support RDMA operations. Link: https://lore.kernel.org/r/20191121141509.59297-3-galpress@amazon.com Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/efa: Store network attributes in device attributesGal Pressman5-51/+18
There's no reason to separate the network attributes from all other device attributes. Embed the fields inside the device attributes and query them all in one function. Link: https://lore.kernel.org/r/20191121141509.59297-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25IB/hfi1: remove redundant assignment to variable retColin Ian King1-1/+1
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Link: https://lore.kernel.org/r/20191122154814.87257-1-colin.king@canonical.com Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/bnxt_re: Fix missing le16_to_cpuDevesh Sharma1-4/+4
From sparse: drivers/infiniband/hw/bnxt_re/main.c:1274:18: warning: cast from restricted __le16 drivers/infiniband/hw/bnxt_re/main.c:1275:18: warning: cast from restricted __le16 drivers/infiniband/hw/bnxt_re/main.c:1276:18: warning: cast from restricted __le16 drivers/infiniband/hw/bnxt_re/main.c:1277:21: warning: restricted __le16 degrades to integer Fixes: 2b827ea1926b ("RDMA/bnxt_re: Query HWRM Interface version from FW") Link: https://lore.kernel.org/r/1574317343-23300-4-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devicesDevesh Sharma1-0/+1
Due to recent advances in the firmware for Broadcom's gen p5 series of adaptors the driver code to report hardware counters has been broken w.r.t. roce devices. The new firmware command expects dma length to be specified during stat dma buffer allocation. Fixes: 2792b5b95ed5 ("bnxt_en: Update firmware interface spec. to 1.10.0.89.") Link: https://lore.kernel.org/r/1574317343-23300-3-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 seriesLuke Starrett1-2/+6
In the first version of Gen P5 ASIC, chip-id was always set to 0x1750 for all adaptor port configurations. This has been fixed in the new chip rev. Due to this missing fix users are not able to use adaptors based on latest chip rev of Broadcom's Gen P5 adaptors. Fixes: ae8637e13185 ("RDMA/bnxt_re: Add chip context to identify 57500 series") Link: https://lore.kernel.org/r/1574317343-23300-2-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Luke Starrett <luke.starrett@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25RDMA/bnxt_re: Fix Kconfig indentationKrzysztof Kozlowski1-6/+6
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Link: https://lore.kernel.org/r/20191120134138.15245-1-krzk@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe8-22/+72
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcvJason Gunthorpe4-93/+60
This converts one of the two users of mmu_notifiers to use the new API. The conversion is fairly straightforward, however the existing use of notifiers here seems to be racey. Link: https://lore.kernel.org/r/20191112202231.3856-7-jgg@ziepe.ca Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23RDMA/odp: Use mmu_interval_notifier_insert()Jason Gunthorpe5-296/+68
Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP: zap_page_range() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) unmap_single_vma() [..] __split_huge_page_pmd() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) // DEADLOCK mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem. Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count") Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca Tested-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23net: use rhashtable_lookup() instead of rhashtable_lookup_fast()Taehee Yoo1-2/+2
rhashtable_lookup_fast() internally calls rcu_read_lock() then, calls rhashtable_lookup(). So if rcu_read_lock() is already held, rhashtable_lookup() is enough. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-22IB/mlx5: Implement callbacks for getting VFs GUID attributesDanit Goldberg3-0/+28
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW for VFs attributes and return node and port GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22IB/ipoib: Add ndo operation for getting VFs GUID attributesDanit Goldberg1-0/+10
Add ndo operation to the network driver that enables configuring ipoib_get_vf_guid operation. The operation allows to get a VF port and node GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-22IB/core: Add interfaces to get VF node and port GUIDsDanit Goldberg2-0/+11
Provide ability to get node and port GUIDs of VFs to be symmetrical to already existing set option. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-19RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offsetMichal Kalderon1-5/+10
When running against rdma-core that doesn't support doorbell recovery, the rdma_user_mmap_entry won't be allocated for doorbell recovery related mappings. We have a flag indicating whether rdma-core supports doorbell recovery or not which was used during initialization, however some cases didn't check that the rdma_user_mmap_entry exists before attempting to acquire it's offset. Fixes: 97f612509294 ("RDMA/qedr: Add doorbell overflow recovery support") Link: https://lore.kernel.org/r/20191118150645.26602-1-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19RDMA/cm: Use refcount_t type for refcount variableDanit Goldberg1-12/+12
This atomic in struct cm_id_private is being used as a refcount, change it to refcount_t for better clarity and to get the refcount protections. Link: https://lore.kernel.org/r/1573997601-4502-1-git-send-email-danitg@mellanox.com Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19IB/mlx5: Support extended number of strides for Striding RQMark Zhang3-12/+43
Extends the minimum single WQE strides from 64 to 8, which is exposed by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps. Choose right number of strides based on FW capability. Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-19IB/mlx4: Update HW GID table while adding vlan GIDDanit Goldberg2-1/+9
When adding a new GID compare the vlan along with the GID and type. This allows vlan's to have GIDs that alias each other, such as the default GID. Otherwise they the GID cache view can become inconsistent with the HW view. Link: https://lore.kernel.org/r/20191115154457.247763-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()'Christophe JAILLET1-2/+2
We should jump to fail3 in order to undo the 'xa_insert_irq()' call. Link: https://lore.kernel.org/r/20190923190746.10964-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17RDMA/cma: Use ACK timeout for RoCE packetLifeTimeDag Moxnes1-2/+13
The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME, for the PacketLifeTime, as it can not be determined from the network. This value might not be optimal for all networks. The cma module supports the function rdma_set_ack_timeout to set the ACK timeout for a QP associated with a connection. As per IBTA 12.7.34 local ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay). Assuming a negligible local ACK delay, we can use PacketLifeTime = local ACK timeout/2 as a reasonable approximation for RoCE networks. Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig30-59/+52
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller7-39/+46
Lots of overlapping changes and parallel additions, stuff like that. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-14dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flagChristoph Hellwig1-7/+2
This flag is not implemented by any backend and only set by the ib_umem module in a single instance. Link: https://lore.kernel.org/r/20191113073214.9514-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14RDMA/efa: Clear the admin command buffer prior to its submissionGal Pressman1-1/+4
We cannot rely on the entry memcpy as we only copy the actual size of the command, the rest of the bytes must be memset to zero. Currently providing non-zero memory will not have any user visible impact. However, since admin commands are extendable (in a backwards compatible way) everything beyond the size of the command must be cleared to prevent issues in the future. Fixes: 0420e542569b ("RDMA/efa: Implement functions that submit and complete admin commands") Link: https://lore.kernel.org/r/20191112092608.46964-1-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>