aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-02-14IB/mlx5: Use div64_u64 for num_var_hw_entries calculationJason Gunthorpe1-1/+1
On i386: ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! Fixes: f164be8c0366 ("IB/mlx5: Extend caps stage to handle VAR capabilities") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reported-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/core: Fix protection fault in get_pkey_idx_qp_listLeon Romanovsky1-15/+9
We don't need to set pkey as valid in case that user set only one of pkey index or port number, otherwise it will be resulted in NULL pointer dereference while accessing to uninitialized pkey list. The following crash from Syzkaller revealed it. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0 Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48 8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8 RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010 RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430 R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000 FS: 00007f20777de700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: port_pkey_list_insert+0xd7/0x7c0 ib_security_modify_qp+0x6fa/0xfc0 _ib_modify_qp+0x8c4/0xbf0 modify_qp+0x10da/0x16d0 ib_uverbs_modify_qp+0x9a/0x100 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Message-Id: <20200212080651.GB679970@unreal>
2020-02-13RDMA/rxe: Fix soft lockup problem due to using tasklets in softirqZhu Yanjun1-4/+4
When run stress tests with RXE, the following Call Traces often occur watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0] ... Call Trace: <IRQ> create_object+0x3f/0x3b0 kmem_cache_alloc_node_trace+0x129/0x2d0 __kmalloc_reserve.isra.52+0x2e/0x80 __alloc_skb+0x83/0x270 rxe_init_packet+0x99/0x150 [rdma_rxe] rxe_requester+0x34e/0x11a0 [rdma_rxe] rxe_do_task+0x85/0xf0 [rdma_rxe] tasklet_action_common.isra.21+0xeb/0x100 __do_softirq+0xd0/0x298 irq_exit+0xc5/0xd0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> ... The root cause is that tasklet is actually a softirq. In a tasklet handler, another softirq handler is triggered. Usually these softirq handlers run on the same cpu core. So this will cause "soft lockup Bug". Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/mlx5: Prevent overflow in mmap offset calculationsLeon Romanovsky1-2/+2
The cmd and index variables declared as u16 and the result is supposed to be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) << 16" to be u64 and leaves the end result to be u16. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13IB/umad: Fix kernel crash while unloading ib_umadYonatan Cohen1-2/+3
When disassociating a device from umad we must ensure that the sysfs access is prevented before blocking the fops, otherwise assumptions in syfs don't hold: CPU0 CPU1 ib_umad_kill_port() ibdev_show() port->ib_dev = NULL dev_name(port->ib_dev) The prior patch made an error in moving the device_destroy(), it should have been split into device_del() (above) and put_device() (below). At this point we already have the split, so move the device_del() back to its original place. kernel stack PF: error_code(0x0000) - not-present page Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI RIP: 0010:ibdev_show+0x18/0x50 [ib_umad] RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000 RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870 RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000 R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40 R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58 FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0 Call Trace: dev_attr_show+0x15/0x50 sysfs_kf_seq_show+0xb8/0x1a0 seq_read+0x12d/0x350 vfs_read+0x89/0x140 ksys_read+0x55/0xd0 do_syscall_64+0x55/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9: Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed") Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/mlx5: Fix async events cleanup flowsYishai Hadas1-23/+28
As in the prior patch, the devx code is not fully cleaning up its event_lists before finishing driver_destroy allowing a later read to trigger user after free conditions. Re-arrange things so that the event_list is always empty after destroy and ensure it remains empty until the file is closed. Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/core: Add missing list deletion on freeing event queueMichael Guralnik1-0/+1
When the uobject file scheme was revised to allow device disassociation from the file it became possible for read() to still happen the driver destroys the uobject. The old clode code was not tolerant to concurrent read, and when it was moved to the driver destroy it creates a bug. Ensure the event_list is empty after driver destroy by adding the missing list_del(). Otherwise read() can trigger a use after free and double kfree. Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/siw: Remove unwanted WARN_ON in siw_cm_llp_data_ready()Krishnamraju Eraparaju1-3/+2
Warnings like below can fill up the dmesg while disconnecting RDMA connections. Hence, remove the unwanted WARN_ON. WARNING: CPU: 6 PID: 0 at drivers/infiniband/sw/siw/siw_cm.c:1229 siw_cm_llp_data_ready+0xc1/0xd0 [siw] RIP: 0010:siw_cm_llp_data_ready+0xc1/0xd0 [siw] Call Trace: <IRQ> tcp_data_queue+0x226/0xb40 tcp_rcv_established+0x220/0x620 tcp_v4_do_rcv+0x12a/0x1e0 tcp_v4_rcv+0xb05/0xc00 ip_local_deliver_finish+0x69/0x210 ip_local_deliver+0x6b/0xe0 ip_rcv+0x273/0x362 __netif_receive_skb_core+0xb35/0xc30 netif_receive_skb_internal+0x3d/0xb0 napi_gro_frags+0x13b/0x200 t4_ethrx_handler+0x433/0x7d0 [cxgb4] process_responses+0x318/0x580 [cxgb4] napi_rx_handler+0x14/0x100 [cxgb4] net_rx_action+0x149/0x3b0 __do_softirq+0xe3/0x30a irq_exit+0x100/0x110 do_IRQ+0x7f/0xe0 common_interrupt+0xf/0xf </IRQ> Link: https://lore.kernel.org/r/20200207141429.27927-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/iw_cxgb4: initiate CLOSE when entering TERMKrishnamraju Eraparaju2-2/+6
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE when entering into TERM state. In c4iw_modify_qp(), disconnect operation should only be performed when the modify_qp call is invoked from ib_core. And all other internal modify_qp calls(invoked within iw_cxgb4) that needs 'disconnect' should call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks like below can occur: Call Trace: schedule+0x2f/0xa0 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.5+0x2d0/0x4a0 c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again c4iw_modify_qp+0x468/0x10d0 rx_data+0x218/0x570 => acquires ep lock process_work+0x5f/0x70 process_one_work+0x1a7/0x3b0 worker_thread+0x30/0x390 kthread+0x112/0x130 ret_from_fork+0x35/0x40 Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state") Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supportedMark Zhang1-3/+6
When binding a QP with a counter and the QP state is not RESET, return failure if the rts2rts_qp_counters_set_id is not supported by the device. This is to prevent cases like manual bind for Connect-IB devices from returning success when the feature is not supported. Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/core: Fix invalid memory access in spec_filter_sizeAvihai Horon1-8/+7
Add a check that the size specified in the flow spec header doesn't cause an overflow when calculating the filter size, and thus prevent access to invalid memory. The following crash from syzkaller revealed it. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:memchr_inv+0xd3/0x330 Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85 ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01 00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01 RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04 RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820 RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009 R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004 R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040 FS: 00007f9ee0e51700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: spec_filter_size.part.16+0x34/0x50 ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770 ib_uverbs_ex_create_flow+0x9ea/0x1b40 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49 RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff Modules linked in: Dumping ftrace buffer: (ftrace buffer empty) Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag") Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/rdmavt: Reset all QPs when the device is shut downKaike Wan1-33/+51
When the hfi1 device is shut down during a system reboot, it is possible that some QPs might have not not freed by ULPs. More requests could be post sent and a lingering timer could be triggered to schedule more packet sends, leading to a crash: BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0 PGD 0 Oops: 0000 1 SMP Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1 Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018 task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000 RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0 RSP: 0018:ffff88105df43d48 EFLAGS: 00010046 RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000 RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000 R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0 R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000 FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000 ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98 ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde Call Trace: IRQ [ffffffff810a6b85] queue_work_on+0x45/0x50 [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1] [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffff81097316] call_timer_fn+0x36/0x110 [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffff8109982d] run_timer_softirq+0x22d/0x310 [ffffffff81090b3f] __do_softirq+0xef/0x280 [ffffffff816b6a5c] call_softirq+0x1c/0x30 [ffffffff8102d3c5] do_softirq+0x65/0xa0 [ffffffff81090ec5] irq_exit+0x105/0x110 [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50 [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80 EOI [ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0 [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210 [ffffffff81034fee] arch_cpu_idle+0xe/0x30 [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0 [ffffffff81051af6] start_secondary+0x1b6/0x230 Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00 RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0 RSP ffff88105df43d48 CR2: 0000000000000102 The solution is to reset the QPs before the device resources are freed. This reset will change the QP state to prevent post sends and delete timers to prevent callbacks. Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table") Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/hfi1: Close window for pq and request colidingMike Marciniszyn4-29/+48
Cleaning up a pq can result in the following warning and panic: WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0 list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100) Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 Call Trace: [<ffffffff90365ac0>] dump_stack+0x19/0x1b [<ffffffff8fc98b78>] __warn+0xd8/0x100 [<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0 [<ffffffff8ff9713d>] list_del+0xd/0x30 [<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110 [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1] [<ffffffff8fe4519c>] __fput+0xec/0x260 [<ffffffff8fe453fe>] ____fput+0xe/0x10 [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0 [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0 [<ffffffff90379134>] int_signal+0x12/0x17 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 PGD 2cdab19067 PUD 2f7bfdb067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000 RIP: 0010:[<ffffffff8fe1f93e>] [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003 RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800 RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19 R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000 R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000 FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: [<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80 [<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110 [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1] [<ffffffff8fe4519c>] __fput+0xec/0x260 [<ffffffff8fe453fe>] ____fput+0xe/0x10 [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0 [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0 [<ffffffff90379134>] int_signal+0x12/0x17 Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39 RIP [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 RSP <ffff88b5393abd60> CR2: 0000000000000010 The panic is the result of slab entries being freed during the destruction of the pq slab. The code attempts to quiesce the pq, but looking for n_req == 0 doesn't account for new requests. Fix the issue by using SRCU to get a pq pointer and adjust the pq free logic to NULL the fd pq pointer prior to the quiesce. Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.com Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11IB/hfi1: Acquire lock to release TID entries when user file is closedKaike Wan1-0/+2
Each user context is allocated a certain number of RcvArray (TID) entries and these entries are managed through TID groups. These groups are put into one of three lists in each user context: tid_group_list, tid_used_list, and tid_full_list, depending on the number of used TID entries within each group. When TID packets are expected, one or more TID groups will be allocated. After the packets are received, the TID groups will be freed. Since multiple user threads may access the TID groups simultaneously, a mutex exp_mutex is used to synchronize the access. However, when the user file is closed, it tries to release all TID groups without acquiring the mutex first, which risks a race condition with another thread that may be releasing its TID groups, leading to data corruption. This patch addresses the issue by acquiring the mutex first before releasing the TID groups when the file is closed. Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs") Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-11RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_createKamal Heib1-0/+2
Make sure to free the allocated cpumask_var_t's to avoid the following reported memory leak by kmemleak: $ cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8897f812d6a8 (size 8): comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000bff49664>] alloc_cpumask_var_node+0x4c/0xb0 [<0000000075d3ca81>] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1] [<0000000098d420df>] hfi1_init_dd+0x3311/0x4960 [hfi1] [<0000000071be7e52>] init_one+0x25e/0xf10 [hfi1] [<000000005483d4c2>] local_pci_probe+0xd4/0x180 [<000000007c3cbc6e>] work_for_cpu_fn+0x51/0xa0 [<000000001d626905>] process_one_work+0x8f0/0x17b0 [<000000007e569e7e>] worker_thread+0x536/0xb50 [<00000000fd39a4a5>] kthread+0x30c/0x3d0 [<0000000056f2edb3>] ret_from_fork+0x3a/0x50 Fixes: 5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support") Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds87-3099/+3760
Pull rdma updates from Jason Gunthorpe: "A very quiet cycle with few notable changes. Mostly the usual list of one or two patches to drivers changing something that isn't quite rc worthy. The subsystem seems to be seeing a larger number of rework and cleanup style patches right now, I feel that several vendors are prepping their drivers for new silicon. Summary: - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4, rxe, i40iw - Larger series doing cleanup and rework for hns and hfi1. - Some general reworking of the CM code to make it a little more understandable - Unify the different code paths connected to the uverbs FD scheme - New UAPI ioctls conversions for get context and get async fd - Trace points for CQ and CM portions of the RDMA stack - mlx5 driver support for virtio-net formatted rings as RDMA raw ethernet QPs - verbs support for setting the PCI-E relaxed ordering bit on DMA traffic connected to a MR - A couple of bug fixes that came too late to make rc7" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits) RDMA/core: Make the entire API tree static RDMA/efa: Mask access flags with the correct optional range RDMA/cma: Fix unbalanced cm_id reference count during address resolve RDMA/umem: Fix ib_umem_find_best_pgsz() IB/mlx4: Fix leak in id_map_find_del IB/opa_vnic: Spelling correction of 'erorr' to 'error' IB/hfi1: Fix logical condition in msix_request_irq RDMA/cm: Remove CM message structs RDMA/cm: Use IBA functions for complex structure members RDMA/cm: Use IBA functions for simple structure members RDMA/cm: Use IBA functions for swapping get/set acessors RDMA/cm: Use IBA functions for simple get/set acessors RDMA/cm: Add SET/GET implementations to hide IBA wire format RDMA/cm: Add accessors for CM_REQ transport_type IB/mlx5: Return the administrative GUID if exists RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence IB/mlx4: Fix memory leak in add_gid error flow IB/mlx5: Expose RoCE accelerator counters RDMA/mlx5: Set relaxed ordering when requested RDMA/core: Add the core support field to METHOD_GET_CONTEXT ...
2020-01-31mm, tree-wide: rename put_user_page*() to unpin_user_page*()John Hubbard7-11/+11
In order to provide a clearer, more symmetric API for pinning and unpinning DMA pages. This way, pin_user_pages*() calls match up with unpin_user_pages*() calls, and the API is a lot closer to being self-explanatory. Link: http://lkml.kernel.org/r/20200107224558.2362728-23-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODPJohn Hubbard8-14/+13
Convert infiniband to use the new pin_user_pages*() calls. Also, revert earlier changes to Infiniband ODP that had it using put_user_page(). ODP is "Case 3" in Documentation/core-api/pin_user_pages.rst, which is to say, normal get_user_pages() and put_page() is the API to use there. The new pin_user_pages*() calls replace corresponding get_user_pages*() calls, and set the FOLL_PIN flag. The FOLL_PIN flag requires that the caller must return the pages via put_user_page*() calls, but infiniband was already doing that as part of an earlier commit. Link: http://lkml.kernel.org/r/20200107224558.2362728-14-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31IB/umem: use get_user_pages_fast() to pin DMA pagesJohn Hubbard1-11/+6
And get rid of the mmap_sem calls, as part of that. Note that get_user_pages_fast() will, if necessary, fall back to __gup_longterm_unlocked(), which takes the mmap_sem as needed. Link: http://lkml.kernel.org/r/20200107224558.2362728-10-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-30RDMA/core: Make the entire API tree staticJason Gunthorpe3-23/+3
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates the following error. on x86_64: ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC': main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler' This is happening because some parts of the UAPI description are not static. This is a hold over from earlier code that relied on struct pointers to refer to object types, now object types are referenced by number. Remove the unused globals and add statics to the remaining UAPI description elements. Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete mlx5_ib_get_devx_tree(). The compiler now trims alot more unused code, including the above problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-29RDMA/efa: Mask access flags with the correct optional rangeGal Pressman1-1/+1
The uapi value IB_UVERBS_ACCESS_OPTIONAL_RANGE shouldn't be used inside the driver, use IB_ACCESS_OPTIONAL instead. Fixes: 86dd738cf20c ("RDMA/efa: Allow passing of optional access flags for MR registration") Link: https://lore.kernel.org/r/20200129071803.40117-1-galpress@amazon.com Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds36-198/+297
Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ...
2020-01-28RDMA/cma: Fix unbalanced cm_id reference count during address resolveParav Pandit1-0/+2
Below commit missed the AF_IB and loopback code flow in rdma_resolve_addr(). This leads to an unbalanced cm_id refcount in cma_work_handler() which puts the refcount which was not incremented prior to queuing the work. A call trace is observed with such code flow: BUG: unable to handle kernel NULL pointer dereference at (null) [<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0 [<ffffffff96b6715f>] mutex_lock+0x1f/0x2f [<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0 [<ffffffff964b9ebf>] process_one_work+0x17f/0x440 [<ffffffff964baf56>] worker_thread+0x126/0x3c0 Hence, hold the cm_id reference when scheduling the resolve work item. Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()") Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28RDMA/umem: Fix ib_umem_find_best_pgsz()Artemy Kovalyov1-3/+6
Except for the last entry, the ending iova alignment sets the maximum possible page size as the low bits of the iova must be zero when starting the next chunk. Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR") Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-5/+5
Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-28Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Cleanup of the GOP [graphics output] handling code in the EFI stub - Complete refactoring of the mixed mode handling in the x86 EFI stub - Overhaul of the x86 EFI boot/runtime code - Increase robustness for mixed mode code - Add the ability to disable DMA at the root port level in the EFI stub - Get rid of RWX mappings in the EFI memory map and page tables, where possible - Move the support code for the old EFI memory mapping style into its only user, the SGI UV1+ support code. - plus misc fixes, updates, smaller cleanups. ... and due to interactions with the RWX changes, another round of PAT cleanups make a guest appearance via the EFI tree - with no side effects intended" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) efi/x86: Disable instrumentation in the EFI runtime handling code efi/libstub/x86: Fix EFI server boot failure efi/x86: Disallow efi=old_map in mixed mode x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping efi: Fix handling of multiple efi_fake_mem= entries efi: Fix efi_memmap_alloc() leaks efi: Add tracking for dynamically allocated memmaps efi: Add a flags parameter to efi_memory_map efi: Fix comment for efi_mem_type() wrt absent physical addresses efi/arm: Defer probe of PCIe backed efifb on DT systems efi/x86: Limit EFI old memory map to SGI UV machines efi/x86: Avoid RWX mappings for all of DRAM efi/x86: Don't map the entire kernel text RW for mixed mode x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd efi/libstub/x86: Fix unused-variable warning efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode efi/libstub/x86: Use const attribute for efi_is_64bit() efi: Allow disabling PCI busmastering on bridges during boot efi/x86: Allow translating 64-bit arguments for mixed mode calls ...
2020-01-27Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremapLinus Torvalds7-10/+10
Pull ioremap updates from Christoph Hellwig: "Remove the ioremap_nocache API (plus wrappers) that are always identical to ioremap" * tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap: remove ioremap_nocache and devm_ioremap_nocache MIPS: define ioremap_nocache to ioremap
2020-01-27IB/mlx4: Fix leak in id_map_find_delHåkon Bugge1-26/+3
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted from the cache. When a DREP is sent from mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar when a REJ is received by the mlx4_ib_demux_cm_handler(), id_map_find_del() is called. This function wipes out the TID in use from the IDR or XArray and removes the id_map_entry from the table. In short, it does everything except the topping of the cake, which is to remove the entry from the list and free it. In other words, for the REJ case enumerated above, one id_map_entry will be leaked. For the other case above, a DREQ has been received first. The reception of the DREQ will trigger queuing of a delayed work to delete the id_map_entry, for the case where the VM doesn't send back a DREP. In the normal case, the VM _will_ send back a DREP, and id_map_find_del() will be called. But this scenario introduces a secondary leak. First, when the DREQ is received, a delayed work is queued. The VM will then return a DREP, which will call id_map_find_del(). As stated above, this will free the TID used from the XArray or IDR. Now, there is window where that particular TID can be re-allocated, lets say by an outgoing REQ. This TID will later be wiped out by the delayed work, when the function id_map_ent_timeout() is called. But the id_map_entry allocated by the outgoing REQ will not be de-allocated, and we have a leak. Both leaks are fixed by removing the id_map_find_del() function and only using schedule_delayed(). Of course, a check in schedule_delayed() to see if the work already has been queued, has been added. Another benefit of always using the delayed version for deleting entries, is that we do get a TimeWait effect; a TID no longer in use, will occupy the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability of being re-used for that time period. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-26Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-12/+0
Pull SCSI fixes from James Bottomley: "Two last minute fixes, both in drivers. The fnic one is a highly unlikely condition, but the RDMA one is a recently introduced regression that causes a kernel warning to trigger in every RDMA logon, which would be unsightly if it got into the final release" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: RDMA/isert: Fix a recently introduced regression related to logout scsi: fnic: do not queue commands during fwreset
2020-01-25IB/opa_vnic: Spelling correction of 'erorr' to 'error'Dillon Brock1-1/+1
Correcting a minor spelling mistake in the comments. Link: https://lore.kernel.org/r/20200118162542.15188-1-dab9861@gmail.com Signed-off-by: Dillon Brock <dab9861@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25IB/hfi1: Fix logical condition in msix_request_irqNathan Chancellor1-1/+1
Clang warns: drivers/infiniband/hw/hfi1/msix.c:136:22: warning: overlapping comparisons always evaluate to false [-Wtautological-overlap-compare] if (type < IRQ_SDMA && type >= IRQ_OTHER) ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ 1 warning generated. It is impossible for something to be less than 0 (IRQ_SDMA) and greater than or equal to 3 (IRQ_OTHER) at the same time. A logical OR should have been used to keep the same logic as before. Link: https://lore.kernel.org/r/20200116222658.5285-1-natechancellor@gmail.com Link: https://github.com/ClangBuiltLinux/linux/issues/841 Fixes: 13d2a8384bd9 ("IB/hfi1: Decouple IRQ name from type") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Remove CM message structsJason Gunthorpe2-288/+0
All accesses now use the new IBA acessor scheme, so delete the structs entirely and generate the structures from the schema file. Link: https://lore.kernel.org/r/20200116170037.30109-8-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for complex structure membersJason Gunthorpe1-64/+107
Use a Coccinelle spatch to replace CM structure members used as structures, arrays, or pointers with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression src; expression len; {struct} *msg; @@ - memcpy(msg->{old_name}, src, len) + IBA_SET_MEM({new_name}, msg, src, len) @@ {struct} *msg; identifier x; @@ - msg->{old_name}.x + IBA_GET_MEM_PTR({new_name}, msg)->x @@ {struct} *msg; @@ - &msg->{old_name} + IBA_GET_MEM_PTR({new_name}, msg) For GIDs: @@ {struct} *msg; @@ - msg->{old_name} + *IBA_GET_MEM_PTR({new_name}, msg) For non-GIDs: @@ {struct} *msg; @@ - msg->{old_name} + IBA_GET_MEM_PTR({new_name}, msg) Iterated for every remaining IBA_CHECK_OFF()/IBA_CHECK_GET() pairing. Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-7-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for simple structure membersJason Gunthorpe1-139/+245
Use a Coccinelle spatch script to replace use of simple CM structure members with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression val; {struct} *msg; @@ - msg->{old_name} = val + IBA_SET({new_name}, msg, be{bits}_to_cpu(val)) @@ {struct} *msg; @@ - msg->{old_name} + cpu_to_be{bits}(IBA_GET({new_name}, msg)) Iterated for every IBA_CHECK_OFF that isn't a CM_FIELD_MLOC. And the below iterated over all byte sizes to remove doubled byte swaps: @@ expression val; @@ -be{bits}_to_cpu(cpu_to_be{bits}(val)) +val (and __be_to_cpu and ntoh varients) Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-6-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for swapping get/set acessorsJason Gunthorpe2-269/+36
Use a Coccinelle spatch script to replace CM helper functions that return/accept BE values with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression val; {struct} *msg; @@ - {old_setter}(msg, val) + IBA_SET({new_name}, msg, be{bits}_to_cpu(val)) @@ {struct} *msg; @@ - {old_getter}(msg) + cpu_to_be{bits}(IBA_GET({new_name}, msg)) Iterated for every IBA_CHECK_GET_BE()/IBA_CHECK_SET_BE() pairing. And the below iterated over all byte sizes to remove doubled byte swaps: @@ expression val; @@ -be{bits}_to_cpu(cpu_to_be{bits}(val)) +val (and __be_to_cpu and ntoh varients) Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-5-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Use IBA functions for simple get/set acessorsJason Gunthorpe2-509/+104
Use a Coccinelle spatch to replace CM helper functions with IBA_GET/SET versions. Applied with $ spatch --sp-file edits.sp --in-place drivers/infiniband/core/cm.c The spatch file was generated using the template pattern: @@ expression val; {struct} *msg; @@ - {old_setter} + IBA_SET({new_name}, msg, val) @@ {struct} *msg; @@ - {old_getter} + IBA_GET({new_name}, msg) Iterated for every IBA_CHECK_GET()/IBA_CHECK_GET() pairing. Touched up with clang-format after. Link: https://lore.kernel.org/r/20200116170037.30109-4-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Add SET/GET implementations to hide IBA wire formatLeon Romanovsky2-0/+268
There is no separation between RDMA-CM wire format as it is declared in IBTA and kernel logic which implements needed support. Such situation causes to many mistakes in conversion between big-endian (wire format) and CPU format used by kernel. It also mixes RDMA core code with combination of uXX and beXX variables. The idea that all accesses to IBA definitions will go through special GET/SET macros to ensure that no conversion mistakes are made. The shifting and masking required to read the value is automatically deduced using the field offset description from the tables in the IBA specification. This starts with the CM MADs described in IBTA release 1.3 volume 1. To confirm that the new macros behave the same as the old accessors a self-test is included in this patch. Each macro replacing a straightforward struct field compile-time tests that the new field has the same offsetof() and width as the old field. For the fields with accessor functions a runtime test, the 'all ones' value is placed in a dummy message and read back in several ways to confirm that both approaches give identical results. Later patches in this series delete the self test. This creates a tested table of new field name, old field name(s) and some meta information like BE coding for the functions which will be used in the next patches. Link: https://lore.kernel.org/r/20200116170037.30109-3-jgg@ziepe.ca Link: https://lore.kernel.org/r/20191212093830.316934-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/cm: Add accessors for CM_REQ transport_typeJason Gunthorpe1-12/+29
Access the two fields through wrappers, like all other fields, to make it clearer what is happening. Link: https://lore.kernel.org/r/20200116170037.30109-2-jgg@ziepe.ca Tested-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25IB/mlx5: Return the administrative GUID if existsDanit Goldberg1-16/+12
A user can change the operational GUID (a.k.a affective GUID) through link/infiniband. Therefore it is preferred to return the currently set GUID if it exists instead of the operational. This way the PF can query which VF GUID will be set in the next bind. In order to align with MAC address, zero is returned if administrative GUID is not set. For example, before setting administrative GUID: $ ip link show ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off Then: $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00 After setting administrative GUID: $ ip link show ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off Fixes: 9c0015ef0928 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes") Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-25RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fenceJason Gunthorpe1-0/+2
The set of entry->driver_removed is missing locking, protect it with xa_lock() which is held by the only reader. Otherwise readers may continue to see driver_removed = false after rdma_user_mmap_entry_remove() returns and may continue to try and establish new mmaps. Fixes: 3411f9f01b76 ("RDMA/core: Create mmap database and cookie helper functions") Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca Reviewed-by: Gal Pressman <galpress@amazon.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe38-209/+317
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdmaDavid S. Miller35-193/+290
Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21scsi: RDMA/isert: Fix a recently introduced regression related to logoutBart Van Assche1-12/+0
iscsit_close_connection() calls isert_wait_conn(). Due to commit e9d3009cb936 both functions call target_wait_for_sess_cmds() although that last function should be called only once. Fix this by removing the target_wait_for_sess_cmds() call from isert_wait_conn() and by only calling isert_wait_conn() after target_wait_for_sess_cmds(). Fixes: e9d3009cb936 ("scsi: target: iscsi: Wait for all commands to finish before freeing a session"). Link: https://lore.kernel.org/r/20200116044737.19507-1-bvanassche@acm.org Reported-by: Rahul Kundu <rahul.kundu@chelsio.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-20Merge tag 'v5.5-rc7' into perf/core, to pick up fixesIngo Molnar5-16/+27
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-19Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/netDavid S. Miller5-16/+27
2020-01-16Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxSaeed Mahameed1-4/+6
This merge syncs with mlx5-next latest HW bits and layout updates for next features, in addition one patch that improves mlx5_create_auto_grouped_flow_table() API across all mlx5 users. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Refactor mlx5_create_auto_grouped_flow_table net/mlx5e: Add discard counters per priority net/mlx5e: Expose FEC feilds and related capability bit net/mlx5: Add mlx5_ifc definitions for connection tracking support net/mlx5: Add copy header action struct layout net/mlx5: Expose resource dump register mapping net/mlx5: Add structures and defines for MIRC register net/mlx5: Read MCAM register groups 1 and 2 net/mlx5: Add structures layout for new MCAM access reg groups net/mlx5: Expose vDPA emulation device capabilities net/mlx5: Add Virtio Emulation related device capabilities Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16net/mlx5: Refactor mlx5_create_auto_grouped_flow_tablePaul Blakey1-4/+6
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param which already carries the max_fte, prio and flags memebers, and is used the same in similar mlx5_create_flow_table() function. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16IB/mlx4: Fix memory leak in add_gid error flowJack Morgenstein1-4/+16
In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW gid table, there is a memory leak in the driver's copy of the gid table: the gid entry's context buffer is not freed. If such an error occurs, free the entry's context buffer, and mark the entry as available (by setting its context pointer to NULL). Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks") Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB/mlx5: Expose RoCE accelerator countersAvihai Horon1-0/+18
Introduce the following RoCE accelerator counters: * roce_adp_retrans - number of adaptive retransmission for RoCE traffic. * roce_adp_retrans_to - number of times RoCE traffic reached time out due to adaptive retransmission. * roce_slow_restart - number of times RoCE slow restart was used. * roce_slow_restart_cnps - number of times RoCE slow restart generate CNP packets. * roce_slow_restart_trans - number of times RoCE slow restart change state to slow restart. Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16RDMA/mlx5: Set relaxed ordering when requestedMichael Guralnik4-5/+23
Enable relaxed ordering in the mkey context when requested. As relaxed ordering is not currently supported in UMR, disable UMR usage for relaxed ordering MRs. Link: https://lore.kernel.org/r/1578506740-22188-11-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>