2025-05-12  RDMA/siw: replace redundant ternary operator with just rv  (Colin Ian King; 1 file, -1/+1)

The use of the ternary operator on rv is redundant: rv is either the initialized value of 0 or a negative error return code, so it can never be greater than zero, and hence the zero assignment in the ternary operator is redundant. Just return rv instead.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250507131834.253823-1-colin.i.king@gmail.com
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

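For illustration, a minimal reconstruction of the pattern being removed (a sketch, not the verbatim hunk; the exact context lives in the siw driver):

	/* Before: rv is 0 or a negative errno, so the ternary can never fire. */
	return rv > 0 ? 0 : rv;

	/* After: */
	return rv;
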
2025-05-12  RDMA/umem: Separate implicit ODP initialization from explicit ODP  (Leon Romanovsky; 1 file, -45/+46)

Create separate functions for the implicit ODP initialization, which is different from the explicit ODP initialization.

Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

2025-05-12  RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage  (Leon Romanovsky; 6 files, -115/+73)

Reuse the newly added DMA API to cache the IOVA and only link/unlink pages in the fast path for the UMEM ODP flow.

Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

2025-05-12  RDMA/umem: Store ODP access mask information in PFN  (Leon Romanovsky; 5 files, -99/+70)

As a preparation to remove dma_list, store the access mask in the PFN pointer and not in dma_addr_t.

Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

2025-05-12  mm/hmm: provide generic DMA managing logic  (Leon Romanovsky; 3 files, -2/+251)

HMM callers use a PFN list to populate the range while calling hmm_range_fault(); the conversion from PFN to DMA address is then done by the callers with the help of another DMA list. However, this is wasteful on any modern platform, and with the right logic that DMA list can be avoided.

Provide generic logic to manage these lists and give an interface to map/unmap PFNs to DMA addresses, without requiring the callers to be experts in the DMA core API.

Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

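For context, a sketch of the legacy two-list pattern this series avoids. hmm_range_fault(), hmm_pfn_to_page() and dma_map_page() are existing kernel APIs; the dma_addrs array and surrounding driver state are hypothetical:

	/* One list of PFNs from HMM, plus a second, parallel DMA list
	 * that every caller had to fill in by hand. */
	ret = hmm_range_fault(&range);
	if (ret)
		return ret;
	for (i = 0; i < npages; i++) {
		struct page *page = hmm_pfn_to_page(range.hmm_pfns[i]);

		dma_addrs[i] = dma_map_page(dev, page, 0, PAGE_SIZE,
					    DMA_BIDIRECTIONAL);
	}
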
2025-05-12  mm/hmm: let users tag specific PFNs with DMA mapped bit  (Leon Romanovsky; 2 files, -21/+50)

Introduce a new sticky flag (HMM_PFN_DMA_MAPPED) which isn't overwritten by HMM range fault. Such a flag allows users to tag specific PFNs with the information that the PFN has already been DMA mapped.

Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

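A minimal sketch of how a caller might use the new flag, assuming HMM_PFN_DMA_MAPPED is a bit in the hmm_range_fault() PFN output array as described above (the mapping call shown is illustrative):

	unsigned long *pfns = range.hmm_pfns;

	if (!(pfns[i] & HMM_PFN_DMA_MAPPED)) {
		/* Not mapped yet: DMA map the page, then tag the PFN. */
		dma = dma_map_page(dev, hmm_pfn_to_page(pfns[i]), 0,
				   PAGE_SIZE, DMA_BIDIRECTIONAL);
		pfns[i] |= HMM_PFN_DMA_MAPPED;	/* sticky across later faults */
	}
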
2025-05-12  RDMA/hns: Fix build error of hns_roce_trace  (Junxian Huang; 1 file, -0/+1)

Add the include path needed to find hns_roce_trace.h, fixing the following build error:

  In file included from drivers/infiniband/hw/hns/hns_roce_trace.h:213,
                   from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:53:
  ./include/trace/define_trace.h:110:42: fatal error: ./hns_roce_trace.h: No such file or directory
    110 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
        |                                          ^
  compilation terminated.

Fixes: 02007e3ddc07 ("RDMA/hns: Add trace for flush CQE")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Closes: https://lore.kernel.org/linux-next/b7dd4dda-37d8-47e4-8d78-b6585be21cfd@paulmck-laptop/T/#t
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250507033903.2879433-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-05-06  RDMA/siw: Remove unused siw_mem_add  (Dr. David Alan Gilbert; 2 files, -25/+0)

siw_mem_add() was added in 2019 by commit 2251334dcac9 ("rdma/siw: application buffer management") but has remained unused. Remove it.

Link: https://patch.msgid.link/r/20250505210226.88994-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Bernard Metzler <bmt@zurich.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-05-06  IB/hfi1: Remove unused sc_drop and sdma_all_idle  (Dr. David Alan Gilbert; 4 files, -30/+0)

sc_drop() and sdma_all_idle() were both added in 2015's commit 7724105686e7 ("IB/hfi1: add driver files") but have remained unused. Remove them.

Link: https://patch.msgid.link/r/20250505205419.88131-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-05-06  docs: core-api: document the IOVA-based API  (Christoph Hellwig; 1 file, -0/+71)

Add an explanation of the newly added IOVA-based mapping API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

2025-05-06  dma-mapping: add a dma_need_unmap helper  (Christoph Hellwig; 2 files, -0/+23)

Add a helper that allows a driver to skip calling dma_unmap_* if the DMA layer can guarantee that the calls are no-ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

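A sketch of the intended usage; dma_need_unmap(dev) is the helper added here, while the ctx bookkeeping around it is hypothetical:

	/* At setup time, record once whether unmapping is ever needed. */
	ctx->need_unmap = dma_need_unmap(dev);

	/* In the completion path, skip the per-buffer unmap when it is a no-op. */
	if (ctx->need_unmap)
		dma_unmap_page(dev, addr, len, DMA_FROM_DEVICE);
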
2025-05-06  dma-mapping: Implement link/unlink ranges API  (Leon Romanovsky; 2 files, -1/+306)

Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In the proposed API, the callers will perform the following steps.

In the map path:

	if (dma_can_use_iova(...))
		dma_iova_alloc()
		for (page in range)
			dma_iova_link_next(...)
		dma_iova_sync(...)
	else
		/* Fallback to legacy map pages */
		for (all pages)
			dma_map_page(...)

In the unmap path:

	if (dma_can_use_iova(...))
		dma_iova_destroy()
	else
		for (all pages)
			dma_unmap_page(...)

Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

2025-05-06  iommu/dma: Factor out an iommu_dma_map_swiotlb helper  (Christoph Hellwig; 1 file, -32/+41)

Split the swiotlb logic out of iommu_dma_map_page into a separate helper. This not only keeps the code neatly separated, but will also allow for reuse in another caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

2025-05-06  dma-mapping: Provide an interface to allow allocating IOVA  (Leon Romanovsky; 2 files, -0/+134)

The existing .map_pages() callback provides both allocation of the IOVA and linking of the DMA pages. That combination works great for most of the callers who use it in control paths, but is less effective in fast paths where there may be multiple calls to map_page().

These advanced callers already manage their data in some sort of database and can perform IOVA allocation in advance, leaving the range linkage operation in the fast path.

Provide an interface to allocate/deallocate an IOVA; the next patch links/unlinks DMA ranges to that specific IOVA.

In the new API a DMA mapping transaction is identified by a struct dma_iova_state, which holds some precomputed information for the transaction that does not change for each page being mapped, so add a check whether the IOVA can be used for the specific transaction.

The API is exported from dma-iommu, as it is the only supported implementation; the namespace is clearly different from the iommu_* functions, which are not allowed to be used. This code layout allows us to save a function call per API call in the datapath, as well as a lot of boilerplate code.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

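A sketch of the allocation half of the new API, using the names introduced by this series as best understood here (treat the exact signatures as assumptions):

	struct dma_iova_state state = {};

	/* Slow path, ahead of time: reserve the IOVA range once. */
	if (dma_iova_try_alloc(dev, &state, phys, size)) {
		/* Fast path: link/unlink ranges into the reserved IOVA
		 * (next patch), then eventually release it. */
		dma_iova_free(dev, &state);
	} else {
		/* No usable IOVA: fall back to per-page dma_map_page(). */
	}
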
2025-05-06  iommu: add kernel-doc for iommu_unmap_fast  (Leon Romanovsky; 1 file, -0/+19)

Add a kernel-doc section for iommu_unmap_fast to document an existing limitation of the underlying functions, which can't split individual ranges.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

2025-05-06  iommu: generalize the batched sync after map interface  (Christoph Hellwig; 2 files, -36/+33)

For the upcoming IOVA-based DMA API we want to batch the ops->iotlb_sync_map() call after mapping multiple IOVAs from dma-iommu without having a scatterlist. Improve the API as follows:

Add a wrapper for the map_sync call as iommu_sync_map() so that callers don't need to poke into the methods directly.

Formalize __iommu_map() into iommu_map_nosync(), which requires the caller to call iommu_sync_map() after all maps are completed.

Refactor the existing sanity checks from all the different layers into iommu_map_nosync().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

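A sketch of the batched pattern this enables, assuming iommu_map_nosync() keeps the iommu_map() argument list and iommu_sync_map() takes the mapped range, per the description above:

	for (i = 0; i < n; i++) {
		ret = iommu_map_nosync(domain, iova + offs[i], paddrs[i],
				       lens[i], prot, GFP_KERNEL);
		if (ret)
			goto err_unmap;
	}
	/* One batched IOTLB sync instead of one per mapping. */
	ret = iommu_sync_map(domain, iova, total_len);
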
2025-05-06  dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h  (Christoph Hellwig; 4 files, -85/+87)

To support the upcoming non-scatterlist mapping helpers, we need to go back to having them called outside of the DMA API. Thus move them out of dma-map-ops.h, which is only for DMA API implementations, to pci-p2pdma.h, which is for driver use.

Note that the core helper is still not exported, as the mapping is expected to be done only by very high-level subsystem code, at least for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

2025-05-06  PCI/P2PDMA: Refactor the p2pdma mapping helpers  (Christoph Hellwig; 4 files, -88/+91)

The current scheme, with a single helper that determines the P2P status and maps a scatterlist segment, forces users to always use the map_sg helper to DMA map, which we're trying to get away from because it is very cache inefficient.

Refactor the code so that there is a single helper that checks the P2P state for a page (including the result that it is not a P2P page, to simplify the callers), and a second one that performs the address translation for a bus-mapped P2P transfer and does not depend on the scatterlist structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

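A sketch of the split described above; the helper and state names (pci_p2pdma_map_state, pci_p2pdma_state(), pci_p2pdma_bus_addr_map()) are assumptions inferred from this description, not verified signatures:

	struct pci_p2pdma_map_state state = {};

	switch (pci_p2pdma_state(&state, dev, page)) {
	case PCI_P2PDMA_MAP_NONE:
		/* Ordinary memory: use the regular DMA mapping path. */
		break;
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/* Peer-to-peer over the PCI bus: pure address translation,
		 * no scatterlist required. */
		addr = pci_p2pdma_bus_addr_map(&state, page_to_phys(page));
		break;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* Treated like host memory for mapping purposes. */
		break;
	default:
		return -EREMOTEIO;
	}
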
2025-05-05  RDMA/mlx5: Fix error flow upon firmware failure for RQ destruction  (Patrisious Haddad; 2 files, -2/+29)

Upon RQ destruction, if the firmware command fails (the RQ being the last resource to be destroyed), some SW resources were already cleaned up regardless of the failure.

Now properly roll back the object to its original state upon such a failure, in order to avoid a use-after-free in case someone tries to destroy the object again, which results in the following kernel trace:

refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 37589 at lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
Modules linked in: rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) rfkill mlx5_core(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) psample mlxfw(OE) mlx_compat(OE) macsec tls pci_hyperv_intf sunrpc vfat fat virtio_net net_failover failover fuse loop nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_console virtio_gpu virtio_blk virtio_dma_buf virtio_mmio dm_mirror dm_region_hash dm_log dm_mod xpmem(OE)
CPU: 0 UID: 0 PID: 37589 Comm: python3 Kdump: loaded Tainted: G OE ------- --- 6.12.0-54.el10.aarch64 #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : refcount_warn_saturate+0xf4/0x148
lr : refcount_warn_saturate+0xf4/0x148
sp : ffff80008b81b7e0
x29: ffff80008b81b7e0 x28: ffff000133d51600 x27: 0000000000000001
x26: 0000000000000000 x25: 00000000ffffffea x24: ffff00010ae80f00
x23: ffff00010ae80f80 x22: ffff0000c66e5d08 x21: 0000000000000000
x20: ffff0000c66e0000 x19: ffff00010ae80340 x18: 0000000000000006
x17: 0000000000000000 x16: 0000000000000020 x15: ffff80008b81b37f
x14: 0000000000000000 x13: 2e656572662d7265 x12: ffff80008283ef78
x11: ffff80008257efd0 x10: ffff80008283efd0 x9 : ffff80008021ed90
x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff
x5 : ffff0001fb8e3408 x4 : 0000000000000000 x3 : ffff800179993000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000133d51600
Call trace:
 refcount_warn_saturate+0xf4/0x148
 mlx5_core_put_rsc+0x88/0xa0 [mlx5_ib]
 mlx5_core_destroy_rq_tracked+0x64/0x98 [mlx5_ib]
 mlx5_ib_destroy_wq+0x34/0x80 [mlx5_ib]
 ib_destroy_wq_user+0x30/0xc0 [ib_core]
 uverbs_free_wq+0x28/0x58 [ib_uverbs]
 destroy_hw_idr_uobject+0x34/0x78 [ib_uverbs]
 uverbs_destroy_uobject+0x48/0x240 [ib_uverbs]
 __uverbs_cleanup_ufile+0xd4/0x1a8 [ib_uverbs]
 uverbs_destroy_ufile_hw+0x48/0x120 [ib_uverbs]
 ib_uverbs_close+0x2c/0x100 [ib_uverbs]
 __fput+0xd8/0x2f0
 __fput_sync+0x50/0x70
 __arm64_sys_close+0x40/0x90
 invoke_syscall.constprop.0+0x74/0xd0
 do_el0_svc+0x48/0xe8
 el0_svc+0x44/0x1d0
 el0t_64_sync_handler+0x120/0x130
 el0t_64_sync+0x1a4/0x1a8

Fixes: e2013b212f9f ("net/mlx5_core: Add RQ and SQ event handling")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://patch.msgid.link/3181433ccdd695c63560eeeb3f0c990961732101.1745839855.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-05-05  IB/cm: Drop lockdep assert and WARN when freeing old msg  (Vlad Dumitrescu; 1 file, -1/+2)

The send completion handler can run after the cm_id has advanced to another message. The cm_id lock is not needed in this case, but a recent change re-used cm_free_priv_msg(), which asserts that the lock is held and WARNs if the cm_id's currently outstanding msg is different from the one being freed.

Fixes: 1e5159219076 ("IB/cm: Do not hold reference on cm_id unless needed")
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Sean Hefty <shefty@nvidia.com>
Link: https://patch.msgid.link/0c364c29142f72b7875fdeba51f3c9bd6ca863ee.1745839788.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-27  IB/hfi1: Adjust fd->entry_to_rb allocation type  (Kees Cook; 1 file, -1/+1)

In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "struct tid_rb_node **", but the return type will be "struct rb_node **". These are the same allocation size (pointer size), but the types do not match. Adjust the allocation type to match the assignment.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250426061247.work.261-kees@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-27  IB/mthca: Adjust buddy->bits allocation type  (Kees Cook; 1 file, -1/+1)

In preparation for making the kmalloc family of allocators type aware, we need to make sure that the returned type from the allocation matches the type of the variable being assigned. (Before, the allocator would always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "unsigned long **", but the returned type will be "long **". These are the same allocation size (pointer size), but the types do not match. Adjust the allocation type to match the assignment.

Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250426061208.work.000-kees@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

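For illustration, the general shape of this kind of fix (a hypothetical allocation site; the idiom is to size and type the allocation from the assigned variable rather than a spelled-out type):

	/* Before: sizeof(long *) types the allocation as "long **",
	 * but buddy->bits is "unsigned long **". */
	buddy->bits = kzalloc((buddy->max_order + 1) * sizeof(long *),
			      GFP_KERNEL);

	/* After: derive the element type from the pointer itself. */
	buddy->bits = kzalloc((buddy->max_order + 1) * sizeof(*buddy->bits),
			      GFP_KERNEL);
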
2025-04-21  RDMA/hns: Add trace for CMDQ dumping  (Junxian Huang; 2 files, -0/+39)

Add trace for CMDQ dumping. Output example:

$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:128
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
  kworker/u512:1-14003 [089] b..1. 50737.238304: hns_cmdq_req: 0000:bd:00.0 cmdq opcode:0x8500, flag:0x1, retval:0x0, data:{0x2,0x0,0x0,0xffff0000,0x32323232,0x0}
  kworker/u512:1-14003 [089] b..1. 50737.238316: hns_cmdq_resp: 0000:bd:00.0 cmdq opcode:0x8500, flag:0x2, retval:0x0, data:{0x2,0x0,0x0,0xffff0000,0x32323232,0x0}

Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250421132750.1363348-7-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/hns: Include hnae3.h in hns_roce_hw_v2.h  (Junxian Huang; 5 files, -4/+1)

hns_roce_hw_v2.h has a direct dependency on hnae3.h due to the inline function hns_roce_write64(), but it doesn't currently include this header. As a result, files that include hns_roce_hw_v2.h must also include hnae3.h to avoid compilation errors, even if they don't themselves rely on hnae3.h. This doesn't make sense; hns_roce_hw_v2.h should include hnae3.h directly.

Fixes: d3743fa94ccd ("RDMA/hns: Fix the chip hanging caused by sending doorbell during reset")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250421132750.1363348-6-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/hns: Add trace for MR/MTR attribute dumping  (Junxian Huang; 2 files, -0/+68)

Add trace for MR/MTR attribute dumping. Output example:

$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:128
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
  ib_send_bw-14751 [111] ..... 8763.823038: hns_buf_attr: rg cnt:1, pg_sft:0xc, mtt_only:no, rg 0 (sz:131072, hop:2), rg 1 (sz:0, hop:0), rg 2 (sz:0, hop:0)
  ib_send_bw-14751 [111] ..... 8763.823118: hns_mr: iova:0xffffb2968000, size:131072, key:512, pd:1, pbl_hop:1, npages:4, type:0, status:0

Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250421132750.1363348-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/hns: Add trace for AEQE dumping  (Junxian Huang; 2 files, -0/+20)

Add trace for AEQE dumping. Output example:

$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:128
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
  <idle>-0 [120] d.h1. 7995.835587: hns_ae_info: event 19 aeqe: {0x80006013,0x0,0x0,0x10d2c,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0}

Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250421132750.1363348-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/hns: Add trace for WQE dumping  (Junxian Huang; 3 files, -0/+55)

Add trace for WQE dumping, including SQ, RQ and SRQ. Output example:

$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:128
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
  roce_test_main-22730 [074] d..1. 16133.898282: hns_sq_wqe: SQ 0xc wqe (0x0/0xffff0820a6076060): {0x180,0x639c,0x0,0x1000000,0x0,0x0,0x0,0x0,0x639c,0x300,0xf7e38000,0x0,0x0,0x0,0x0,0x0}

Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250421132750.1363348-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/hns: Add trace for flush CQE  (Junxian Huang; 3 files, -0/+72)

Add trace to print the producer index of the QP when triggering flush CQE. Output example:

$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2/2   #P:128
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
  ib_send_bw-11474 [075] d..1. 2393.434738: hns_sq_flush_cqe: SQ 0x2 flush head 0xb5c7.
  ib_send_bw-11474 [075] d..1. 2393.434739: hns_rq_flush_cqe: RQ 0x2 flush head 0.

Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20250421132750.1363348-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/core: Move ODP capability definitions to uapi  (Daisuke Matsuda; 2 files, -10/+26)

The bits are used from both kernel space and userland, so they should be placed in UAPI.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://patch.msgid.link/20250418051345.1022339-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-21  RDMA/rxe: Remove 32-bit architecture support  (Daisuke Matsuda; 5 files, -19/+3)

Major Linux distributions have phased out support for 32-bit machines. Since rxe is primarily used for development and testing, the benefit of maintaining 32-bit support is minimal. This change simplifies the ATOMIC WRITE implementations and improves maintainability of the driver.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://patch.msgid.link/20250421025101.3588139-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-20  RDMA/rxe: Remove unused rxe_run_task  (Dr. David Alan Gilbert; 2 files, -31/+11)

rxe_run_task() has been unused since 2024's commit 23bc06af547f ("RDMA/rxe: Don't call direct between tasks"). Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250419132725.199785-1-linux@treblig.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-20  RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bug  (Zhu Yanjun; 1 file, -1/+6)

Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 assign_lock_key kernel/locking/lockdep.c:986 [inline]
 register_lock_class+0x4a3/0x4c0 kernel/locking/lockdep.c:1300
 __lock_acquire+0x99/0x1ba0 kernel/locking/lockdep.c:5110
 lock_acquire kernel/locking/lockdep.c:5866 [inline]
 lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5823
 __timer_delete_sync+0x152/0x1b0 kernel/time/timer.c:1644
 rxe_qp_do_cleanup+0x5c3/0x7e0 drivers/infiniband/sw/rxe/rxe_qp.c:815
 execute_in_process_context+0x3a/0x160 kernel/workqueue.c:4596
 __rxe_cleanup+0x267/0x3c0 drivers/infiniband/sw/rxe/rxe_pool.c:232
 rxe_create_qp+0x3f7/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:604
 create_qp+0x62d/0xa80 drivers/infiniband/core/verbs.c:1250
 ib_create_qp_kernel+0x9f/0x310 drivers/infiniband/core/verbs.c:1361
 ib_create_qp include/rdma/ib_verbs.h:3803 [inline]
 rdma_create_qp+0x10c/0x340 drivers/infiniband/core/cma.c:1144
 rds_ib_setup_qp+0xc86/0x19a0 net/rds/ib_cm.c:600
 rds_ib_cm_initiate_connect+0x1e8/0x3d0 net/rds/ib_cm.c:944
 rds_rdma_cm_event_handler_cmn+0x61f/0x8c0 net/rds/rdma_transport.c:109
 cma_cm_event_handler+0x94/0x300 drivers/infiniband/core/cma.c:2184
 cma_work_handler+0x15b/0x230 drivers/infiniband/core/cma.c:3042
 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238
 process_scheduled_works kernel/workqueue.c:3319 [inline]
 worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
 kthread+0x3c2/0x780 kernel/kthread.c:464
 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 </TASK>

The root cause is as follows: in rxe_create_qp(), rxe_qp_from_init() is called to create the qp. If rxe_qp_from_init() fails, rxe_cleanup() is called to release all the allocated resources, including the timers retrans_timer and rnr_nak_timer.

rxe_qp_from_init() calls rxe_qp_init_req() to initialize these timers, but only at the very end of rxe_qp_init_req(). If an error occurs before the timers are initialized, cleanup tries to delete timers that were never set up, triggering the lockdep splat above.

The solution is to check whether these timers have been initialized; if not, skip them.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Reported-by: syzbot+4edb496c3cad6e953a31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4edb496c3cad6e953a31
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20250419080741.1515231-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>

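One plausible shape of such a guard (a sketch, not the exact upstream diff; the init flag is hypothetical, while the timer names are those cited above):

	/* in rxe_qp_do_cleanup(): only tear down timers that were set up */
	if (qp->timers_initialized) {
		timer_delete_sync(&qp->retrans_timer);
		timer_delete_sync(&qp->rnr_nak_timer);
	}
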
2025-04-20  RDMA/cma: Remove unused rdma_res_to_id  (Dr. David Alan Gilbert; 2 files, -14/+0)

The last use of rdma_res_to_id() was removed in 2020 by commit 211cd9459fda ("RDMA: Add dedicated CM_ID resource tracker function"). Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://patch.msgid.link/20250418165848.241305-1-linux@treblig.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-20  RDMA/mana_ib: Add support of 4M, 1G, and 2G pages  (Konstantin Taranov; 4 files, -17/+12)

Check the PF capability flag to see whether 4M, 1G, and 2G pages are supported, and add these page sizes to mana_ib if so. Define the possible page sizes in enum gdma_page_type and remove the unused enum atb_page_size.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1744621234-26114-4-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-20  RDMA/mana_ib: support zero-based MRs  (Konstantin Taranov; 2 files, -8/+27)

Add IB_ZERO_BASED to the valid flags and use the corresponding MR creation request for zero-based memory.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1744621234-26114-3-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-20  RDMA/mana_ib: Access remote atomic for MRs  (Konstantin Taranov; 1 file, -2/+5)

Add IB_ACCESS_REMOTE_ATOMIC to the valid flags for MRs and use the corresponding flag bit during MR creation in the HW.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1744621234-26114-2-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-11  RDMA/hns: initialize db in update_srq_db()  (Chen Linxuan; 1 file, -1/+1)

On x86_64 with gcc version 13.3.0, I compile drivers/infiniband/hw/hns/hns_roce_hw_v2.c with:

	make defconfig
	./scripts/kconfig/merge_config.sh .config <(
		echo CONFIG_COMPILE_TEST=y
		echo CONFIG_HNS3=m
		echo CONFIG_INFINIBAND=m
		echo CONFIG_INFINIBAND_HNS_HIP08=m
	)
	make KCFLAGS="-fno-inline-small-functions -fno-inline-functions-called-once" \
		drivers/infiniband/hw/hns/hns_roce_hw_v2.o

Then I get a compile error:

	CALL    scripts/checksyscalls.sh
	DESCEND objtool
	INSTALL libsubcmd_headers
	CC [M]  drivers/infiniband/hw/hns/hns_roce_hw_v2.o
	In file included from drivers/infiniband/hw/hns/hns_roce_hw_v2.c:47:
	drivers/infiniband/hw/hns/hns_roce_hw_v2.c: In function 'update_srq_db':
	drivers/infiniband/hw/hns/hns_roce_common.h:74:17: error: 'db' is used uninitialized [-Werror=uninitialized]
	   74 |         *((__le32 *)_ptr + (field_h) / 32) &= \
	      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	drivers/infiniband/hw/hns/hns_roce_common.h:90:17: note: in expansion of macro '_hr_reg_clear'
	   90 |         _hr_reg_clear(ptr, field_type, field_h, field_l); \
	      |         ^~~~~~~~~~~~~
	drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write'
	   95 | #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val)
	      |                                       ^~~~~~~~~~~~~
	drivers/infiniband/hw/hns/hns_roce_hw_v2.c:948:9: note: in expansion of macro 'hr_reg_write'
	  948 |         hr_reg_write(&db, DB_TAG, srq->srqn);
	      |         ^~~~~~~~~~~~
	drivers/infiniband/hw/hns/hns_roce_hw_v2.c:946:31: note: 'db' declared here
	  946 |         struct hns_roce_v2_db db;
	      |                               ^~
	cc1: all warnings being treated as errors

Signed-off-by: Chen Linxuan <chenlinxuan@uniontech.com>
Co-developed-by: Winston Wen <wentao@uniontech.com>
Signed-off-by: Winston Wen <wentao@uniontech.com>
Link: https://patch.msgid.link/FF922C77946229B6+20250411105459.90782-5-chenlinxuan@uniontech.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

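The fix implied by the subject is to zero-initialize the doorbell before the partial hr_reg_write() updates; a sketch consistent with the error above (the actual patch may equally use memset()):

	struct hns_roce_v2_db db = {};	/* was: struct hns_roce_v2_db db; */

	hr_reg_write(&db, DB_TAG, srq->srqn);
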
2025-04-11  RDMA/rxe: Fix mismatched type declarations  (Daisuke Matsuda; 3 files, -18/+19)

Some functions return int values while they are defined as enum resp_states variables. This patch resolves the mismatches in rxe.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://patch.msgid.link/20250409102701.1275265-1-matsuda-daisuke@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-09  RDMA: Don't use %pK through printk  (Thomas Weißschuh; 8 files, -14/+14)

In the past, %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p"), the regular %p has been improved to avoid this issue.

Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting, which is safer and easier to reason about.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20250407-restricted-pointers-infiniband-v1-1-22b20504b84d@linutronix.de
Signed-off-by: Leon Romanovsky <leon@kernel.org>

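The conversion itself is mechanical; a representative (hypothetical) hunk:

	-	pr_debug("%s: cm_id %pK\n", __func__, cm_id);
	+	pr_debug("%s: cm_id %p\n", __func__, cm_id);
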
2025-04-09  RDMA/rxe: Enable ODP in ATOMIC WRITE operation  (Daisuke Matsuda; 6 files, -14/+69)

Add rxe_odp_do_atomic_write() so that ODP-specific steps are applied to ATOMIC WRITE requests.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://patch.msgid.link/20250324075649.3313968-3-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-08  RDMA/rxe: Enable ODP in RDMA FLUSH operation  (Daisuke Matsuda; 6 files, -20/+91)

For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP-specific steps are executed. Otherwise, no additional consideration is required.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://patch.msgid.link/20250324075649.3313968-2-matsuda-daisuke@fujitsu.com
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>

2025-04-07  IB/cm: use rwlock for MAD agent lock  (Jacob Moroni; 1 file, -8/+8)

In workloads where many processes establish connections using RDMA CM in parallel (large scale MPI), there can be heavy contention for mad_agent_lock in cm_alloc_msg. This contention can occur while inside of a spin_lock_irq region, leading to interrupts being disabled for extended durations on many cores. Furthermore, it leads to the serialization of rdma_create_ah calls, which has negative performance impacts for NICs that are capable of processing multiple address handle creations in parallel.

The end result is the machine becoming unresponsive, hung task warnings, netdev TX timeouts, etc.

Since the lock appears to be only for protection from cm_remove_one, it can be changed to a rwlock to resolve these issues.

Reproducer:

	Server:
		for i in $(seq 1 512); do ucmatose -c 32 -p $((i + 5000)) & done
	Client:
		for i in $(seq 1 512); do ucmatose -c 32 -p $((i + 5000)) -s 10.2.0.52 & done

Fixes: 76039ac9095f ("IB/cm: Protect cm_dev, cm_ports and mad_agent with kref and lock")
Link: https://patch.msgid.link/r/20250220175612.2763122-1-jmoroni@google.com
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

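A sketch of the conversion pattern using the stock kernel rwlock API (the function names follow the description above; the declaration site is hypothetical):

	static DEFINE_RWLOCK(mad_agent_lock);	/* was a spinlock */

	/* Hot path (cm_alloc_msg): many readers proceed in parallel. */
	read_lock(&mad_agent_lock);
	/* ... look up the mad_agent, create the AH ... */
	read_unlock(&mad_agent_lock);

	/* Rare path (cm_remove_one): exclusive access. */
	write_lock(&mad_agent_lock);
	/* ... detach the mad_agent ... */
	write_unlock(&mad_agent_lock);
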
2025-04-07  RDMA/hns: Remove unused parameters  (Chengchang Tang; 1 file, -3/+2)

Remove unused parameters.

Link: https://patch.msgid.link/r/20250327114724.3454268-2-huangjunxian6@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-04-07  IB/hfi1: Avoid -Wflex-array-member-not-at-end warning  (Gustavo A. R. Silva; 1 file, -1/+0)

-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it globally.

Remove the unused flex-array member `class_data` from `struct opa_mad_notice_attr`. This fixes the following warning:

  drivers/infiniband/hw/hfi1/mad.c:23:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Link: https://patch.msgid.link/r/Z-wiYkll8Vo3ME3P@kspp
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-04-07  RDMA/core: Convert to use ERR_CAST()  (Li Haoran; 1 file, -1/+1)

As opposed to open-coding the cast, using the ERR_CAST() macro clearly indicates that this is a pointer to an error value and that a type conversion was performed.

Link: https://patch.msgid.link/r/20250401211015750qxOfU9XZ8QgKizM1Lcyq2@zte.com.cn
Signed-off-by: Li Haoran <li.haoran7@zte.com.cn>
Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

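For illustration of the idiom (ERR_CAST() and IS_ERR() live in include/linux/err.h; the ib_umem context here is hypothetical):

	struct ib_umem *umem = ib_umem_get(device, start, length, access);

	if (IS_ERR(umem))
		return ERR_CAST(umem);	/* instead of: return (struct ib_mr *)umem; */
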
2025-04-07  RDMA/uverbs: Convert to use ERR_CAST()  (Li Haoran; 1 file, -1/+1)

As opposed to open-coding the cast, using the ERR_CAST() macro clearly indicates that this is a pointer to an error value and that a type conversion was performed.

Link: https://patch.msgid.link/r/202504012109233981_YPVbd4wQzmAzP3tA5IG@zte.com.cn
Signed-off-by: Li Haoran <li.haoran7@zte.com.cn>
Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-04-07  RDMA/core: Convert to use ERR_CAST()  (Li Haoran; 1 file, -1/+1)

As opposed to open-coding the cast, using the ERR_CAST() macro clearly indicates that this is a pointer to an error value and that a type conversion was performed.

Link: https://patch.msgid.link/r/20250401210840146_IyrV3zlejzz3eAnDmMSB@zte.com.cn
Signed-off-by: Li Haoran <li.haoran7@zte.com.cn>
Signed-off-by: Shao Mingyin <shao.mingyin@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-04-07  RDMA: Replace msecs_to_jiffies with secs_to_jiffies for timeout  (Peng Jiang; 1 file, -4/+4)

In drivers/infiniband/hw/mlx4/mcg.c and drivers/infiniband/hw/mlx5/mr.c, `msecs_to_jiffies` is used to convert milliseconds to jiffies. For constant milliseconds, using `msecs_to_jiffies` introduces additional computational overhead: for example, it is unnecessary to check if m > jiffies_to_msecs(MAX_JIFFY_OFFSET) or (int)m < 0 for constants, while using `secs_to_jiffies` avoids these extra calculations.

Link: https://patch.msgid.link/r/20250326221955611qu6Ix3Pt5WgKvhL6sTySX@zte.com.cn
Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn>
Signed-off-by: Ye Xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

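For example, with a constant timeout (an illustrative call site, not one from these files):

	/* before: run-time range checks on a compile-time constant */
	queue_delayed_work(wq, &dwork, msecs_to_jiffies(5000));

	/* after: no multiplication, no range check */
	queue_delayed_work(wq, &dwork, secs_to_jiffies(5));
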
2025-04-07  RDMA/mlx5: convert timeouts to secs_to_jiffies()  (Easwar Hariharan; 1 file, -3/+3)

Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced secs_to_jiffies(). As the value here is a multiple of 1000, use secs_to_jiffies() instead of msecs_to_jiffies() to avoid the multiplication.

This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with the following Coccinelle rules:

	@depends on patch@
	expression E;
	@@

	-msecs_to_jiffies(E * 1000)
	+secs_to_jiffies(E)

	-msecs_to_jiffies(E * MSEC_PER_SEC)
	+secs_to_jiffies(E)

Link: https://patch.msgid.link/r/20250219-rdma-secs-to-jiffies-v1-2-b506746561a9@linux.microsoft.com
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

2025-04-06  Linux 6.15-rc1  (Linus Torvalds; 1 file, -2/+2)