aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-09-05RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profileMaher Sanalla1-1/+1
When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-30RDMA/hns: Remove redundant member doorbell_qpn of struct hns_roce_qpWenpeng Liang3-5/+1
The value of doorbell_qpn is always equal to qpn on current hardware versions. So remove it. Link: https://lore.kernel.org/r/20220829105021.1427804-5-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-30RDMA/hns: Remove the num_qpc_timer variableYixing Liu4-5/+3
The bt number of qpc_timer of HIP09 increases compared with that of HIP08. Therefore, qpc_timer_bt_num and num_qpc_timer do not match. As a result, the driver may fail to allocate qpc_timer. So the driver needs to uniquely uses qpc_timer_bt_num to represent the bt number of qpc_timer. Fixes: 0e40dc2f70cd ("RDMA/hns: Add timer allocation support for hip08") Link: https://lore.kernel.org/r/20220829105021.1427804-4-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-30RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shiftWenpeng Liang1-5/+2
The value of qp->rq.wqe_shift of HIP08 is always determined by the number of sge. So delete the wrong branch. Fixes: cfc85f3e4b7f ("RDMA/hns: Add profile support for hip08 driver") Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/20220829105021.1427804-3-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-30RDMA/hns: Fix supported page sizeChengchang Tang1-1/+1
The supported page size for hns is (4K, 128M), not (4K, 2G). Fixes: cfc85f3e4b7f ("RDMA/hns: Add profile support for hip08 driver") Link: https://lore.kernel.org/r/20220829105021.1427804-2-liangwenpeng@huawei.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-28RDMA/irdma: Fix drain SQ hang with no completionShiraz Saleem1-1/+1
SW generated completions for outstanding WRs posted on SQ after QP is in error target the wrong CQ. This causes the ib_drain_sq to hang with no completion. Fix this to generate completions on the right CQ. [ 863.969340] INFO: task kworker/u52:2:671 blocked for more than 122 seconds. [ 863.979224] Not tainted 5.14.0-130.el9.x86_64 #1 [ 863.986588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 863.996997] task:kworker/u52:2 state:D stack: 0 pid: 671 ppid: 2 flags:0x00004000 [ 864.007272] Workqueue: xprtiod xprt_autoclose [sunrpc] [ 864.014056] Call Trace: [ 864.017575] __schedule+0x206/0x580 [ 864.022296] schedule+0x43/0xa0 [ 864.026736] schedule_timeout+0x115/0x150 [ 864.032185] __wait_for_common+0x93/0x1d0 [ 864.037717] ? usleep_range_state+0x90/0x90 [ 864.043368] __ib_drain_sq+0xf6/0x170 [ib_core] [ 864.049371] ? __rdma_block_iter_next+0x80/0x80 [ib_core] [ 864.056240] ib_drain_sq+0x66/0x70 [ib_core] [ 864.062003] rpcrdma_xprt_disconnect+0x82/0x3b0 [rpcrdma] [ 864.069365] ? xprt_prepare_transmit+0x5d/0xc0 [sunrpc] [ 864.076386] xprt_rdma_close+0xe/0x30 [rpcrdma] [ 864.082593] xprt_autoclose+0x52/0x100 [sunrpc] [ 864.088718] process_one_work+0x1e8/0x3c0 [ 864.094170] worker_thread+0x50/0x3b0 [ 864.099109] ? rescuer_thread+0x370/0x370 [ 864.104473] kthread+0x149/0x170 [ 864.109022] ? set_kthread_struct+0x40/0x40 [ 864.114713] ret_from_fork+0x22/0x30 Fixes: 81091d7696ae ("RDMA/irdma: Add SW mechanism to generate completions on error") Link: https://lore.kernel.org/r/20220824154358.117-1-shiraz.saleem@intel.com Reported-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-24RDMA/hns: Support MR's restrack raw ops for hns driverWenpeng Liang5-1/+66
The MR raw restrack attributes come from the queue context maintained by the ROCEE. For example: $ rdma res show mr dev hns_0 mrn 6 -dd -jp -r [ { "ifindex": 4, "ifname": "hns_0", "data": [ 1,0,0,0,2,0,0,0,0,3,0,0,0,0,2,0,0,0,0,0,32,0,0,0,2,0,0,0, 2,0,0,0,0,0,0,0 ] } ] Link: https://lore.kernel.org/r/20220822104455.2311053-8-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23RDMA/hns: Support MR's restrack ops for hns driverWenpeng Liang3-0/+32
The MR restrack attributes come from the queue information maintained by the driver. For example: $ rdma res show mr dev hns_0 mrn 6 -dd -jp [ { "ifindex": 4, "ifname": "hns_0", "mrn": 6, "rkey": "300", "lkey": "300", "mrlen": 131072, "pdn": 8, "pid": 1524, "comm": "ib_send_bw" }, "drv_pbl_hop_num": 2, "drv_ba_pg_shift": 14, "drv_buf_pg_shift": 12 } Link: https://lore.kernel.org/r/20220822104455.2311053-7-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23RDMA/hns: Support QP's restrack raw ops for hns driverWenpeng Liang4-6/+65
The QP raw restrack attributes come from the queue context maintained by the ROCEE. For example: $ rdma res show qp link hns_0 -jp -dd -r [ { "ifindex": 4, "ifname": "hns_0", "data": [ 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0, 5,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,255,156,0,0,63,156,0,0, 7,0,0,0,1,0,0,0,9,0,0,0,0,0,0,0,2,0,0,0,2,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,156,0, 0,0,0,0,0 ] } ] Link: https://lore.kernel.org/r/20220822104455.2311053-6-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23RDMA/hns: Support QP's restrack ops for hns driverWenpeng Liang3-0/+36
The QP restrack attributes come from the queue information maintained by the driver. For example: $ rdma res show qp link hns_0 lqpn 41 -jp -dd [ { "ifindex": 4, "ifname": "hns_0", "port": 1, "lqpn": 41, "rqpn": 40, "type": "RC", "state": "RTR", "rq-psn": 12474738, "sq-psn": 0, "path-mig-state": "ARMED", "pdn": 9, "pid": 1523, "comm": "ib_send_bw" }, "drv_sq_wqe_cnt": 128, "drv_sq_max_gs": 1, "drv_rq_wqe_cnt": 512, "drv_rq_max_gs": 2, "drv_ext_sge_sge_cnt": 0 } Link: https://lore.kernel.org/r/20220822104455.2311053-5-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23RDMA/hns: Support CQ's restrack raw ops for hns driverWenpeng Liang3-0/+41
The CQ raw restrack attributes come from the queue context maintained by the ROCEE. For example: $ rdma res show cq dev hns_0 cqn 14 -dd -jp -r [ { "ifindex": 4, "ifname": "hns_0", "data": [ 1,0,0,0,7,0,0,0,0,0,0,0,0,82,6,0,0,82,6,0,0,82,6,0, 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0, 6,0,0,0,0,0,0,0 ] } ] Link: https://lore.kernel.org/r/20220822104455.2311053-4-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23RDMA/hns: Add or remove CQ's restrack attributesWenpeng Liang1-57/+10
Remove the resttrack attributes from the queue context held by ROCEE, and add the resttrack attributes from the queue information maintained by the driver. For example: $ rdma res show cq dev hns_0 cqn 14 -dd -jp [ { "ifindex": 4, "ifname": "hns_0", "cqn": 14, "cqe": 127, "users": 1, "adaptive-moderation": false, "ctxn": 8, "pid": 1524, "comm": "ib_send_bw" }, "drv_cq_depth": 128, "drv_cons_index": 0, "drv_cqe_size": 32, "drv_arm_sn": 1 } Link: https://lore.kernel.org/r/20220822104455.2311053-3-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23RDMA/hns: Remove redundant DFX file and DFX ops structureWenpeng Liang7-75/+50
There is no need to use a dedicated DXF file and DFX structure to manage the interface of the query queue context. Link: https://lore.kernel.org/r/20220822104455.2311053-2-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-23IB/mlx5: Remove duplicate header inclusion related to ODPDaisuke Matsuda4-6/+1
rdma/ib_umem.h and rdma/ib_verbs.h are included by rdma/ib_umem_odp.h. This patch removes the redundant entries. Link: https://lore.kernel.org/r/20220823025131.862811-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21IB: move from strlcpy with unused retval to strscpyWolfram Sang6-6/+6
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Link: https://lore.kernel.org/r/20220818210018.6841-1-wsa+renesas@sang-engineering.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-21RDMA/efa: Support CQ receive entries with source GIDMichael Margolin5-5/+309
Add a parameter for create CQ admin command to set source address on receive completion descriptors. Report capability for this feature through query device verb. Link: https://lore.kernel.org/r/20220818140449.414-1-mrgolin@amazon.com Reviewed-by: Firas Jahjah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com> Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16RDMA/erdma: Correct the max_qp and max_cq capacities of the deviceCheng Xu1-2/+2
QP0 in HW is used for CMDQ, and the rest is for RDMA QPs. So the actual max_qp capacity reported to core should be max_qp (reported by HW) - 1. So does max_cq. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Link: https://lore.kernel.org/all/20220810014320.88026-1-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-08-16RDMA/erdma: Using the key in FMR WR instead of MR structureCheng Xu1-1/+1
RDMA driver should get the MR key from FMR WR, not the MR structure passed in. Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Link: https://lore.kernel.org/r/20220810014320.88026-2-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16RDMA/cxgb4: fix accept failure due to increased cpl_t5_pass_accept_rpl sizePotnuri Bharat Teja1-16/+9
Commit 'c2ed5611afd7' has increased the cpl_t5_pass_accept_rpl{} structure size by 8B to avoid roundup. cpl_t5_pass_accept_rpl{} is a HW specific structure and increasing its size will lead to unwanted adapter errors. Current commit reverts the cpl_t5_pass_accept_rpl{} back to its original and allocates zeroed skb buffer there by avoiding the memset for iss field. Reorder code to minimize chip type checks. Fixes: c2ed5611afd7 ("iw_cxgb4: Use memset_startat() for cpl_t5_pass_accept_rpl") Link: https://lore.kernel.org/r/20220809184118.2029-1-rahul.lakkireddy@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16RDMA/mlx5: Use the proper number of portsMark Bloch1-18/+16
The cited commit allowed the driver to operate over HCAs that have 4 physical ports. Use the number of ports of the RDMA device in the for loop instead of using the struct size. Fixes: 4cd14d44b11d ("net/mlx5: Support devices with more than 2 ports") Link: https://lore.kernel.org/r/a54a56c2ede16044a29d119209b35189c662ac72.1659944855.git.leonro@nvidia.com Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkeyAharon Landau2-1/+5
According to the ib spec: If the CI supports the Base Memory Management Extensions defined in this specification, the L_Key format must consist of: 24 bit index in the most significant bits of the R_Key, and 8 bit key in the least significant bits of the R_Key Through a successful Allocate L_Key verb invocation, the CI must let the consumer own the key portion of the returned R_Key Therefore, when creating a mkey using DEVX, the consumer is allowed to change the key part. The kernel should compare only the index part of a R_Key to determine equality with another R_Key. Adding capability in order not to break backward compatibility. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Link: https://lore.kernel.org/r/3d669aacea85a3a15c3b3b953b3eaba3f80ef9be.1659255945.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16IB/mlx5: Call io_stop_wc() after writing to WC MMIOJason Gunthorpe1-0/+1
This new function is defined only on ARM and serves to guarantee a barrier in the WC operation. The barrier means that another run of this loop will not combine with the stores this loop created. On x86 this is happening implicitly because of the spin_unlock(). Link: https://lore.kernel.org/r/0-v1-c5dade92f363+11-mlx5_io_stop_wc_jgg@nvidia.com Suggested-by: Pavel Shamis <Pavel.Shamis@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-05Merge tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds1-6/+2
Pull tracing updates from Steven Rostedt: - Runtime verification infrastructure This is the biggest change here. It introduces the runtime verification that is necessary for running Linux on safety critical systems. It allows for deterministic automata models to be inserted into the kernel that will attach to tracepoints, where the information on these tracepoints will move the model from state to state. If a state is encountered that does not belong to the model, it will then activate a given reactor, that could just inform the user or even panic the kernel (for which safety critical systems will detect and can recover from). - Two monitor models are also added: Wakeup In Preemptive (WIP - not to be confused with "work in progress"), and Wakeup While Not Running (WWNR). - Added __vstring() helper to the TRACE_EVENT() macro to replace several vsnprintf() usages that were all doing it wrong. - eprobes now can have their event autogenerated when the event name is left off. - The rest is various cleanups and fixes. * tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits) rv: Unlock on error path in rv_unregister_reactor() tracing: Use alignof__(struct {type b;}) instead of offsetof() tracing/eprobe: Show syntax error logs in error_log file scripts/tracing: Fix typo 'the the' in comment tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT tracing: Use free_trace_buffer() in allocate_trace_buffers() tracing: Use a struct alignof to determine trace event field alignment rv/reactor: Add the panic reactor rv/reactor: Add the printk reactor rv/monitor: Add the wwnr monitor rv/monitor: Add the wip monitor rv/monitor: Add the wip monitor skeleton created by dot2k Documentation/rv: Add deterministic automata instrumentation documentation Documentation/rv: Add deterministic automata monitor synthesis documentation tools/rv: Add dot2k Documentation/rv: Add deterministic automaton documentation tools/rv: Add dot2c Documentation/rv: Add a basic documentation rv/include: Add instrumentation helper functions rv/include: Add deterministic automata monitor definition via C macros ...
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds44-413/+7244
Pull rdma updates from Jason Gunthorpe: "This cycle we got a new RDMA driver "ERDMA" for the Alibaba cloud environment. Otherwise the changes are dominated by rxe fixes. There is another RDMA driver on the list that might get merged next cycle, 'MANA' for the Azure cloud environment. Summary: - Bug fixes and small features for irdma, hns, siw, qedr, hfi1, mlx5 - General spelling/grammer fixes - rdma cm can follow changes in neighbours for control packets - Significant amounts of rxe fixes and spec compliance changes - Use the modern NAPI API - Use the bitmap API instead of open coding - Performance improvements for rtrs - Add the ERDMA driver for Alibaba cloud - Fix a use after free bug in SRP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA/ib_srpt: Unify checking rdma_cm_id condition in srpt_cm_req_recv() RDMA/rxe: Fix error unwind in rxe_create_qp() RDMA/mlx5: Add missing check for return value in get namespace flow RDMA/rxe: Split qp state for requester and completer RDMA/rxe: Generate error completion for error requester QP state RDMA/rxe: Update wqe_index for each wqe error completion RDMA/srpt: Fix a use-after-free RDMA/srpt: Introduce a reference count in struct srpt_device RDMA/srpt: Duplicate port name members IB/qib: Fix repeated "in" within comments RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions ...
2022-08-02RDMA/mlx5: Add missing check for return value in get namespace flowMaor Gottlieb1-4/+2
Add missing check for return value when calling to mlx5_ib_ft_type_to_namespace, even though it can't really fail in this specific call. Fixes: 52438be44112 ("RDMA/mlx5: Allow inserting a steering rule to the FDB") Link: https://lore.kernel.org/r/7b9ceda217d9368a51dc47a46b769bad4af9ac92.1659256069.git.leonro@nvidia.com Reviewed-by: Itay Aveksis <itayav@nvidia.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-28IB/qib: Fix repeated "in" within commentswangjianli1-1/+1
Delete the redundant word 'in'. Link: https://lore.kernel.org/r/20220724074407.18552-1-wangjianli@cdjrlc.com Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27Merge branch 'erdma' into rdma.git for-nextJason Gunthorpe14-0/+6412
Cheng Xu says ==================== This v14 patch set introduces the Elastic RDMA Adapter (ERDMA) driver, which released in Apsara Conference 2021 by Alibaba. The PR of ERDMA userspace provider has already been created [1]. ERDMA enables large-scale RDMA acceleration capability in Alibaba ECS environment, initially offered in g7re instance. It can improve the efficiency of large-scale distributed computing and communication significantly and expand dynamically with the cluster scale of Alibaba Cloud. ERDMA is a RDMA networking adapter based on the Alibaba MOC hardware. It works in the VPC network environment (overlay network), and uses iWarp transport protocol. ERDMA supports reliable connection (RC). ERDMA also supports both kernel space and user space verbs. Now we have already supported HPC/AI applications with libfabric, NoF and some other internal verbs libraries, such as xrdma, epsl, etc,. For the ECS instance with RDMA enabled, our MOC hardware generates two kinds of PCI devices: one for ERDMA, and one for the original net device (virtio-net). They are separated PCI devices. ==================== * branch 'erdma': RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions RDMA: Add ERDMA to rdma_driver_id definition Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add driver to kernel build environmentCheng Xu3-0/+17
Add erdma to the kernel build environment, and sort the source order in drivers/infiniband/Kconfig. Link: https://lore.kernel.org/r/20220727014927.76564-12-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add the erdma moduleCheng Xu1-0/+608
Add the main erdma module, which provides interface to infiniband subsystem. This commit includes a modification from Christophe, that using the bitmap API to allocate bitmaps instead of hand-writing. And the commit also fixes warnings reported by static checkers. Link: https://lore.kernel.org/r/20220727014927.76564-10-chengyou@linux.alibaba.com Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add connection management (CM) supportCheng Xu2-0/+1597
ERDMA's transport protocol is iWarp, so the driver must support CM interface. In CM part, we use the same way as SoftiWarp: using kernel socket to set up the connection, then performing MPA negotiation in kernel. So, this part of code mainly comes from SoftiWarp, base on it, we add some more features, such as non-blocking iw_connect implementation. This commit also fixes a duplicated include issue reported by Abaci Robot. Link: https://lore.kernel.org/r/20220727014927.76564-9-chengyou@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add verbs implementationCheng Xu3-0/+2231
The RDMA verbs implementation of erdma is divided into three files: erdma_qp.c, erdma_cq.c, and erdma_verbs.c. Internal used functions and datapath functions of QP/CQ are put in erdma_qp.c and erdma_cq.c, the rest is in erdma_verbs.c. This commit also fixes some static check warnings. Link: https://lore.kernel.org/r/20220727014927.76564-8-chengyou@linux.alibaba.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add verbs header fileCheng Xu1-0/+342
This header file defines the main structures and functions used for RDMA Verbs, including qp, cq, mr, ucontext, etc,. Link: https://lore.kernel.org/r/20220727014927.76564-7-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add event queue implementationCheng Xu1-0/+329
Event queue (EQ) is the main notification way from erdma hardware to its driver. Each erdma device contains 2 kinds EQs: asynchronous EQ (AEQ) and completion EQ (CEQ). Per device has 1 AEQ, which used for RDMA async event report, and max to 32 CEQs (numbered for CEQ0 to CEQ31). CEQ0 is used for cmdq completion event report, and the rest CEQs are used for RDMA completion event report. Link: https://lore.kernel.org/r/20220727014927.76564-6-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add cmdq implementationCheng Xu1-0/+493
Cmdq is the main control plane channel between erdma driver and hardware. After erdma device is initialized, the cmdq channel will be active in the whole lifecycle of this driver. This commit also includes two modifications from Christophe, one is using the bitmap API to allocate bitmaps instead of hand-writing, and another is using the non-atomic bitmap API when applicable. Link: https://lore.kernel.org/r/20220727014927.76564-5-chengyou@linux.alibaba.com Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add main include fileCheng Xu1-0/+287
Add ERDMA driver main header file, defining internal used data structures and operations. The defined data structures includes *cmdq*, which is used as the communication channel between ERDMA driver and hardware. Link: https://lore.kernel.org/r/20220727014927.76564-4-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add the hardware related definitionsCheng Xu1-0/+508
ERDMA is a PCIe device, and this file provides ERDMA hardware related definitions, mainly including PCIe device capabilities and restrictions, device registers definitions, doorbell space, doorbell structure definitions and WQE definitions. Link: https://lore.kernel.org/r/20220727014927.76564-3-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Rename the mkey cache variables and functionsAharon Landau4-37/+37
After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Store in the cache mkeys instead of mrsAharon Landau2-129/+97
Currently, the driver stores mlx5_ib_mr struct in the cache entries, although the only use of the cached MR is the mkey. Store only the mkey in the cache. Link: https://lore.kernel.org/r/20220726071911.122765-5-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrsAharon Landau2-18/+16
total_mrs is used only to calculate the number of mkeys currently in use. To simplify things, replace it with a new member called "in_use" and directly store the number of mkeys currently in use. Link: https://lore.kernel.org/r/20220726071911.122765-4-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Replace cache list with XarrayAharon Landau2-88/+152
The Xarray allows us to store the cached mkeys in memory efficient way. Entries are reserved in the Xarray using xa_cmpxchg before calling to the upcoming callbacks to avoid allocations in interrupt context. The xa_cmpxchg can sleep when using GFP_KERNEL, so we call it in a loop to ensure one reserved entry for each process trying to reserve. Link: https://lore.kernel.org/r/20220726071911.122765-3-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Replace ent->lock with xa_lockAharon Landau2-50/+47
In the next patch, ent->list will be replaced with an xarray. The xarray uses an internal lock to protect the indexes. Use it to protect all the entry fields, and get rid of ent->lock. Link: https://lore.kernel.org/r/20220726071911.122765-2-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22IB: Fix repeated words 'the the' commentsSlark Xiao1-1/+1
Replace 'the the' with 'the' in the comments. Signed-off-by: Slark Xiao <slark_xiao@163.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski5-52/+5
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19RDMA/mlx5: Expose steering anchor to userspaceMark Bloch2-5/+139
Expose a steering anchor per priority to allow users to re-inject packets back into default NIC pipeline for additional processing. MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which a user can use to re-inject packets at a specific priority. A FTE (flow table entry) can be created and the flow table ID used as a destination. When a packet is taken into a RDMA-controlled steering domain (like software steering) there may be a need to insert the packet back into the default NIC pipeline. This exposes a flow table ID to the user that can be used as a destination in a flow table entry. With this new method priorities that are exposed to users via MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID. As user-created flow tables (via RDMA DEVX) are created with a non-zero UID thus it's impossible to point to a NIC core flow table (core driver flow tables are created with UID value of zero) from userspace. Create flow tables that are exposed to users with the shared UID, this allows users to point to default NIC flow tables. Steering loops are prevented at FW level as FW enforces that no flow table at level X can point to a table at level lower than X. Link: https://lore.kernel.org/all/20220703205407.110890-6-saeed@kernel.org/ Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19RDMA/mlx5: Refactor get flow table functionMark Bloch1-10/+11
_get_flow_table() requires the entire matcher being passed while all it needs is the priority and namespace type. Pass the priority and namespace type directly instead. Link: https://lore.kernel.org/all/20220703205407.110890-5-saeed@kernel.org/ Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19IB/qib: Fix comment typoJason Wang1-1/+1
The double `are' is duplicated in line 156, remove one. Link: https://lore.kernel.org/r/20220715054007.5320-1-wangborong@cdjrlc.com Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19RDMA/hfi1: fix potential memory leak in setup_base_ctxt()Jianglei Nie1-1/+3
setup_base_ctxt() allocates a memory chunk for uctxt->groups with hfi1_alloc_ctxt_rcv_groups(). When init_user_ctxt() fails, uctxt->groups is not released, which will lead to a memory leak. We should release the uctxt->groups with hfi1_free_ctxt_rcv_groups() when init_user_ctxt() fails. Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20220711070718.2318320-1-niejianglei2021@163.com Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Recover 1bit-ECC error of RAM on chipHaoyue Xu3-2/+193
Since ECC memory maintains a memory system immune to single-bit errors, add support for correcting the 1bit-ECC error, which prevents a 1bit-ECC error become an uncorrected type error. When a 1bit-ECC error happens in the internal ram of the ROCE engine, such as the QPC table, as a 1bit-ECC error caused by reading, the ROCE engine only corrects those 1bit ECC errors by writing. Link: https://lore.kernel.org/r/20220714134353.16700-6-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Refactor the abnormal interrupt handler functionHaoyue Xu1-12/+23
Use a single function to handle the same kind of abnormal interrupts. Link: https://lore.kernel.org/r/20220714134353.16700-5-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Fix incorrect clearing of interrupt status registerHaoyue Xu1-2/+2
The driver will clear all the interrupts in the same area when the driver handles the interrupt of type AEQ overflow. It should only set the interrupt status bit of type AEQ overflow. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/20220714134353.16700-4-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>