aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/sw/rxe/rxe_verbs.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-03RDMA/rxe: Fix error type of mmap_offsetJiewei Ke1-1/+1
The type of mmap_offset should be u64 instead of int to match the type of mminfo.offset. If otherwise, after we create several thousands of CQs, it will run into overflow issues. Link: https://lore.kernel.org/r/20191227113613.5020-1-kejiewei.cn@gmail.com Signed-off-by: Jiewei Ke <kejiewei.cn@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/rxe: Increase DMA max_segment_size parameterBart Van Assche1-0/+1
Increase the DMA max_segment_size parameter from 64 KB to 2 GB. Link: https://lore.kernel.org/r/20191025225830.257535-3-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-08RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMMKonstantin Taranov1-0/+1
Calculate the correct byte_len on the receiving side when a work completion is generated with IB_WC_RECV_RDMA_WITH_IMM opcode. According to the IBA byte_len must indicate the number of written bytes, whereas it was always equal to zero for the IB_WC_RECV_RDMA_WITH_IMM opcode, even though data was transferred. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Konstantin Taranov <konstantin.taranov@inf.ethz.ch> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11RDMA: Convert CQ allocations to be under core responsibilityLeon Romanovsky1-1/+1
Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-04-08RDMA: Handle SRQ allocations by IB/coreLeon Romanovsky1-1/+1
Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle AH allocations by IB/coreLeon Romanovsky1-1/+1
Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with relevant update in .destroy_ah(). We will use this opportunity and convert .destroy_ah() to don't fail, as it was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky1-1/+1
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19rdma_rxe: Use netlink messages to add/delete linksSteve Wise1-1/+1
Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow dynamically adding new RXE links. Deprecate the old module options for now. Cc: Moni Shoua <monis@mellanox.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Add ib_device_get_by_name() and use it in rxeJason Gunthorpe1-1/+0
rxe has an open coded version of this that is not as safe as the core version. This lets us eliminate the internal device list entirely from rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Use driver_unregister and new unregistration APIJason Gunthorpe1-2/+0
rxe does not have correct locking for its registration/unregistration paths, use the core code to handle it instead. In this mode ib_unregister_device will also do the dealloc, so rxe is required to do clean up from a callback. The core code ensures that unregistration is done only once, and generally takes care of locking and concurrency problems for rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08RDMA: Handle PD allocations by IB/coreLeon Romanovsky1-1/+1
The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20IB/rxe: Make counters thread safeParav Pandit1-3/+3
Current rxe device counters are not thread safe. When multiple QPs are used, they can be racy. Make them thread safe by making it atomic64. Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-11-08IB/rxe: make rxe_unregister_device voidZhu Yanjun1-1/+1
Since the function rxe_unregister_device always returns 0, it is changed to void. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-30IB/rxe: avoid back-to-back retriesVijay Immanuel1-0/+1
Error retries can occur due to timeouts, NAKs or receiving packets beyond the current read request. Avoid back-to-back retries due to packet processing, by only retrying the initial attempt immediately. Subsequent retries must be due to timeouts. Continue to process completion packets after scheduling a retry. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30IB/rxe: fix for duplicate request processing and ack psnsVijay Immanuel1-0/+2
Don't reset the resp opcode for a replayed read response. The resp opcode could be in the middle of a write or send sequence, when the duplicate read request was received. An example sequence is as follows: - Receive read request for 12KB PSN 20. Transmit read response first, middle and last with PSNs 20,21,22. - Receive write first PSN 23. At this point the resp psn is 24 and resp opcode is write first. - The sender notices that PSN 20 is dropped and retransmits. Receive read request for 12KB PSN 20. Transmit read response first, middle and last with PSNs 20,21,22. The resp opcode is set to -1, the resp psn remains 24. - Receive write first PSN 23. This is processed by duplicate_request(). The resp opcode remains -1 and resp psn remains 24. - Receive write middle PSN 24. check_op_seq() reports a missing first error since the resp opcode is -1. When sending an ack for a duplicate send or write request, use the psn of the previous ack sent. Do not use the psn of a read response for the ack. An example sequence is as follows: - Receive write PSN 30. Transmit ACK for PSN 30. - Receive read request 4KB PSN 31. Transmit read response with PSN 31. The resp psn is now 32. - The sender notices that PSN 30 is dropped and retransmits. Receive write PSN 30. duplicate_request() sends an ACK with PSN 31. That is incorrect since PSN 31 was a read request. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30IB/rxe: vary the source udp port for receive scalingVijay Immanuel1-0/+1
Select the source udp port number for a QP based on the source QPN. This provides a better spread of traffic across NIC RX queues for RC/UC QPs. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-28IB/rxe: Remove unused variable (char *rxe_qp_state_name[])Hernán Gonzalez1-2/+0
Note: This is compile only tested as I have no access to the hw. This variable was not used anywhere in the code. Removing it saves 24 bytes. add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24) Function old new delta rxe_qp_state_name 24 - -24 Total: Before=3348732, After=3348708, chg -0.00% Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-18RDMA/rxe: Fix rxe_qp_cleanup()Bart Van Assche1-0/+3
rxe_qp_cleanup() can sleep so it must be run in thread context and not in atomic context. This patch avoids that the following bug is triggered: Kernel BUG at 00000000560033f3 [verbose debug info unavailable] BUG: sleeping function called from invalid context at net/core/sock.c:2761 in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0 INFO: lockdep is turned off. Preemption disabled at: [<00000000b6e69628>] __do_softirq+0x4e/0x540 CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x85/0xbf ___might_sleep+0x177/0x260 lock_sock_nested+0x1d/0x90 inet_shutdown+0x2e/0xd0 rxe_qp_cleanup+0x107/0x140 [rdma_rxe] rxe_elem_release+0x18/0x80 [rdma_rxe] rxe_requester+0x1cf/0x11b0 [rdma_rxe] rxe_do_task+0x78/0xf0 [rdma_rxe] tasklet_action+0x99/0x270 __do_softirq+0xc0/0x540 run_ksoftirqd+0x1c/0x70 smpboot_thread_fn+0x1be/0x270 kthread+0x117/0x130 ret_from_fork+0x24/0x30 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Moni Shoua <monis@mellanox.com> Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28IB/rxe: Fix destination cache for IPv6Andrew Boyer1-0/+1
To successfully match an IPv6 path, the path cookie must match. Store it in the QP so that the IPv6 path can be reused. Replace open-coded version of dst_check() with the actual call, fixing the logic. The open-coded version skips the check call if dst->obsolete is 0 (DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE means that the route may continue to be used, though. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28IB/rxe: Disable completion upcalls when a CQ is destroyedAndrew Boyer1-0/+1
This prevents the stack from accessing userspace objects while they are being torn down. One possible sequence of events: - Userspace program exits - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(), ib_destroy_cq(), etc. and releasing/freeing the UCQ - The QP still has tasklets running, so it isn't destroyed yet - The CQ is referenced by the QP, so the CQ isn't destroyed yet - The UCQ is kfree()'d anyway - A send work request completes - rxe_send_complete() calls cq->ibcq.comp_handler() - ib_uverbs_comp_handler() runs and crashes; the event queue is checked for is_closed, but it has no way to check the ib_ucq_object before accessing it The reference counting on the CQ doesn't protect against this since the CQ hasn't been destroyed yet. There's no available interface to deregister the UCQ from the CQ, and it didn't appear that attempting to add reference counting to the UCQ was going to be a good way to go since this solution is much simpler. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Offload CRC calculation when possibleyonatanc1-0/+1
Use CPU ability to perform CRC calculations, by replacing direct calls to crc32_le() with crypto_shash_updata(). The overall performance gain measured with ib_send_bw tool is 10% and it was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 --------------------------------------------------------------------------------------------- | | bytes | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] | --------------------------------------------------------------------------------------------- | crc32_le | 1048576 | 9000 | inf | 497.60 | 0.000498 | | CRC offload | 1048576 | 9000 | inf | 546.70 | 0.000547 | --------------------------------------------------------------------------------------------- Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Add port protocol statsYonatan Cohen1-0/+8
Expose new counters using the get_hw_stats callback. We expose the following counters: +---------------------+----------------------------------------+ | Name | Description | |---------------------+----------------------------------------| |sent_pkts | number of sent pkts | |---------------------+----------------------------------------| |rcvd_pkts | number of received packets | |---------------------+----------------------------------------| |out_of_sequence | number of errors due to packet | | | transport sequence number | |---------------------+----------------------------------------| |duplicate_request | number of received duplicated packets. | | | A request that previously executed is | | | named duplicated. | |---------------------+----------------------------------------| |rcvd_rnr_err | number of received RNR by completer | |---------------------+----------------------------------------| |send_rnr_err | number of sent RNR by responder | |---------------------+----------------------------------------| |rcvd_seq_err | number of out of sequence packets | | | received | |---------------------+----------------------------------------| |ack_deffered | number of deferred handling of ack | | | packets. | |---------------------+----------------------------------------| |retry_exceeded_err | number of times retry exceeded | |---------------------+----------------------------------------| |completer_retry_err | number of times completer decided to | | | retry | |---------------------+----------------------------------------| |send_err | number of failed send packet | +---------------------+----------------------------------------+ Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/rxe: Remove a pointless indirection layerBart Van Assche1-22/+0
Neither rxe->ifc_ops nor any of the function pointers in struct struct rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops indirection mechanism. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/rxe: Let the compiler check the type of the cleanup functionsBart Van Assche1-1/+1
Change the argument type of these functions from void * into struct rxe_pool_entry *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Soft RoCE driverMoni Shoua1-0/+480
Soft RoCE (RXE) - The software RoCE driver ib_rxe implements the RDMA transport and registers to the RDMA core device as a kernel verbs provider. It also implements the packet IO layer. On the other hand ib_rxe registers to the Linux netdev stack as a udp encapsulating protocol, in that case RDMA, for sending and receiving packets over any Ethernet device. This yields a RDMA transport over the UDP/Ethernet network layer forming a RoCEv2 compatible device. The configuration procedure of the Soft RoCE drivers requires binding to any existing Ethernet network device. This is done with /sys interface. A userspace Soft RoCE library (librxe) provides user applications the ability to run with Soft RoCE devices. The use of rxe verbs ins user space requires the inclusion of librxe as a device specifics plug-in to libibverbs. librxe is packaged separately. Architecture: +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ +---------------------------------------------------------------+ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft RoCE resources: [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in Github [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE Wiki page [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>