aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/sw/rxe/rxe_verbs.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-11-08IB/rxe: make rxe_unregister_device voidZhu Yanjun1-1/+1
Since the function rxe_unregister_device always returns 0, it is changed to void. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-30IB/rxe: avoid back-to-back retriesVijay Immanuel1-0/+1
Error retries can occur due to timeouts, NAKs or receiving packets beyond the current read request. Avoid back-to-back retries due to packet processing, by only retrying the initial attempt immediately. Subsequent retries must be due to timeouts. Continue to process completion packets after scheduling a retry. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30IB/rxe: fix for duplicate request processing and ack psnsVijay Immanuel1-0/+2
Don't reset the resp opcode for a replayed read response. The resp opcode could be in the middle of a write or send sequence, when the duplicate read request was received. An example sequence is as follows: - Receive read request for 12KB PSN 20. Transmit read response first, middle and last with PSNs 20,21,22. - Receive write first PSN 23. At this point the resp psn is 24 and resp opcode is write first. - The sender notices that PSN 20 is dropped and retransmits. Receive read request for 12KB PSN 20. Transmit read response first, middle and last with PSNs 20,21,22. The resp opcode is set to -1, the resp psn remains 24. - Receive write first PSN 23. This is processed by duplicate_request(). The resp opcode remains -1 and resp psn remains 24. - Receive write middle PSN 24. check_op_seq() reports a missing first error since the resp opcode is -1. When sending an ack for a duplicate send or write request, use the psn of the previous ack sent. Do not use the psn of a read response for the ack. An example sequence is as follows: - Receive write PSN 30. Transmit ACK for PSN 30. - Receive read request 4KB PSN 31. Transmit read response with PSN 31. The resp psn is now 32. - The sender notices that PSN 30 is dropped and retransmits. Receive write PSN 30. duplicate_request() sends an ACK with PSN 31. That is incorrect since PSN 31 was a read request. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-30IB/rxe: vary the source udp port for receive scalingVijay Immanuel1-0/+1
Select the source udp port number for a QP based on the source QPN. This provides a better spread of traffic across NIC RX queues for RC/UC QPs. Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-28IB/rxe: Remove unused variable (char *rxe_qp_state_name[])Hernán Gonzalez1-2/+0
Note: This is compile only tested as I have no access to the hw. This variable was not used anywhere in the code. Removing it saves 24 bytes. add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-24 (-24) Function old new delta rxe_qp_state_name 24 - -24 Total: Before=3348732, After=3348708, chg -0.00% Signed-off-by: Hernán Gonzalez <hernan@vanguardiasur.com.ar> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-18RDMA/rxe: Fix rxe_qp_cleanup()Bart Van Assche1-0/+3
rxe_qp_cleanup() can sleep so it must be run in thread context and not in atomic context. This patch avoids that the following bug is triggered: Kernel BUG at 00000000560033f3 [verbose debug info unavailable] BUG: sleeping function called from invalid context at net/core/sock.c:2761 in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0 INFO: lockdep is turned off. Preemption disabled at: [<00000000b6e69628>] __do_softirq+0x4e/0x540 CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x85/0xbf ___might_sleep+0x177/0x260 lock_sock_nested+0x1d/0x90 inet_shutdown+0x2e/0xd0 rxe_qp_cleanup+0x107/0x140 [rdma_rxe] rxe_elem_release+0x18/0x80 [rdma_rxe] rxe_requester+0x1cf/0x11b0 [rdma_rxe] rxe_do_task+0x78/0xf0 [rdma_rxe] tasklet_action+0x99/0x270 __do_softirq+0xc0/0x540 run_ksoftirqd+0x1c/0x70 smpboot_thread_fn+0x1be/0x270 kthread+0x117/0x130 ret_from_fork+0x24/0x30 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Moni Shoua <monis@mellanox.com> Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28IB/rxe: Fix destination cache for IPv6Andrew Boyer1-0/+1
To successfully match an IPv6 path, the path cookie must match. Store it in the QP so that the IPv6 path can be reused. Replace open-coded version of dst_check() with the actual call, fixing the logic. The open-coded version skips the check call if dst->obsolete is 0 (DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE means that the route may continue to be used, though. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-28IB/rxe: Disable completion upcalls when a CQ is destroyedAndrew Boyer1-0/+1
This prevents the stack from accessing userspace objects while they are being torn down. One possible sequence of events: - Userspace program exits - ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(), ib_destroy_cq(), etc. and releasing/freeing the UCQ - The QP still has tasklets running, so it isn't destroyed yet - The CQ is referenced by the QP, so the CQ isn't destroyed yet - The UCQ is kfree()'d anyway - A send work request completes - rxe_send_complete() calls cq->ibcq.comp_handler() - ib_uverbs_comp_handler() runs and crashes; the event queue is checked for is_closed, but it has no way to check the ib_ucq_object before accessing it The reference counting on the CQ doesn't protect against this since the CQ hasn't been destroyed yet. There's no available interface to deregister the UCQ from the CQ, and it didn't appear that attempting to add reference counting to the UCQ was going to be a good way to go since this solution is much simpler. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Offload CRC calculation when possibleyonatanc1-0/+1
Use CPU ability to perform CRC calculations, by replacing direct calls to crc32_le() with crypto_shash_updata(). The overall performance gain measured with ib_send_bw tool is 10% and it was tested on "Intel CPU ES-2660 v2 @ 2.20Ghz" CPU. ib_send_bw -d rxe0 -x 1 -n 9000 -e -s $((1024 * 1024 )) -l 100 --------------------------------------------------------------------------------------------- | | bytes | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] | --------------------------------------------------------------------------------------------- | crc32_le | 1048576 | 9000 | inf | 497.60 | 0.000498 | | CRC offload | 1048576 | 9000 | inf | 546.70 | 0.000547 | --------------------------------------------------------------------------------------------- Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-21IB/rxe: Add port protocol statsYonatan Cohen1-0/+8
Expose new counters using the get_hw_stats callback. We expose the following counters: +---------------------+----------------------------------------+ | Name | Description | |---------------------+----------------------------------------| |sent_pkts | number of sent pkts | |---------------------+----------------------------------------| |rcvd_pkts | number of received packets | |---------------------+----------------------------------------| |out_of_sequence | number of errors due to packet | | | transport sequence number | |---------------------+----------------------------------------| |duplicate_request | number of received duplicated packets. | | | A request that previously executed is | | | named duplicated. | |---------------------+----------------------------------------| |rcvd_rnr_err | number of received RNR by completer | |---------------------+----------------------------------------| |send_rnr_err | number of sent RNR by responder | |---------------------+----------------------------------------| |rcvd_seq_err | number of out of sequence packets | | | received | |---------------------+----------------------------------------| |ack_deffered | number of deferred handling of ack | | | packets. | |---------------------+----------------------------------------| |retry_exceeded_err | number of times retry exceeded | |---------------------+----------------------------------------| |completer_retry_err | number of times completer decided to | | | retry | |---------------------+----------------------------------------| |send_err | number of failed send packet | +---------------------+----------------------------------------+ Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/rxe: Remove a pointless indirection layerBart Van Assche1-22/+0
Neither rxe->ifc_ops nor any of the function pointers in struct struct rxe_ifc_ops ever change. Hence remove the rxe->ifc_ops indirection mechanism. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/rxe: Let the compiler check the type of the cleanup functionsBart Van Assche1-1/+1
Change the argument type of these functions from void * into struct rxe_pool_entry *. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Andrew Boyer <andrew.boyer@dell.com> Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Soft RoCE driverMoni Shoua1-0/+480
Soft RoCE (RXE) - The software RoCE driver ib_rxe implements the RDMA transport and registers to the RDMA core device as a kernel verbs provider. It also implements the packet IO layer. On the other hand ib_rxe registers to the Linux netdev stack as a udp encapsulating protocol, in that case RDMA, for sending and receiving packets over any Ethernet device. This yields a RDMA transport over the UDP/Ethernet network layer forming a RoCEv2 compatible device. The configuration procedure of the Soft RoCE drivers requires binding to any existing Ethernet network device. This is done with /sys interface. A userspace Soft RoCE library (librxe) provides user applications the ability to run with Soft RoCE devices. The use of rxe verbs ins user space requires the inclusion of librxe as a device specifics plug-in to libibverbs. librxe is packaged separately. Architecture: +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ +---------------------------------------------------------------+ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-----------------------------------------------------------+ | Application | +-----------------------------------------------------------+ +-----------------------------------+ | libibverbs | User +-----------------------------------+ +----------------+ +----------------+ | librxe | | HW RoCE lib | +----------------+ +----------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------+ +------------+ | Sockets | | RDMA ULP | +--------------+ +------------+ +--------------+ +---------------------+ | TCP/IP | | ib_core | +--------------+ +---------------------+ +------------+ +----------------+ Kernel | ib_rxe | | HW RoCE driver | +------------+ +----------------+ +------------------------------------+ | NIC driver | +------------------------------------+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft RoCE resources: [1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in Github [2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE Wiki page [3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>