aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/sw (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-12-14Merge branch 'hfi1' into merge-testDoug Ledford9-187/+486
2016-12-14IB/rdmavt: Only put mmap_info ref if it existsJim Foraker1-1/+2
rvt_create_qp() creates qp->ip only when a qp creation request comes from userspace (udata is not NULL). If we exceed the number of available queue pairs however, the error path always attempts to put a kref to this structure. If the requestor is inside the kernel, this leads to a crash. We fix this by checking that qp->ip is not NULL before caling kref_put(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Acked-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/rdmavt: Handle the kthread worker using the new APIPetr Mladek1-23/+11
Use the new API to create and destroy the cq kthread worker. The API hides some implementation details. In particular, kthread_create_worker() allocates and initializes struct kthread_worker. It runs the kthread the right way and stores task_struct into the worker structure. In addition, the *on_cpu() variant binds the kthread to the given cpu and the related memory node. kthread_destroy_worker() flushes all pending works, stops the kthread and frees the structure. This patch does not change the existing behavior. Note that we must use the on_cpu() variant because the function starts the kthread and it must bind it to the right CPU before waking. The numa node is associated for given CPU as well. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14IB/rdmavt: Avoid queuing work into a destroyed cq kthread workerPetr Mladek1-14/+16
The memory barrier is not enough to protect queuing works into a destroyed cq kthread. Just imagine the following situation: CPU1 CPU2 rvt_cq_enter() worker = cq->rdi->worker; rvt_cq_exit() rdi->worker = NULL; smp_wmb(); kthread_flush_worker(worker); kthread_stop(worker->task); kfree(worker); // nothing queued yet => // nothing flushed and // happily stopped and freed if (likely(worker)) { // true => read before CPU2 acted cq->notify = RVT_CQ_NONE; cq->triggered++; kthread_queue_work(worker, &cq->comptask); BANG: worker has been flushed/stopped/freed in the meantime. This patch solves this by protecting the critical sections by rdi->n_cqs_lock. It seems that this lock is not much contended and looks reasonable for this purpose. One catch is that rvt_cq_enter() might be called from IRQ context. Therefore we must always take the lock with IRQs disabled to avoid a possible deadlock. Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/core: Let create_ah return extended response to userMoni Shoua1-1/+3
Add struct ib_udata to the signature of create_ah callback that is implemented by IB device drivers. This allows HW drivers to return extra data to the userspace library. This patch prepares the ground for mlx5 driver to resolve destination mac address for a given GID and return it to userspace. This patch was previously submitted by Knut Omang as a part of the patch set to support Oracle's Infiniband HCA (SIF). Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-13IB/rxe: Increase max number of completions to 32kYonatan Cohen1-1/+1
Increase limit of max CQE from 8K to 32K to allow demanding applications to work over SoftRoCE with same configuration as most RoCEv2 HW vendors have. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Hold refs when running taskletsAndrew Boyer3-1/+11
It might be possible for all of a QP's references to be dropped while one of that QP's tasklets is running. For example, the completer might run during QP destroy. If qp->valid is false, it will drop all of the packets on the resp_pkts list, potentially removing the last reference. Then it tries to advance the SQ consumer pointer. If the SQ's buffer has already been destroyed, the system will panic. To be safe, hold a reference on the QP for the duration of each tasklet. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Wait for tasklets to finish before tearing down QPAndrew Boyer2-0/+20
The system may crash when a malformed request is received and the error is detected by the responder. NodeA: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 50000 NodeB: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 1024 <NodeA_ip> The responder generates a receive error on node B since the incoming SEND is oversized. If the client tears down the QP before the responder or the completer finish running, a page fault may occur. The fix makes the destroy operation spin until the tasks complete, which appears to be original intent of the design. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Fix ref leak in duplicate_request()Andrew Boyer1-0/+1
A ref was added after the call to skb_clone(). Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Fix ref leak in rxe_create_qp()Andrew Boyer1-3/+4
The udata->inlen error path needs to clean up the ref added by rxe_alloc(). Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTSAndrew Boyer1-1/+9
Peek at the CQ after arming it so that we can return a hint. This avoids missed completions due to a race between posting CQEs and arming the CQ. For example, CM teardown waits on MAD requests to complete with ib_cq_poll_work(). Without this fix, the last completion might be left on the CQ, hanging the kthread doing the teardown. The console backtraces look like this: [ 4199.911284] Call Trace: [ 4199.911401] [<ffffffff9657fe95>] schedule+0x35/0x80 [ 4199.911556] [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0 [ 4199.911727] [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20 [ 4199.911891] [<ffffffff96580903>] wait_for_completion+0xb3/0x130 [ 4199.912067] [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70 [ 4199.912243] [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm] [ 4199.912422] [<ffffffff961615d5>] ? printk+0x57/0x73 [ 4199.912578] [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm] [ 4199.912759] [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm] [ 4199.912941] [<ffffffffc076f2cc>] 0xffffffffc076f2cc Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Add support for zero-byte operationsAndrew Boyer2-3/+18
The last_psn algorithm fails in the zero-byte case: it calculates first_psn = N, last_psn = N-1. This makes the operation unretryable since the res structure will fail the (first_psn <= psn <= last_psn) test in find_resource(). While here, use BTH_PSN_MASK to mask the calculated last_psn. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Unblock loopback by moving skb_out incrementAndrew Boyer2-2/+2
skb_out is decremented in rxe_skb_tx_dtor(), which is not called in the loopback() path. Move the increment to the send() path rather than rxe_xmit_packet(). Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Don't update the response PSN unless it's going forwardsAndrew Boyer1-1/+2
A client might post a read followed by a send. The partner receives and acknowledges both transactions, posting an RCQ entry for the send, but something goes wrong with the read ACK. When the client retries the read, the partner's responder processes the duplicate read but incorrectly resets the PSN to the value preceding the original send. When the duplicate send arrives, the responder cannot tell that it is a duplicate, so the responder generates a duplicate RCQ entry, confusing the client. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Advance the consumer pointer before posting the CQEAndrew Boyer1-2/+3
A simple userspace application might poll the CQ, find a completion, and then attempt to post a new WQE to the SQ. A spurious error can occur if the userspace application detects a full SQ in the instant before the kernel is able to advance the SQ consumer pointer. This is noticeable when using single-entry SQs with ibv_rc_pingpong if lots of kernel and userspace library debugging is enabled. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Remove buffer used for printing IP addressAndrew Boyer1-6/+5
Avoid smashing the stack when an ICRC error occurs on an IPv6 network. Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Remove unneeded cast in rxe_srq_from_attr()Dan Carpenter1-1/+1
It makes me nervous when we cast pointer parameters. I would estimate that around 50% of the time, it indicates a bug. Here the cast is not needed becaue u32 and and unsigned int are the same thing. Removing the cast makes the code more robust and future proof in case any of the types change. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: Use DEFINE_SPINLOCK() for spinlockWei Yongjun1-5/+1
spinlock can be initialized automatically with DEFINE_SPINLOCK() rather than explicitly calling spin_lock_init(). Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Leon Romanosky <leonro@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-12IB/rxe: avoid putting a large struct rxe_qp on stackArnd Bergmann1-7/+7
A race condition fix added an rxe_qp structure to the stack in order to be able to perform rollback in rxe_requester(), but the structure is large enough to trigger the warning for possible stack overflow: drivers/infiniband/sw/rxe/rxe_req.c: In function 'rxe_requester': drivers/infiniband/sw/rxe/rxe_req.c:757:1: error: the frame size of 2064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] This changes the rollback function to only save the psn inside the qp, which is the only field we access in the rollback_qp anyway. Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11IB/rdmavt: Add a send completion helperMike Marciniszyn1-0/+17
This is for use by client drivers to drive send completions into a CQ. A new exported table allows for the mapping of ib_wr_opcode into a ib_wc_opcode. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11IB/hfi1: Use reference count wrapper for MRsSebastian Sanchez1-4/+4
Some parts of the code don't use the standard driver wrapper for memory region reference counters. Use the standard driver wrapper throughout the code. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11IB/hfi1: Replace qp->refcount release code with standard driver wrapperSebastian Sanchez1-3/+2
Some parts of the code don't use the standard release wrapper rvt_put_qp() for decrementing and testing the refcount to then try to use a resource. Replace this code with the standard driver wrapper. Fixes: Commit 4d6f85c3fa55 ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11IB/rdmavt: Add trace of MR segsMike Marciniszyn3-0/+117
Add tracing of MR segment information. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-11IB/rdmavt: Fix trace hierarchyDennis Dalessandro4-137/+312
Split rdmavt traces into separate files to preserve the original hierarchy since only one trace sub system may now be defined per header file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-03IB/rxe: Remove and fix debug prints after allocation failureLeon Romanovsky1-1/+0
The prints after [k|v][m|z|c]alloc() functions are not needed, because in case of failure, allocator will print their internal error prints anyway. Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16Merge branches 'hfi1' and 'mlx' into k.o/for-4.9-rcDoug Ledford5-14/+28
2016-11-16IB/rxe: Update qp state for user queryYonatan Cohen1-0/+1
The method rxe_qp_error() transitions QP to error state and make sure the QP is drained. It did not though update the QP state for user's query. This patch fixes this. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/rxe: Clear queue buffer when modifying QP to resetYonatan Cohen3-0/+12
RXE resets the send-q only once in rxe_qp_init_req() when QP is created, but when the QP is reused after QP reset, the send-q holds previous garbage data. This garbage data wrongly fails CQEs that otherwise should have completed successfully. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/rxe: Fix handling of erroneous WRYonatan Cohen1-8/+13
To correctly handle a erroneous WR this fix does the following 1. Make sure the bad WQE causes a user completion event. 2. Call rxe_completer to handle the erred WQE. Before the fix, when rxe_requester found a bad WQE, it changed its status to IB_WC_LOC_PROT_ERR and exit with 0 for non RC QPs. If this was the 1st WQE then there would be no ACK to invoke the completer and this bad WQE would be stuck in the QP's send-q. On top of that the requester exiting with 0 caused rxe_do_task to endlessly invoke rxe_requester, resulting in a soft-lockup attached below. In case the WQE was not the 1st and rxe_completer did get a chance to handle the bad WQE, it did not cause a complete event since the WQE's IB_SEND_SIGNALED flag was not set. Setting WQE status to IB_SEND_SIGNALED is subject to IBA spec version 1.2.1, section 10.7.3.1 Signaled Completions. NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [<ffffffffa0590145>] ? rxe_pool_get_index+0x35/0xb0 [rdma_rxe] [<ffffffffa05952ec>] lookup_mem+0x3c/0xc0 [rdma_rxe] [<ffffffffa0595534>] copy_data+0x1c4/0x230 [rdma_rxe] [<ffffffffa058c180>] rxe_requester+0x9d0/0x1100 [rdma_rxe] [<ffffffff8158e98a>] ? kfree_skbmem+0x5a/0x60 [<ffffffffa05962c9>] rxe_do_task+0x89/0xf0 [rdma_rxe] [<ffffffffa05963e2>] rxe_run_task+0x12/0x30 [rdma_rxe] [<ffffffffa059110a>] rxe_post_send+0x41a/0x550 [rdma_rxe] [<ffffffff811ef922>] ? __kmalloc+0x182/0x200 [<ffffffff816ba512>] ? down_read+0x12/0x40 [<ffffffffa054bd32>] ib_uverbs_post_send+0x532/0x540 [ib_uverbs] [<ffffffff815f8722>] ? tcp_sendmsg+0x402/0xb80 [<ffffffffa05453dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs] [<ffffffff81623c2e>] ? inet_recvmsg+0x7e/0xb0 [<ffffffff8158764d>] ? sock_recvmsg+0x3d/0x50 [<ffffffff81215b87>] __vfs_write+0x37/0x140 [<ffffffff81216892>] vfs_write+0xb2/0x1b0 [<ffffffff81217ce5>] SyS_write+0x55/0xc0 [<ffffffff816bc672>] entry_SYSCALL_64_fastpath+0x1a/0xa Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-16IB/rxe: Fix kernel panic in UDP tunnel with GRO and RX checksumYonatan Cohen1-6/+2
Missing initialization of udp_tunnel_sock_cfg causes to following kernel panic, while kernel tries to execute gro_receive(). While being there, we converted udp_port_cfg to use the same initialization scheme as udp_tunnel_sock_cfg. ------------[ cut here ]------------ kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle kernel paging request at ffffffffa0588c50 IP: [<ffffffffa0588c50>] __this_module+0x50/0xffffffffffff8400 [ib_rxe] PGD 1c09067 PUD 1c0a063 PMD bb394067 PTE 80000000ad5e8163 Oops: 0011 [#1] SMP Modules linked in: ib_rxe ip6_udp_tunnel udp_tunnel CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc3+ #2 Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011 task: ffff880235e4e680 ti: ffff880235e68000 task.ti: ffff880235e68000 RIP: 0010:[<ffffffffa0588c50>] [<ffffffffa0588c50>] __this_module+0x50/0xffffffffffff8400 [ib_rxe] RSP: 0018:ffff880237343c80 EFLAGS: 00010282 RAX: 00000000dffe482d RBX: ffff8800ae330900 RCX: 000000002001b712 RDX: ffff8800ae330900 RSI: ffff8800ae102578 RDI: ffff880235589c00 RBP: ffff880237343cb0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ae33e262 R13: ffff880235589c00 R14: 0000000000000014 R15: ffff8800ae102578 FS: 0000000000000000(0000) GS:ffff880237340000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa0588c50 CR3: 0000000001c06000 CR4: 00000000000006e0 Stack: ffffffff8160860e ffff8800ae330900 ffff8800ae102578 0000000000000014 000000000000004e ffff8800ae102578 ffff880237343ce0 ffffffff816088fb 0000000000000000 ffff8800ae330900 0000000000000000 00000000ffad0000 Call Trace: <IRQ> [<ffffffff8160860e>] ? udp_gro_receive+0xde/0x130 [<ffffffff816088fb>] udp4_gro_receive+0x10b/0x2d0 [<ffffffff81611373>] inet_gro_receive+0x1d3/0x270 [<ffffffff81594e29>] dev_gro_receive+0x269/0x3b0 [<ffffffff81595188>] napi_gro_receive+0x38/0x120 [<ffffffffa011caee>] mlx5e_handle_rx_cqe+0x27e/0x340 [mlx5_core] [<ffffffffa011d076>] mlx5e_poll_rx_cq+0x66/0x6d0 [mlx5_core] [<ffffffffa011d7ae>] mlx5e_napi_poll+0x8e/0x400 [mlx5_core] [<ffffffff815949a0>] net_rx_action+0x160/0x380 [<ffffffff816a9197>] __do_softirq+0xd7/0x2c5 [<ffffffff81085c35>] irq_exit+0xf5/0x100 [<ffffffff816a8f16>] do_IRQ+0x56/0xd0 [<ffffffff816a6dcc>] common_interrupt+0x8c/0x8c <EOI> [<ffffffff81061f96>] ? native_safe_halt+0x6/0x10 [<ffffffff81037ade>] default_idle+0x1e/0xd0 [<ffffffff8103828f>] arch_cpu_idle+0xf/0x20 [<ffffffff810c37dc>] default_idle_call+0x3c/0x50 [<ffffffff810c3b13>] cpu_startup_entry+0x323/0x3c0 [<ffffffff81050d8c>] start_secondary+0x15c/0x1a0 RIP [<ffffffffa0588c50>] __this_module+0x50/0xffffffffffff8400 [ib_rxe] RSP <ffff880237343c80> CR2: ffffffffa0588c50 ---[ end trace 489ee31fa7614ac5 ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt ------------[ cut here ]------------ Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-15IB/hfi1: Optimize lkey validation structuresMike Marciniszyn1-5/+5
Profiling shows that the key validation is susceptible to cache line trading when accessing the lkey table. Fix by separating out the read mostly fields from the write fields. In addition the shift amount, which is function of the lkey table size, is precomputed and stored with the table pointer. Since both the shift and table pointer are in the same read mostly cacheline, this saves a cache line in this hot path. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-11-15IB/rdmavt: rdmavt can handle non aligned page mapsDennis Dalessandro1-3/+0
The initial code for rdmavt carried with it a restriction that was a vestige from the qib driver, that to dma map a page it had to be less than a page size. This is not the case on modern hardware, both qib and hfi1 will be just fine with unaligned map requests. This fixes a 4.8 regression where by an IPoIB transfer of > PAGE_SIZE will hang because the dma map page call always fails. This was introduced after commit 5faba5469522 ("IB/ipoib: Report SG feature regardless of HW UD CSUM capability") added the capability to use SG by default. Rather than override this, the HW supports it, so allow SG. Cc: Stable <stable@vger.kernel.org> # 4.8 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-11kthread: kthread worker API cleanupPetr Mladek1-5/+5
A good practice is to prefix the names of functions by the name of the subsystem. The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that process queued works. It is implemented as part of the kthread subsystem. This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_: __init_kthread_worker() -> __kthread_init_worker() init_kthread_worker() -> kthread_init_worker() init_kthread_work() -> kthread_init_work() insert_kthread_work() -> kthread_insert_work() queue_kthread_work() -> kthread_queue_work() flush_kthread_work() -> kthread_flush_work() flush_kthread_worker() -> kthread_flush_worker() Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names. Note that INIT() macros and init() functions use different naming scheme. There is no good solution. There are several reasons for this solution: + "init" in the function names stands for the verb "initialize" aka "initialize worker". While "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer". + INIT() macros are used only in DEFINE() macros + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme. + There will be also kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme. + there are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk(), + It is not an argument but it was inconsistent even before. [arnd@arndb.de: fix linux-next merge conflict] Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-06IB/rxe: improved debug prints & code cleanupParav Pandit14-88/+109
1. Debugging qp state transitions and qp errors in loopback and multiple QP tests is difficult without qp numbers in debug logs. This patch adds qp number to important debug logs. 2. Instead of having rxe: prefix in few logs and not having in few logs, using uniform module name prefix using pr_fmt macro. 3. Code cleanup for various warnings reported by checkpatch for incomplete unsigned data type, line over 80 characters, return statements. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06rdma_rxe: Ensure rdma_rxe init occurs at correct timeStephen Bates1-1/+1
There is a problem when CONFIG_RDMA_RXE=y and CONFIG_IPV6=y. This results in the rdma_rxe initialization occurring before the IPv6 services are ready. This patch delays the initialization of rdma_rxe until after the IPv6 services are ready. This fix is based on one proposed by Logan Gunthorpe on a much older code base. Signed-off-by: Stephen Bates <sbates@raithlin.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06IB/rxe: Properly honor max IRD value for rd/atomic.Parav Pandit3-13/+15
This patch honoris the max incoming read request count instead of outgoing read req count (a) during modify qp by allocating response queue metadata (b) during incoming read request processing Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06IB/{rxe,core,rdmavt}: Fix kernel crash for reg MRParav Pandit2-0/+34
This patch fixes below kernel crash on memory registration for rxe and other transport drivers which has dma_ops extension. IB/core invokes ib_map_sg_attrs() in generic manner with dma attributes which is used by mlx5 and mthca adapters. However in doing so it ignored honoring dma_ops extension of software based transports for sg map/unmap operation. This results in calling dma_map_sg_attrs of hardware virtual device resulting in crash for null reference. We extend the core to support sg_map/unmap_attrs and transport drivers to implement those dma_ops callback functions. Verified usign perftest applications. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81032a75>] check_addr+0x35/0x60 ... Call Trace: [<ffffffff81032b39>] ? nommu_map_sg+0x99/0xd0 [<ffffffffa02b31c6>] ib_umem_get+0x3d6/0x470 [ib_core] [<ffffffffa01cc329>] rxe_mem_init_user+0x49/0x270 [rdma_rxe] [<ffffffffa01c793a>] ? rxe_add_index+0xca/0x100 [rdma_rxe] [<ffffffffa01c995f>] rxe_reg_user_mr+0x9f/0x130 [rdma_rxe] [<ffffffffa00419fe>] ib_uverbs_reg_mr+0x14e/0x2c0 [ib_uverbs] [<ffffffffa003d3ab>] ib_uverbs_write+0x15b/0x3b0 [ib_uverbs] [<ffffffff811e92a6>] ? mem_cgroup_commit_charge+0x76/0xe0 [<ffffffff811af0a9>] ? page_add_new_anon_rmap+0x89/0xc0 [<ffffffff8117e6c9>] ? lru_cache_add_active_or_unevictable+0x39/0xc0 [<ffffffff811f0da8>] __vfs_write+0x28/0x120 [<ffffffff811f1239>] ? rw_verify_area+0x49/0xb0 [<ffffffff811f1492>] vfs_write+0xb2/0x1b0 [<ffffffff811f27d6>] SyS_write+0x46/0xa0 [<ffffffff814f7d32>] entry_SYSCALL_64_fastpath+0x1a/0xa4 Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06IB/rxe: Fix sending out loopback packet on netdev interface.Parav Pandit1-6/+6
Both prepare4 and prepare6 sets loopback mask in pkt_info structure instance of skb. The xmit_packet and other requester side functions use a pkt_info struct from the stack without the proper mask. This results in sending out the packet to the actual netdev device and loopback functionality is broken. Modify prepare() to pass its correctly marked pkt_info struct to prepare4() and prepare6() instead of them using SKB_TO_PKT(skb) and getting an incorrectly set mask. Verified with perftest applications. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-06IB/rxe: Avoid scheduling tasklet for userspace QPParav Pandit1-13/+25
This patch avoids scheduing tasklet for WQE and protocol processing for user space QP. It performs the task in calling process context. To improve code readability kernel specific post_send handling moved to post_send_kernel() function. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-03IB/rdmavt: Trivial function comment corrected.Parav Pandit1-1/+1
Corrected function name in comment from qib_ to rvt_. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt, IB/hfi1: Add lockdep asserts for lock debugMike Marciniszyn1-0/+8
This patch adds lockdep asserts in key code paths for insuring lock correctness. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt: Add qp init functionMike Marciniszyn1-42/+58
Add an rvt_qp_init() to initialize specific common fields as the qp is created or reset. The routine is shared by the rvt_reset_qp() and the rvt_create_qp(). The intent is that lock dep assertions will only appear in the rvt_reset_qp(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt: Move reset calldown to reset pathMike Marciniszyn1-6/+5
The reset calldown is misplaced. It should only be called in the code that actually transitions the QP to reset. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt: Correct sparse annotationMike Marciniszyn1-6/+3
The __must_hold() is sufficent to correct the sparse context imbalance inside a function. Per Documentation/sparse.txt: __must_hold - The specified lock is held on function entry and exit. Fixes: Commit c0a67f6ba356 ("IB/rdmavt: Annotate rvt_reset_qp()") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routinesMike Marciniszyn1-3/+2
This improves readability and hides the reference count mechanism from the client drivers. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rdmavt: Don't vfree a kzalloc'ed memory regionColin Ian King1-1/+1
The userspace memory region 'mr' is allocated with kzalloc in __rvt_alloc_mr however it is incorrectly being freed with vfree in __rvt_free_mr. Fix this by using kfree to free it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix kmem_cache leakYonatan Cohen1-0/+13
Decrement qp reference when handling error path in completer to prevent kmem_cache leak. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix race condition between requester and completerYonatan Cohen1-13/+44
rxe_requester() is sending a pkt with rxe_xmit_packet() and then calls rxe_update() to update the wqe and qp's psn values. But sometimes the response is received before the requester had time to update the wqe in which case the completer acts on errornous wqe values. This fix updates the wqe and qp before actually sending the request and rolls back when xmit fails. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix duplicate atomic request handlingYonatan Cohen1-5/+6
When handling ack for atomic opcodes like "fetch&add" or "cmp&swp", the method send_atomic_ack() saves the ack before sending it, in case it gets lost and never reach the requester. In which case the method duplicate_request() will need to find it using the duplicated request.psn. But send_atomic_ack() used a wrong psn value and thus the above ack was never found. This fix uses the ack.psn to locate the ack in case its needed. This fix also copies the ack packet to the skb's control buffer since duplicate_request() will need it when calling rxe_xmit_packet() Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rxe: Fix kernel panic in udp_setup_tunnelYonatan Cohen3-34/+51
Disable creation of a UDP socket for ipv6 when CONFIG_IPV6 is not enabeld. Since udp_sock_create6() returns 0 when CONFIG_IPV6 is not set [ 46.888632] IP: [<c220705a>] setup_udp_tunnel_sock+0x6/0x4f [ 46.891355] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 [ 46.893918] Oops: 0002 [#1] PREEMPT [ 46.896014] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-rc4-00001-g8700e3e #1 [ 46.900280] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014 [ 46.904905] task: cf06c040 ti: cf05e000 task.ti: cf05e000 [ 46.907854] EIP: 0060:[<c220705a>] EFLAGS: 00210246 CPU: 0 [ 46.911137] EIP is at setup_udp_tunnel_sock+0x6/0x4f [ 46.914070] EAX: 00000044 EBX: 00000001 ECX: cf05fef0 EDX: ca8142e0 [ 46.917236] ESI: c2c4505b EDI: cf05fef0 EBP: cf05fed0 ESP: cf05fed0 [ 46.919836] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068 [ 46.922046] CR0: 80050033 CR2: 000001fc CR3: 02cec000 CR4: 000006b0 [ 46.924550] Stack: [ 46.926014] cf05ff10 c1fd4657 ca8142e0 0000000a 00000000 00000000 0000b712 00000008 [ 46.931274] 00000000 6bb5bd01 c1fd48de 00000000 00000000 cf05ff1c 00000000 00000000 [ 46.936122] cf05ff1c c1fd4bdf 00000000 cf05ff28 c2c4507b ffffffff cf05ff88 c2bf1c74 [ 46.942350] Call Trace: [ 46.944403] [<c1fd4657>] rxe_setup_udp_tunnel+0x8f/0x99 [ 46.947689] [<c1fd48de>] ? net_to_rxe+0x4e/0x4e [ 46.950567] [<c1fd4bdf>] rxe_net_init+0xe/0xa4 [ 46.953147] [<c2c4507b>] rxe_module_init+0x20/0x4c [ 46.955448] [<c2bf1c74>] do_one_initcall+0x89/0x113 [ 46.957797] [<c2bf15eb>] ? set_debug_rodata+0xf/0xf [ 46.959966] [<c2bf1dbc>] ? kernel_init_freeable+0xbe/0x15b [ 46.962262] [<c2bf1ddc>] kernel_init_freeable+0xde/0x15b [ 46.964418] [<c232eb54>] kernel_init+0x8/0xd0 [ 46.966618] [<c2333122>] ret_from_kernel_thread+0xe/0x24 [ 46.969592] [<c232eb4c>] ? rest_init+0x6f/0x6f Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>