Age | Commit message (Collapse) | Author | Files | Lines |
|
This commit adds a cache eviction algorithm for the SDMA
user buffer cache.
Besides the interval RB tree used for node lookup, the cache
nodes are also arranged in a doubly-linked list. When a node is
used, it is put at the beginning of the list. Less frequently
used nodes naturally move to the tail of the list.
When the cache limit is reached, the eviction code starts
traversing the linked list in reverse, freeing buffers until
enough space has been freed to fit the new user buffer. This
guarantees that only the least used cache nodes will be removed
from the cache.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Add support for caching of user buffers used for SDMA
transfers. This change improves performance by
avoiding repeatedly pinning the pages of buffers, which
are being re-used by the application.
While the cost of the pinning operation has been made
heavier by adding the extra code to search the cache tree,
re-allocate pages arrays, and future cache evictions,
that cost will be amortized against the savings when the
same buffer is re-used. It is also worth noting that in
most cases, the cost of pinning should be much lower due
to the buffer already being in the cache.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Fix the header by moving the copyright notice out of the license text
and to the top of the header. Also, update the copyright date.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
To facilitate locked page counting, the user SDMA
routines would maintain a list of io vectors, which
were freed in the completion callback and then unpin
the associated pages during the next call into the
kernel.
Since the size of this list was unbounded, doing this
was bad for performance because the driver ended up
spending too much time freeing the io vectors.
This commit changes how the io vector freeing is done
by moving the actual page unpinning in the callback and
maintaining a count of unpinned pages. This count can
then be used during the next call into the kernel to
update the mm->pinned_vm variable (since that requires
process context and the ability to sleep.)
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
Commit a0d406934a46 ("staging/rdma/hfi1: Add page lock limit
check for SDMA requests") added a mechanism to
delay the clean-up of user SDMA requests in order to facilitate
proper locked page counting.
This delayed processing was done using a kernel workqueue, which
meant that a kernel thread would have to spin up and take CPU
cycles to do the clean-up.
This proved detrimental to performance because now there are two
execution threads (the kernel workqueue and the user process)
needing cycles on the same CPU.
Performance-wise, it is much better to do as much of the clean-up
as can be done in interrupt context (during the callback) and do
the remaining work in-line during subsequent calls of the user
process into the driver.
The changes required to implement the above also significantly
simplify the entire SDMA completion processing code and eliminate
a memory corruption causing the following observed crash:
[ 2881.703362] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2881.703389] IP: [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x18e0 [hfi1]
[ 2881.703422] PGD 7d4d25067 PUD 77d96d067 PMD 0
[ 2881.703427] Oops: 0000 [#1] SMP
[ 2881.703431] Modules linked in:
[ 2881.703504] CPU: 28 PID: 6668 Comm: mpi_stress Tainted: G OENX 3.12.28-4-default #1
[ 2881.703508] Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0044.090
[ 2881.703512] task: ffff88077da8e0c0 ti: ffff880856772000 task.ti: ffff880856772000
[ 2881.703515] RIP: 0010:[<ffffffffa02897e4>] [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x
[ 2881.703529] RSP: 0018:ffff880856773c48 EFLAGS: 00010287
[ 2881.703531] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000002000
[ 2881.703534] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000002000
[ 2881.703537] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 2881.703540] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2881.703543] R13: 0000000000000000 R14: ffff88071e782e68 R15: ffff8810532955c0
[ 2881.703546] FS: 00007f9c4375e700(0000) GS:ffff88107eec0000(0000) knlGS:0000000000000000
[ 2881.703549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2881.703551] CR2: 0000000000000000 CR3: 00000007d4cba000 CR4: 00000000003407e0
[ 2881.703554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2881.703556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2881.703558] Stack:
[ 2881.703559] ffffffff00002000 ffff881000001800 ffffffff00000000 00000000000080d0
[ 2881.703570] 0000000000000000 0000200000000000 0000000000000000 ffff88071e782db8
[ 2881.703580] ffff8807d4d08d80 ffff881053295600 0000000000000008 ffff88071e782fc8
[ 2881.703589] Call Trace:
[ 2881.703691] [<ffffffffa028b5da>] hfi1_user_sdma_process_request+0x84a/0xab0 [hfi1]
[ 2881.703777] [<ffffffffa0255412>] hfi1_aio_write+0xd2/0x110 [hfi1]
[ 2881.703828] [<ffffffff8119e3d8>] do_sync_readv_writev+0x48/0x80
[ 2881.703837] [<ffffffff8119f78b>] do_readv_writev+0xbb/0x230
[ 2881.703843] [<ffffffff8119fab8>] SyS_writev+0x48/0xc0
This commit also addresses issues related to notification of user
processes of SDMA request slot availability. The slot should be
cleaned up first before the user processes is notified of its
availability.
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|
|
The driver pins pages on behalf of user processes in two
separate instances - when the process has submitted a
SDMA transfer and when the process programs an expected
receive buffer.
When pinning pages, the driver is required to observe the
locked page limit set by the system administrator and refuse
to lock more pages than allowed. Such a check was done for
expected receives but was missing from the SDMA transfer
code path.
This commit adds the missing check for SDMA transfers. As of
this commit, user SDMA or expected receive requests will be
rejected if the number of pages required to be pinned will
exceed the set limit.
Due to the fact that the driver needs to take the MM semaphore
in order to update the locked page count (which can sleep), this
cannot be done by the callback function as it [the callback] is
executed in interrupt context. Therefore, it is necessary to put
all the completed SDMA tx requests onto a separate list (txcmp) and
offload the actual clean-up and unpinning work to a workqueue.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In preparation of implementing TID caching move EXP_TID macros to a common
header user_exp_rcv.h
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Signed-off-by: John Gregor <john.a.gregor@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Kevin Pine <kevin.pine@intel.com>
Signed-off-by: Kyle Liddell <kyle.liddell@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
|