Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a bvec array to struct xdr_buf, and have the client allocate it
when we need to receive data into pages.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the RPC call relies on the receive call allocating pages as buffers,
then let's label it so that we
a) Don't leak memory by allocating pages for requests that do not expect
this behaviour
b) Can optimise for the common case where calls do not require allocation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We no longer need priority semantics on the xprt->sending queue, because
the order in which tasks are sent is now dictated by their position in
the send queue.
Note that the backlog queue remains a priority queue, meaning that
slot resources are still managed in order of task priority.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix up the priority queue to not batch by owner, but by queue, so that
we allow '1 << priority' elements to be dequeued before switching to
the next priority queue.
The owner field is still used to wake up requests in round robin order
by owner to avoid single processes hogging the RPC layer by loading the
queues.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the server is slow, we can find ourselves with quite a lot of entries
on the receive queue. Converting the search from an O(n) to O(log(n))
can make a significant difference, particularly since we have to hold
a number of locks while searching.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The theory was that we would need to grab the socket lock anyway, so we
might as well use it to gate the allocation of RPC slots for a TCP
socket.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
This no longer causes them to lose their place in the transmission queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Rather than forcing each and every RPC task to grab the socket write
lock in order to send itself, we allow whichever task is holding the
write lock to attempt to drain the entire transmit queue.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Avoid memory starvation by giving RPCs that are tagged with the
RPC_TASK_SWAPPER flag the highest priority.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Both RDMA and UDP transports require the request to get a "congestion control"
credit before they can be transmitted. Right now, this is done when
the request locks the socket. We'd like it to happen when a request attempts
to be transmitted for the first time.
In order to support retransmission of requests that already hold such
credits, we also want to ensure that they get queued first, so that we
don't deadlock with requests that have yet to obtain a credit.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
One of the intentions with the priority queues was to ensure that no
single process can hog the transport. The field task->tk_owner therefore
identifies the RPC call's origin, and is intended to allow the RPC layer
to organise queues for fairness.
This commit therefore modifies the transmit queue to group requests
by task->tk_owner, and ensures that we round robin among those groups.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Remove the checks for whether or not we need to transmit, and whether
or not a reply has been received. Those are already handled in
call_transmit() itself.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the request is still on the queue, this will be incorrect behaviour.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When we shift to using the transmit queue, then the task that holds the
write lock will not necessarily be the same as the one being transmitted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix up the back channel code to recognise that it has already been
transmitted, so does not need to be called again.
Also ensure that we set req->rq_task.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move the call encoding so that it occurs before the transport connection
etc.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add the queue that will enforce the ordering of RPC task transmission.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When storing a struct rpc_rqst on the slot allocation list, we currently
use the same field 'rq_list' as we use to store the request on the
receive queue. Since the structure is never on both lists at the same
time, this is OK.
However, for clarity, let's make that a union with different names for
the different lists so that we can more easily distinguish between
the two states.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Allow the caller in clnt.c to call into the code to wait for a reply
after calling xprt_transmit(). Again, the reason is that the backchannel
code does not need this functionality.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Separate out the action of adding a request to the reply queue so that the
backchannel code can simply skip calling it altogether.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We will use the same lock to protect both the transmit and receive queues.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Rather than waking up the entire queue of RPC messages a second time,
just wake up the task that was put to sleep.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When asked to wake up an RPC task, it makes sense to test whether or not
the task is still queued.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add a helper that will wake up a task that is sleeping on a specific
queue, and will set the value of task->tk_status. This is mainly
intended for use by the transport layer to notify the task of an
error condition.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We are going to need to pin for both send and receive.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the previous message was only partially transmitted, we need to close
the socket in order to avoid corruption of the message stream. To do so,
we currently hijack the unlocking of the socket in order to schedule
the close.
Now that we track the message offset in the socket state, we can move
that kind of checking out of the socket lock code, which is needed to
allow messages to remain queued after dropping the socket lock.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Rather than resetting state variables in socket state_change() callback,
do it in the sunrpc TCP connect function itself.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Since we will want to introduce similar TCP state variables for the
transmission of requests, let's rename the existing ones to label
that they are for the receive side.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Currently, we grab the socket bit lock before we allow the message
to be XDR encoded. That significantly slows down the transmission
rate, since we serialise on a potentially blocking operation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add states to indicate that the message send and receive are not yet
complete.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If a message has been encoded using RPCSEC_GSS, the server is
maintaining a window of sequence numbers that it considers valid.
The client should normally be tracking that window, and needs to
verify that the sequence number used by the message being transmitted
still lies inside the window of validity.
So far, we've been able to assume this condition would be realised
automatically, since the client has been encoding the message only
after taking the socket lock. Once we change that condition, we
will need the explicit check.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Move the initialisation back into xprt.c.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
|
|
Commit 51c1e9b554c9 ("auxdisplay: Move panel.c to drivers/auxdisplay folder")
moved the file, but the MAINTAINERS reference was not updated.
Link: https://lore.kernel.org/lkml/20180928220131.31075-1-joe@perches.com/
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
|
|
There is currently a warning when building the Kryo cpufreq driver into
the kernel image:
WARNING: vmlinux.o(.text+0x8aa424): Section mismatch in reference from
the function qcom_cpufreq_kryo_probe() to the function
.init.text:qcom_cpufreq_kryo_get_msm_id()
The function qcom_cpufreq_kryo_probe() references
the function __init qcom_cpufreq_kryo_get_msm_id().
This is often because qcom_cpufreq_kryo_probe lacks a __init
annotation or the annotation of qcom_cpufreq_kryo_get_msm_id is wrong.
Remove the '__init' annotation from qcom_cpufreq_kryo_get_msm_id
so that there is no more mismatch warning.
Additionally, Nick noticed that the remove function was marked as
'__init' when it should really be marked as '__exit'.
Fixes: 46e2856b8e18 (cpufreq: Add Kryo CPU scaling driver)
Fixes: 5ad7346b4ae2 (cpufreq: kryo: Add module remove and exit)
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It is possible that a failure can occur during the scheduling of a
pinned event. The initial portion of perf_event_read_local() contains
the various error checks an event should pass before it can be
considered valid. Ensure that the potential scheduling failure
of a pinned event is checked for and have a credible error.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/6486385d1f30336e9973b24c8c65f5079543d3d3.1537377064.git.reinette.chatre@intel.com
|
|
Commit a46b53672b2c2e3770b38a4abf90d16364d2584b ("xen/blkfront: cleanup
stale persistent grants") introduced a regression as purged persistent
grants were not pu into the list of free grants again. Correct that.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix didn't work for all cases, reverting to add a (hopefully)
better fix.
This reverts commit f151ba989d149bbdfc90e5405724bbea094f9b17.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk")
introduced a requirement that Makefiles more than one level below the
selftests directory need to define top_srcdir, but it didn't update
any of the powerpc Makefiles.
This broke building all the powerpc selftests with eg:
make[1]: Entering directory '/src/linux/tools/testing/selftests/powerpc'
BUILD_TARGET=/src/linux/tools/testing/selftests/powerpc/alignment; mkdir -p $BUILD_TARGET; make OUTPUT=$BUILD_TARGET -k -C alignment all
make[2]: Entering directory '/src/linux/tools/testing/selftests/powerpc/alignment'
../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory
make[2]: *** No rule to make target '../../../../scripts/subarch.include'.
make[2]: Failed to remake makefile '../../../../scripts/subarch.include'.
Makefile:38: recipe for target 'alignment' failed
Fix it by setting top_srcdir in the affected Makefiles.
Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
trace_block_unplug() takes true for explicit unplugs and false for
implicit unplugs. schedule() unplugs are implicit and should be
reported as timer unplugs. While correct in the legacy code, this has
been inverted in blk-mq since 4.11.
Cc: stable@vger.kernel.org
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When dax_lock_mapping_entry() has to sleep to obtain entry lock, it will
fail to unlock mapping->i_pages spinlock and thus immediately deadlock
against itself when retrying to grab the entry lock again. Fix the
problem by unlocking mapping->i_pages before retrying.
Fixes: c2a7d2a11552 ("filesystem-dax: Introduce dax_lock_mapping_entry()")
Reported-by: Barret Rhoden <brho@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Commit
1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active")
can occasionally cause system resets when kexec-ing a second kernel even
if SEV is not active.
That's because get_sev_encryption_bit() uses 32-bit rIP-relative
addressing to read the value of enc_bit - a variable which caches a
previously detected encryption bit position - but kexec may allocate
the early boot code to a higher location, beyond the 32-bit addressing
limit.
In this case, garbage will be read and get_sev_encryption_bit() will
return the wrong value, leading to accessing memory with the wrong
encryption setting.
Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
rIP-relative addressing in the first place.
[ bp: massage commit message heavily. ]
Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active")
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: brijesh.singh@amd.com
Cc: kexec@lists.infradead.org
Cc: dyoung@redhat.com
Cc: bhe@redhat.com
Cc: ghook@redhat.com
Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
|
|
After write SSD completed, bcache schedules journal_write work to
system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM
flag. system_wq is also a bound wq, and there may be no idle kworker on
current processor. Creating a new kworker may unfortunately need to
reclaim memory first, by shrinking cache and slab used by vfs, which
depends on bcache device. That's a deadlock.
This patch create a new workqueue for journal_write with WQ_MEM_RECLAIM
flag. It's rescuer thread will work to avoid the deadlock.
Signed-off-by: Guoju Fang <fangguoju@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|