Age | Commit message (Collapse) | Author | Files | Lines |
|
Certain use cases want to use CLOCK_BOOTTIME or CLOCK_REALTIME rather than
CLOCK_MONOTONIC, instead of the default CLOCK_MONOTONIC.
Add an IORING_TIMEOUT_BOOTTIME and IORING_TIMEOUT_REALTIME flag that
allows timeouts and linked timeouts to use the selected clock source.
Only one clock source may be selected, and we -EINVAL the request if more
than one is given. If neither BOOTIME nor REALTIME are selected, the
previous default of MONOTONIC is used.
Link: https://github.com/axboe/liburing/issues/369
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io-wq divides work into two categories:
1) Work that completes in a bounded time, like reading from a regular file
or a block device. This type of work is limited based on the size of
the SQ ring.
2) Work that may never complete, we call this unbounded work. The amount
of workers here is just limited by RLIMIT_NPROC.
For various uses cases, it's handy to have the kernel limit the maximum
amount of pending workers for both categories. Provide a way to do with
with a new IORING_REGISTER_IOWQ_MAX_WORKERS operation.
IORING_REGISTER_IOWQ_MAX_WORKERS takes an array of two integers and sets
the max worker count to what is being passed in for each category. The
old values are returned into that same array. If 0 is being passed in for
either category, it simply returns the current value.
The value is capped at RLIMIT_NPROC. This actually isn't that important
as it's more of a hint, if we're exceeding the value then our attempt
to fork a new worker will fail. This happens naturally already if more
than one node is in the system, as these values are per-node internally
for io-wq.
Reported-by: Johannes Lundberg <johalun0@gmail.com>
Link: https://github.com/axboe/liburing/issues/420
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
req->buf_index is u16 and so we rely on registered buffers indexes
fitting into it. Add a build check, so when the upper limit for the
number of buffers is lifted we get a compliation fail but not lurking
problems.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/787e8e1a17cea51ca6301426b1c4c4887b8bd676.1629920396.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It's too easy to forget and misjudge about synchronisation in
io_req_task_cancel(), add a comment clarifying it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/71099083835f983a1fd73d5a3da6391924da8300.1629920396.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As we have a more complicated task referencing, which apart from normal
task references includes taking tctx->inflight and caching all that, it
would be a good idea to have all that isolated in helpers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d9114d037f1c195897aa13f38a496078eca2afdb.1630023531.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Given a linkchain like this:
req0(link_flag)-->req1(link_flag)-->...-->reqn(no link_flag)
There is a problem:
- if some intermediate linked req like req1 's submittion fails, reqs
after it won't be cancelled.
- sqpoll disabled: maybe it's ok since users can get the error info
of req1 and stop submitting the following sqes.
- sqpoll enabled: definitely a problem, the following sqes will be
submitted in the next round.
The solution is to refactor the code logic to:
- if a linked req's submittion fails, just mark it and the head(if it
exists) as REQ_F_FAIL. Leverage req->result to indicate whether it
is failed or cancelled.
- submit or fail the whole chain when we come to the end of it.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210827094609.36052-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
req_set_fail() in io_submit_sqe() is redundant, remove it.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210827094609.36052-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
It's not necessary to free the request back to slab when we fail to
get sqe, just move it to state->free_list.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210825175856.194299-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As done with open opcodes, allow accept to skip installing fd into
processes' file tables and put it directly into io_uring's fixed file
table. Same restrictions and design as for open.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/6d16163f376fac7ac26a656de6b42199143e9721.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make io_accept() to handle file descriptor allocations and installation.
A preparation patch for bypassing file tables.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/5b73d204caa0ce979ccb98136695b60f52a3d98c.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of opening a file into a process's file table as usual and then
registering the fd within io_uring, some users may want to skip the
first step and place it directly into io_uring's fixed file table.
This patch adds such a capability for IORING_OP_OPENAT and
IORING_OP_OPENAT2.
The behaviour is controlled by setting sqe->file_index, where 0 implies
the old behaviour using normal file tables. If non-zero value is
specified, then it will behave as described and place the file into a
fixed file slot sqe->file_index - 1. A file table should be already
created, the slot should be valid and empty, otherwise the operation
will fail.
Keep the error codes consistent with IORING_OP_FILES_UPDATE, ENXIO and
EINVAL on inappropriate fixed tables, and return EBADF on collision with
already registered file.
Note: IOSQE_FIXED_FILE can't be used to switch between modes, because
accept takes a file, and it already uses the flag with a different
meaning.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/e9b33d1163286f51ea707f87d95bd596dada1e65.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce and reuse a helper that acts similarly to __sys_accept4_file()
but returns struct file instead of installing file descriptor. Will be
used by io_uring.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: David S. Miller <davem@davemloft.net>
Link: https://lore.kernel.org/r/c57b9e8e818d93683a3d24f8ca50ca038d1da8c4.1629888991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
WARNING: CPU: 1 PID: 5870 at fs/io_uring.c:5975 io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975
CPU: 0 PID: 5870 Comm: iou-wrk-5860 Not tainted 5.14.0-rc6-next-20210820-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_try_cancel_userdata+0x30f/0x540 fs/io_uring.c:5975
Call Trace:
io_async_cancel fs/io_uring.c:6014 [inline]
io_issue_sqe+0x22d5/0x65a0 fs/io_uring.c:6407
io_wq_submit_work+0x1dc/0x300 fs/io_uring.c:6511
io_worker_handle_work+0xa45/0x1840 fs/io-wq.c:533
io_wqe_worker+0x2cc/0xbb0 fs/io-wq.c:582
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
io_try_cancel_userdata() can be called from io_async_cancel() executing
in the io-wq context, so the warning fires, which is there to alert
anyone accessing task->io_uring->io_wq in a racy way. However,
io_wq_put_and_exit() always first waits for all threads to complete,
so the only detail left is to zero tctx->io_wq after the context is
removed.
note: one little assumption is that when IO_WQ_WORK_CANCEL, the executor
won't touch ->io_wq, because io_wq_destroy() might cancel left pending
requests in such a way.
Cc: stable@vger.kernel.org
Reported-by: syzbot+b0c9d1588ae92866515f@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dfdd37a80cfa9ffd3e59538929c99cdd55d8699e.1629721757.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Employ inline completion logic for read/write completions done via
io_req_task_complete(). If ->uring_lock is contended, just do normal
request completion, but if not, make tctx_task_work() to grab the lock
and do batched inline completions in io_req_task_complete().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/94589c3ce69eaed86a21bb1ec696407a54fab1aa.1629286357.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Many task_work handlers either grab ->uring_lock, or may benefit from
having it. Move locking logic out of individual handlers to a lazy
approach controlled by tctx_task_work(), so we don't keep doing
tons of mutex lock/unlock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6a34e147f2507a2f3e2fa1e38a9c541dcad3929.1629286357.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_fallback_req_func() doesn't expect anyone creating inline
completions, and no one currently does that. Teach the function to flush
completions preparing for further changes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8b941516921f72e1a64d58932d671736892d7fff.1629286357.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
->splice_fd_in is used only by splice/tee, but no other request checks
it for validity. Add the check for most of request types excluding
reads/writes/sends/recvs, we don't want overhead for them and can leave
them be as is until the field is actually used.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f44bc2acd6777d932de3d71a5692235b5b2b7397.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We've previously had an issue where overflow flush unconditionally calls
io_cqring_ev_posted() even if it didn't flush any events to the ring,
causing wake and eventfd increment where no new events are available.
Some applications don't like that, see commit b18032bb0a88 for details.
This came up in discussion for another patch recently, hence add a
comment detailing what the relationship between calling the events
posted helper and CQ ring entries is.
Link: https://lore.kernel.org/io-uring/77a44fce-c831-16a6-8e80-9aee77f496a2@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fixed tables may be large enough, place all of them together with
allocated tags under memcg limits.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b3ac9f5da9821bb59837b5fe25e8ef4be982218c.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Limit the number of files in io_uring fixed tables by RLIMIT_NOFILE,
that's the first and the simpliest restriction that we should impose.
Cc: stable@vger.kernel.org
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b2756c340aed7d6c0b302c26dab50c6c5907f4ce.1629451684.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
coml_nr in ctx_flush_and_put() is not protected by uring_lock, this
may cause problems when accessing in parallel:
say coml_nr > 0
ctx_flush_and put other context
if (compl_nr) get mutex
coml_nr > 0
do flush
coml_nr = 0
release mutex
get mutex
do flush (*)
release mutex
in (*) place, we call io_cqring_ev_posted() and users likely get
no events there. To avoid spurious events, re-check the value when
under the lock.
Fixes: 2c32395d8111 ("io_uring: fix __tctx_task_work() ctx race")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210820221954.61815-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now allocated rsrc table uses PAGE_SIZE as the size of 2nd-level, and
accessing this table relies on each level index from fixed TABLE_SHIFT
(12 - 3) in 4k page case. In order to correctly work in non-4k page,
define TABLE_SHIFT as non-fixed (PAGE_SHIFT - shift of data) for
2nd-level table entry number.
Signed-off-by: wangyangbo <wangyangbo@uniontech.com>
Link: https://lore.kernel.org/r/20210819055657.27327-1-wangyangbo@uniontech.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now with IRQ completions done via IRQ, almost all requests freeing
are done from the context of submitter task, so it makes sense to
extend task_put optimisation from io_req_free_batch_finish() to cover
all the cases including task_work by moving it into io_put_task().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/824a7cbd745ddeee4a0f3ff85c558a24fd005872.1629302453.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have two checks of task->flags & PF_EXITING left:
1) In io_req_task_submit(), which is called in task_work and hence always
in the context of the original task. That means that
req->task == current, and hence checking ->flags is totally fine.
2) In io_poll_rewait(), where we need to stop re-arming poll to prevent
it interfering with cancelation. This is only run from task_work as
well, and hence for this case too req->task == current.
Add a comment to both spots detailing that.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We don't need to protect nr_running and worker_refs by wqe->lock, so
narrow the range of raw_spin_lock_irq - raw_spin_unlock_irq
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210810125554.99229-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_timeout_cancel() posts CQEs so needs ->completion_lock to be held,
so grab it in io_timeout_remove().
Fixes: 48ecb6369f1f2 ("io_uring: run timeouts from task_work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6f03d653a4d7bf693ef6f39b6a426b6d97fd96f.1629280204.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move earlier the check for whether __io_queue_proc() tries to poll
already polled waitqueue, and do the same for the second poll entry, if
any. Shouldn't really matter, but at least it would have a more
predictable behaviour.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8cb428cfe8ade0fd055859fabb878db8777d4c2f.1629228203.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have io_req_complete_post() to post a CQE and put the request. It
takes care of all synchronisation and is more concise and efficent, so
replace all hancoded occurrences of
"lock; post CQE; unlock; + put_req()" with io_req_complete_post().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2c83463458a613f9d870e5147eb134da2aa70779.1629228203.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make io_put_rw_kbuf() to do the REQ_F_BUFFER_SELECTED check, so all the
callers don't need to hand code it. The number of places where we call
io_put_rw_kbuf() is growing, so saves some pain.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3df3919e5e7efe03420c44ab4d9317a81a9cf398.1629228203.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Linked timeout handling during issuing is heavy, it adds extra
instructions and forces to save the next linked timeout before
io_issue_sqe().
Follwing the same reasoning as in refcounting patches, a request can't
be freed by the time it returns from io_issue_sqe(), so now we don't
need to do io_prep_linked_timeout() in advance, and it can be delayed to
colder paths optimising the generic path.
Also, it should also save quite a lot for requests with linked timeouts
and completed inline on timeout spinlocking + hrtimer_start() +
hrtimer_try_to_cancel() and so on.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/19bfc9a0d26c5c5f1e359f7650afe807ca8ef879.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Adjust io_disarm_next(), so it can detect if there is a linked but
not-yet-armed timeout and complete/cancel it separately. Will be used in
the following patch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ae228cde2c0df3d92d29d5e4852ed9fa8a2a97db.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The link test in io_prep_linked_timeout() is pretty bulky, replace it
with a flag. It's better for normal path and linked requests, and also
will be used further for request failing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3703770bfae8bc1ff370e43ef5767940202cab42.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of handling double consecutive linked timeouts through tricky
flag combinations, just check the submit_state.link during timeout_prep
and fail that case in advance.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/04150760b0dc739522264b8abd309409f7421a06.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
IORING_OP_ASYNC_CANCEL and IORING_OP_LINK_TIMEOUT have enough of
overlap, so extract a helper for request cancellation and use in both.
Also, removes some amount of ugliness because of success_ret.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/900122b588e65b637e71bfec80a260726c6a54d6.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
773af69121ecc ("io_uring: always reissue from task_work context") makes
all resubmission to be made from task_work, so we don't need that hack
with resubmit/not-resubmit switch anymore.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/47fa177cca04e5ffd308a35227966c8e15d8525b.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Linked timeouts are never refcounted when it comes to the first call to
__io_prep_linked_timeout(), so save an io_ref_get() and set the desired
value directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/177b24cc62ffbb42d915d6eb9e8876266e4c0d5a.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Tracking linked timeouts as infligh was needed to make sure that io-wq
is not destroyed by io_uring_cancel_generic() racing with
io_async_cancel_one() accessing it. Now, cancellations issued by linked
timeouts are done in the task context, so it's already synchronised.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1b05cf47cb69df2305efdbee8cf7ba36f46c1a3.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If a requests is forwarded into io-wq, there is a good chance it hasn't
been refcounted yet and we can save one req_ref_get() by setting the
refcount number to the right value directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2d53f4449faaf73b4a4c5de667fc3c176d974860.1628981736.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_req_free_batch() has a __must_hold annotation referencing a
request being passed in, but we're passing in the context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We can merge two spin_unlock() operations to one since we removed some
code not long ago.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When doing cancellation, we use a parameter to indicate where it's from
do_exit or exec. So a boolean value is good enough for this, remove the
struct files* as it is not necessary.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
[axboe: fixup io_uring_files_cancel for !CONFIG_IO_URING]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Extract io_uring_files_cancel() call in io_uring_task_cancel() to make
io_uring_files_cancel() and io_uring_task_cancel() coherent and easy to
read.
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
io_prep_linked_timeout() grew too heavy and compiler now refuse to
inline the function. Help it by splitting in two and annotating with
inline.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/560636717a32e9513724f09b9ecaace942dde4d4.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As submission references are gone, there is only one initial reference
left. Instead of actually doing atomic refcounting, add a flag
indicating whether we're going to take more refs or doing any other sync
magic. The flag should be set before the request may get used in
parallel.
Together with the previous patch it saves 2 refcount atomics per request
for IOPOLL and IRQ completions, and 1 atomic per req for inline
completions, with some exceptions. In particular, currently, there are
three cases, when the refcounting have to be enabled:
- Polling, including apoll. Because double poll entries takes a ref.
Might get relaxed in the near future.
- Link timeouts, enabled for both, the timeout and the request it's
bound to, because they work in-parallel and we need to synchronise
to cancel one of them on completion.
- When a request gets in io-wq, because it doesn't hold uring_lock and
we need guarantees of submission references.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8b204b6c5f6643062270a1913d6d3a7f8f795fd9.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Requests are by default given with two references, submission and
completion. Completion references are straightforward, they represent
request ownership and are put when a request is completed or so.
Submission references are a bit more trickier. They're needed when
io_issue_sqe() followed deep into the submission stack (e.g. in fs,
block, drivers, etc.), request may have given away for concurrent
execution or already completed, and the code unwinding back to
io_issue_sqe() may be accessing some pieces of our requests, e.g.
file or iov.
Now, we prevent such async/in-depth completions by pushing requests
through task_work. Punting to io-wq is also done through task_works,
apart from a couple of cases with a pretty well known context. So,
there're two cases:
1) io_issue_sqe() from the task context and protected by ->uring_lock.
Either requests return back to io_uring or handed to task_work, which
won't be executed because we're currently controlling that task. So,
we can be sure that requests are staying alive all the time and we don't
need submission references to pin them.
2) io_issue_sqe() from io-wq, which doesn't hold the mutex. The role of
submission reference is played by io-wq reference, which is put by
io_wq_submit_work(). Hence, it should be fine.
Considering that, we can carefully kill the submission reference.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b68f1c763229a590f2a27148aee77767a8d7750.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Soon, we won't need to put several references at once, remove
req_ref_sub_and_test() and @nr argument from io_put_req_deferred(),
and put the rest of the references by hand.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1868c7554108bff9194fb5757e77be23fadf7fc0.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move all request refcount helpers to avoid forward declarations in the
future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/89fd36f6f3fe5b733dfe4546c24725eee40df605.1628705069.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We have no hard/soft IRQ users of this lock left, remove any IRQ
disabling/saving and restoring when grabbing this lock.
This is straight forward with no users entering with IRQs disabled
anymore, the only thing to look out for is the waitqueue poll head
lock which nests inside the completion lock. That needs IRQs disabled,
and hence we have to do that now instead of relying on the outer lock
doing so.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is in preparation to making the completion lock work outside of
hard/soft IRQ context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is in preparation to making the completion lock work outside of
hard/soft IRQ context.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|