path: root/fs/io_uring.c
2019-12-20  io_uring: pass in 'sqe' to the prep handlers  [Jens Axboe, 1 file, -242/+251]
This moves the prep handlers outside of the opcode handlers, and allows us to pass in the sqe directly. If the sqe is non-NULL, it means that the request should be prepared for the first time. With the opcode handlers not having access to the sqe at all, we are guaranteed that the prep handler has set up the request fully by the time we get there. As before, for opcodes that need to copy in more data than the io_kiocb allows for, the io_async_ctx holds that info. If a prep handler is invoked with req->io set, it must use that to retain information for later. Finally, we can remove io_kiocb->sqe as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
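To make the split concrete, here is a minimal sketch of the prep/issue shape described above, using hypothetical names (example_req, example_prep, example_issue are illustrative, not the actual fs/io_uring.c symbols): only the prep step ever sees the sqe, and the issue step works purely from the request.

    /* Sketch only: a hypothetical request type and handlers, loosely modelled
     * on the prep/issue split described above. */
    #include <linux/types.h>
    #include <uapi/linux/io_uring.h>

    struct example_req {
            int  fd;
            u64  user_addr;
            u32  len;
            bool prepped;
    };

    static int example_prep(struct example_req *req, const struct io_uring_sqe *sqe)
    {
            if (!sqe)       /* already prepared earlier, e.g. as part of a link */
                    return 0;
            /* Read everything we need from the sqe exactly once. */
            req->fd = READ_ONCE(sqe->fd);
            req->user_addr = READ_ONCE(sqe->addr);
            req->len = READ_ONCE(sqe->len);
            req->prepped = true;
            return 0;
    }

    static int example_issue(struct example_req *req)
    {
            /* No sqe parameter by design: prep must have filled req in.
             * ... perform the actual I/O using only req->fd/addr/len ... */
            return 0;
    }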
2019-12-20  io_uring: standardize the prep methods  [Jens Axboe, 1 file, -65/+63]
We currently have a mix of use cases. Most of the newer ones are pretty uniform, but we have some older ones that use different calling conventions. This is confusing. For the opcodes that currently rely on the req->io->sqe copy saving them from reuse, add a request type struct in the io_kiocb command union to store the data they need. Prepare for all opcodes having a standard prep method, so we can call it in a uniform fashion and outside of the opcode handler. This is in preparation for passing in the 'sqe' pointer, rather than storing it in the io_kiocb. Once we have uniform prep handlers, we can leave all the prep work to that part, and not even pass in the sqe to the opcode handler. This ensures that we don't reuse sqe data inadvertently. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20  io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler  [Jens Axboe, 1 file, -3/+8]
Add the count field to struct io_timeout, and ensure the prep handler has read it. Timeouts also always need an async context; set it up in the prep handler if we don't have one. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20  io_uring: move all prep state for IORING_OP_{SEND,RECV}MSG to prep handler  [Jens Axboe, 1 file, -31/+33]
Add struct io_sr_msg in our io_kiocb per-command union, and ensure that the send/recvmsg prep handlers have grabbed what they need from the SQE by the time prep is done. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20  io_uring: move all prep state for IORING_OP_CONNECT to prep handler  [Jens Axboe, 1 file, -18/+22]
Add struct io_connect in our io_kiocb per-command union, and ensure that io_connect_prep() has grabbed what it needs from the SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20  io_uring: add and use struct io_rw for read/writes  [Jens Axboe, 1 file, -46/+50]
Put the kiocb in struct io_rw, and add the addr/len for the request as well. Use the kiocb->private field for the buffer index for fixed reads and writes. Any use of kiocb->ki_filp is flipped to req->file. It's the same thing, and less confusing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20  io_uring: use u64_to_user_ptr() consistently  [Jens Axboe, 1 file, -9/+7]
We use it in some spots, but not consistently. Convert the rest over, makes it easier to read as well. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
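For context, u64_to_user_ptr() is the kernel helper that turns a raw 64-bit value (such as sqe->addr) into a __user pointer. A tiny sketch of the conversion (example_sqe_buf is an illustrative name):

    #include <linux/kernel.h>        /* u64_to_user_ptr() */
    #include <uapi/linux/io_uring.h> /* struct io_uring_sqe */

    /* Sketch: one consistent conversion instead of ad hoc casts. */
    static void __user *example_sqe_buf(const struct io_uring_sqe *sqe)
    {
            return u64_to_user_ptr(READ_ONCE(sqe->addr));
    }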
2019-12-18  io_uring: io_wq_submit_work() should not touch req->rw  [Jens Axboe, 1 file, -3/+8]
I've been chasing a weird and obscure crash that turned out to be userspace stack corruption, and finally narrowed it down to a bit flip that made a stack address invalid. io_wq_submit_work() unconditionally flips the req->rw.ki_flags IOCB_NOWAIT bit, but since it's a generic work handler, this isn't valid. Normal read/write operations own that part of the request; on other types it could be something else. Move the IOCB_NOWAIT clear to the read/write handlers where it belongs. Signed-off-by: Jens Axboe <axboe@kernel.dk>
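A sketch of where the flag manipulation belongs after this change (hypothetical function name; the point is that only the read/write handler, which owns the kiocb state, touches IOCB_NOWAIT):

    #include <linux/fs.h>

    /* Sketch: the generic work handler leaves ki_flags alone; the read/write
     * handler sets or clears IOCB_NOWAIT depending on whether it may block. */
    static void example_setup_rw_flags(struct kiocb *kiocb, bool force_nonblock)
    {
            if (force_nonblock)
                    kiocb->ki_flags |= IOCB_NOWAIT;
            else
                    kiocb->ki_flags &= ~IOCB_NOWAIT;
    }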
2019-12-18  io_uring: don't wait when under-submitting  [Pavel Begunkov, 1 file, -0/+4]
There is no reliable way to submit and wait in a single syscall, as io_submit_sqes() may under-consume sqes (in case of an early error). Then it will wait for not-yet-submitted requests, deadlocking the user in most cases. Don't wait/poll if we can't submit all sqes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: warn about unhandled opcode  [Jens Axboe, 1 file, -2/+6]
Now that we have all the opcodes handled in terms of command prep and SQE reuse, add a printk_once() to warn about any potentially new and unhandled ones. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: read opcode and user_data from SQE exactly once  [Jens Axboe, 1 file, -25/+20]
If we defer a request, we can't be reading the opcode again. Ensure that the user_data and opcode fields are stable. For the user_data we already have a place for it; for the opcode we can fill a one byte hole and store that as well. For both of them, assign them when we originally read the SQE in io_get_sqring(). Any code that uses sqe->opcode or sqe->user_data is switched to req->opcode and req->user_data. Signed-off-by: Jens Axboe <axboe@kernel.dk>
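A sketch of the idea (struct and function names illustrative): capture the two fields once when the SQE is fetched, and only ever use the request-side copies afterwards.

    #include <linux/types.h>
    #include <uapi/linux/io_uring.h>

    /* Sketch: request-local copies of the two fields that must stay stable. */
    struct example_kiocb {
            u8  opcode;
            u64 user_data;
    };

    static void example_init_req(struct example_kiocb *req,
                                 const struct io_uring_sqe *sqe)
    {
            /* Read once at SQE-fetch time; never look at the SQE again for these. */
            req->opcode = READ_ONCE(sqe->opcode);
            req->user_data = READ_ONCE(sqe->user_data);
    }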
2019-12-17  io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable  [Jens Axboe, 1 file, -10/+34]
If we defer this command as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the timeout remove op into the prep handling to make it safe for SQE reuse. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: make IORING_OP_CANCEL_ASYNC deferrable  [Jens Axboe, 1 file, -4/+28]
If we defer this command as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the async cancel op into the prep handling to make it safe for SQE reuse. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable  [Jens Axboe, 1 file, -14/+54]
If we defer these commands as part of a link, we have to make sure that the SQE data has been read upfront. Integrate the poll add/remove into the prep handling to make it safe for SQE reuse. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: make HARDLINK imply LINK  [Pavel Begunkov, 1 file, -1/+1]
The rule is as follows: if IOSQE_IO_HARDLINK is specified, then it's a link and there is no need to set IOSQE_IO_LINK separately, though it may still be set. Add a proper check and ensure that IOSQE_IO_HARDLINK implies IOSQE_IO_LINK. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
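One minimal way to express the rule at submission time (sketch; example_is_link is an illustrative name and the in-tree check may differ in detail):

    #include <uapi/linux/io_uring.h>

    /* Sketch: a hard link counts as a link even if IOSQE_IO_LINK is not set. */
    static bool example_is_link(const struct io_uring_sqe *sqe)
    {
            return READ_ONCE(sqe->flags) & (IOSQE_IO_LINK | IOSQE_IO_HARDLINK);
    }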
2019-12-17  io_uring: any deferred command must have stable sqe data  [Jens Axboe, 1 file, -49/+172]
We're currently not retaining sqe data for accept, fsync, and sync_file_range. None of these commands need data outside of what is directly provided, hence it can't go stale when the request is deferred. However, it can get reused, if an application reuses SQE entries. Ensure that we retain the information we need and only read the sqe contents once, off the submission path. Most of this is just moving code into a prep and finish function. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: remove 'sqe' parameter to the OP helpers that take it  [Jens Axboe, 1 file, -36/+44]
We pass in req->sqe for all of them; there is no need to pass it in separately, as the request is always passed in. This is a necessary prep patch to be able to cleanup/fix the request prep path. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-17  io_uring: fix pre-prepped issue with force_nonblock == true  [Jens Axboe, 1 file, -77/+98]
Some of these code paths assume that any force_nonblock == true issue is not prepped, but that's not true if we did prep as part of link setup earlier. Check if we already have an async context allocated before setting up a new one. Clean up the async context setup in general; we have a lot of duplicated code there. Fixes: 03b1230ca12a ("io_uring: ensure async punted sendmsg/recvmsg requests copy data") Fixes: f67676d160c6 ("io_uring: ensure async punted read/write requests copy iovec") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-15  io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG  [Jens Axboe, 1 file, -14/+26]
If we have to punt the recvmsg to async context, we copy all the context. But since the iovec used can be either on-stack (if small) or dynamically allocated, if it's on-stack, then we need to ensure we reset the iov pointer. If we don't, then we're reusing old stack data, and that can lead to -EFAULTs if things get overwritten. Ensure we retain the right pointers for the iov, and free it as well if we end up having to go beyond UIO_FASTIOV number of vectors. Fixes: 03b1230ca12a ("io_uring: ensure async punted sendmsg/recvmsg requests copy data") Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-15  io_uring: fix stale comment and a few typos  [Brian Gianforcaro, 1 file, -4/+4]
- Fix a few typos found while reading the code. - Fix stale io_get_sqring comment referencing s->sqe; the 's' parameter was renamed to 'req', but the comment was never updated. Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-11  io_uring: ensure we return -EINVAL on unknown opcode  [Jens Axboe, 1 file, -7/+14]
If we submit an unknown opcode and have fd == -1, io_op_needs_file() will return true as we default to needing a file. Then when we go and assign the file, we find the 'fd' invalid and return -EBADF. We really should be returning -EINVAL for that case, as we normally do for unsupported opcodes. Change io_op_needs_file() to have the following return values:
  0 - does not need a file
  1 - does need a file
 <0 - error value
and use this to pass back the right value for this invalid case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
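A sketch of that tri-state convention (function name and the exact opcode split are illustrative, not the in-tree table):

    #include <linux/errno.h>
    #include <uapi/linux/io_uring.h>

    /* Sketch: 0 = no file needed, 1 = file needed, <0 = unknown opcode. */
    static int example_op_needs_file(u8 opcode)
    {
            switch (opcode) {
            case IORING_OP_NOP:
            case IORING_OP_TIMEOUT:
                    return 0;
            case IORING_OP_READV:
            case IORING_OP_WRITEV:
                    return 1;
            default:
                    return -EINVAL; /* unknown opcode: -EINVAL, not -EBADF */
            }
    }

The caller then propagates any negative value directly instead of trying (and failing) to look up fd == -1.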
2019-12-10  io_uring: add sockets to list of files that support non-blocking issue  [Jens Axboe, 1 file, -2/+4]
In chasing a performance issue between using IORING_OP_RECVMSG and IORING_OP_READV on sockets, tracing showed that we always punt the socket reads to async offload. This is due to io_file_supports_async() not checking for S_ISSOCK on the inode. Since sockets support the O_NONBLOCK (or MSG_DONTWAIT) flag just fine, add sockets to the list of file types that we can do a non-blocking issue to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
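The shape of such a check is small; a sketch (example_file_supports_async is an illustrative name, and the real io_uring helper has more cases than this):

    #include <linux/fs.h>
    #include <linux/stat.h>

    /* Sketch: sockets honour O_NONBLOCK/MSG_DONTWAIT, so let them take a
     * non-blocking issue attempt like block and character devices do. */
    static bool example_file_supports_async(struct file *file)
    {
            umode_t mode = file_inode(file)->i_mode;

            return S_ISBLK(mode) || S_ISCHR(mode) || S_ISSOCK(mode);
    }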
2019-12-10  io_uring: only hash regular files for async work execution  [Jens Axboe, 1 file, -1/+3]
We hash regular files to avoid having multiple threads hammer on the inode mutex, but it should not be needed on other types of files (like sockets). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10  io_uring: run next sqe inline if possible  [Jens Axboe, 1 file, -4/+11]
One major use case of linked commands is the ability to run the next link inline, if at all possible. This is done correctly for async offload, but somewhere along the line we lost the ability to do so when we were able to complete a request without having to punt it. Ensure that we do so correctly. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10  io_uring: don't dynamically allocate poll data  [Jens Axboe, 1 file, -16/+11]
This essentially reverts commit e944475e6984. For high poll ops workloads, like TAO, the dynamic allocation of the wait_queue entry for IORING_OP_POLL_ADD adds considerable extra overhead. Go back to embedding the wait_queue_entry, but keep the usage of wait->private for the pointer stashing. Signed-off-by: Jens Axboe <axboe@kernel.dk>
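A sketch of the embedded layout and the pointer stashing it keeps (struct and function names illustrative, not the actual io_uring definitions):

    #include <linux/wait.h>
    #include <linux/poll.h>

    /* Sketch: the wait queue entry is embedded, so arming a poll request
     * needs no allocation; wait.private still carries the request pointer. */
    struct example_poll_iocb {
            struct file             *file;
            struct wait_queue_head  *head;
            __poll_t                events;
            struct wait_queue_entry wait;   /* embedded, not kmalloc'ed */
    };

    static int example_poll_wake(struct wait_queue_entry *wait, unsigned int mode,
                                 int sync, void *key)
    {
            void *req = wait->private;      /* stashed back-pointer */

            /* ... complete or queue up the request ... */
            (void)req;
            return 1;
    }

    static void example_poll_arm(struct example_poll_iocb *poll, void *req)
    {
            init_waitqueue_func_entry(&poll->wait, example_poll_wake);
            poll->wait.private = req;
            add_wait_queue(poll->head, &poll->wait);
    }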
2019-12-10  io_uring: deferred send/recvmsg should assign iov  [Jens Axboe, 1 file, -2/+2]
Don't just assign it from the main call path; that can miss the case when we're called from issue deferral. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10  io_uring: sqthread should grab ctx->uring_lock for submissions  [Jens Axboe, 1 file, -5/+2]
We use the mutex to guard against registered file updates, for instance. Ensure we're safe in accessing that state against concurrent updates. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-10  io_uring: allow unbreakable links  [Jens Axboe, 1 file, -38/+46]
Some commands will invariably end in a failure in the sense that the completion result will be less than zero. One such example is timeouts that don't have a completion count set; they will always complete with -ETIME unless cancelled. For linked commands, we sever links and fail the rest of the chain if the result is less than zero. Since we have commands where we know that will happen, add IOSQE_IO_HARDLINK as a stronger link that doesn't sever regardless of the completion result. Note that the link will still sever if we fail submitting the parent request; hard links are only resilient in the presence of completion results for requests that did submit correctly. Cc: stable@vger.kernel.org # v5.4 Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
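From userspace, the flag simply goes on the SQE whose expected "failure" should not break the chain; a sketch using the raw uapi definitions (the surrounding SQE preparation, e.g. via liburing, is assumed to happen elsewhere):

    #include <linux/io_uring.h>

    /* Sketch: the first link is a pure timeout that will complete with -ETIME;
     * marking it a hard link keeps the follow-up request from being failed
     * with -ECANCELED when that happens. */
    static void example_mark_hard_link(struct io_uring_sqe *timeout_sqe)
    {
            timeout_sqe->flags |= IOSQE_IO_LINK | IOSQE_IO_HARDLINK;
    }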
2019-12-05  io_uring: fix a typo in a comment  [Liming Wu, 1 file, -1/+1]
thatn -> than. Signed-off-by: Liming Wu <19092205@suning.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-05  io_uring: hook all linked requests via link_list  [Pavel Begunkov, 1 file, -22/+20]
Links are created by chaining requests through req->list, with the exception that the head uses req->link_list (e.g. link_list->list->list). Because of that, io_req_link_next() needs complex splicing to advance. Link them all through link_list instead. It also seems simpler and more consistent IMHO. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-05  io_uring: fix error handling in io_queue_link_head  [Pavel Begunkov, 1 file, -5/+7]
In case of an error, io_submit_sqe() drops a request and continues without it, even if the request was part of a link. Not only does it not cancel links, it may also execute the wrong sequence of actions. Stop consuming sqes and let the user handle errors. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04  io_uring: use hash table for poll command lookups  [Jens Axboe, 1 file, -43/+41]
We recently changed this from a single list to an rbtree, but for some real life workloads, the rbtree slows down the submission/insertion case enough so that it's the top cycle consumer on the io_uring side. In testing, using a hash table is a more well rounded compromise. It is fast for insertion, and as long as it's sized appropriately, it works well for the cancellation case as well. Running TAO with a lot of network sockets, this removes io_poll_req_insert() from spending 2% of the CPU cycles. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
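A sketch of that kind of structure, keyed on the poll request's user_data (names and sizing are illustrative, not the exact io_uring implementation):

    #include <linux/hashtable.h>
    #include <linux/types.h>

    #define EXAMPLE_POLL_HASH_BITS 8

    struct example_poll_table {
            DECLARE_HASHTABLE(table, EXAMPLE_POLL_HASH_BITS);
    };

    struct example_poll_req {
            u64               user_data;
            struct hlist_node hash_node;
    };

    static void example_poll_insert(struct example_poll_table *t,
                                    struct example_poll_req *req)
    {
            /* O(1) insertion: this was the hot path the rbtree slowed down. */
            hash_add(t->table, &req->hash_node, req->user_data);
    }

    static struct example_poll_req *example_poll_find(struct example_poll_table *t,
                                                      u64 user_data)
    {
            struct example_poll_req *req;

            /* Cancellation-side lookup: walk only one bucket. */
            hash_for_each_possible(t->table, req, hash_node, user_data)
                    if (req->user_data == user_data)
                            return req;
            return NULL;
    }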
2019-12-04  io_uring: ensure deferred timeouts copy necessary data  [Jens Axboe, 1 file, -41/+42]
If we defer a timeout, we should ensure that we copy the timespec when we have consumed the sqe. This is similar to commit f67676d160c6 for read/write requests. We already did this correctly for timeouts deferred as links, but do it generally and use the infrastructure added by commit 1a6b74fc8702 instead of having the timeout deferral use its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04  io_uring: allow IOSQE_* flags on IORING_OP_TIMEOUT  [Jens Axboe, 1 file, -3/+0]
There's really no reason why we forbid things like link/drain etc on regular timeout commands. Enable the usual SQE flags on timeouts. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03  io_uring: handle connect -EINPROGRESS like -EAGAIN  [Jens Axboe, 1 file, -1/+1]
Right now we return it to userspace, which means the application has to poll for the socket to be writeable. Let's just treat it like -EAGAIN and have io_uring handle it internally; this makes it much easier to use. Signed-off-by: Jens Axboe <axboe@kernel.dk>
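The translation itself is tiny; a sketch of the idea (function name illustrative):

    /* Sketch: a connect(2) that is still in progress is treated like any other
     * would-block condition, so io_uring retries it internally instead of
     * handing -EINPROGRESS back to the application. */
    static int example_connect_result(int ret)
    {
            if (ret == -EINPROGRESS)
                    ret = -EAGAIN;
            return ret;
    }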
2019-12-03  io_uring: remove parameter ctx of io_submit_state_start  [Jackie Liu, 1 file, -2/+2]
The ctx parameter has never been used; clean it up. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03  io_uring: mark us with IORING_FEAT_SUBMIT_STABLE  [Jens Axboe, 1 file, -1/+2]
If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE. Signed-off-by: Jens Axboe <axboe@kernel.dk>
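Applications can probe for this at ring setup time by checking the feature mask that io_uring_setup(2) fills into struct io_uring_params; a minimal userspace sketch:

    #include <linux/io_uring.h>

    /* Sketch: p is the io_uring_params filled in by io_uring_setup(2). */
    static int sqe_data_is_stable(const struct io_uring_params *p)
    {
            return !!(p->features & IORING_FEAT_SUBMIT_STABLE);
    }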
2019-12-03  io_uring: ensure async punted connect requests copy data  [Jens Axboe, 1 file, -4/+47]
Just like commit f67676d160c6 for read/write requests, this one ensures that the sockaddr data has been copied for IORING_OP_CONNECT if we need to punt the request to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03  io_uring: ensure async punted sendmsg/recvmsg requests copy data  [Jens Axboe, 1 file, -17/+128]
Just like commit f67676d160c6 for read/write requests, this one ensures that the msghdr data is fully copied if we need to punt a recvmsg or sendmsg system call to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02  io_uring: ensure async punted read/write requests copy iovec  [Jens Axboe, 1 file, -62/+181]
Currently we don't copy the iovecs when we punt to async context. This can be problematic for applications that store the iovec on the stack, as they often assume that it's safe to let the iovec go out of scope as soon as IO submission has been called. This isn't always safe, as we will re-copy the iovec once we're in async context. Make this 100% safe by copying the iovec just once. With this change, applications may safely store the iovec on the stack for all cases. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
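In application terms, this is the pattern the copy makes safe; a userspace sketch using common liburing calls (assumed available), where the iovec is a stack local that goes out of scope right after submission:

    #include <liburing.h>

    /* Sketch: safe after this change; before it, an async punt could end up
     * re-reading the iovec after this stack frame was gone. */
    static int queue_read(struct io_uring *ring, int fd, void *buf, size_t len)
    {
            struct iovec iov = { .iov_base = buf, .iov_len = len };
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

            if (!sqe)
                    return -EBUSY;
            io_uring_prep_readv(sqe, fd, &iov, 1, 0);
            return io_uring_submit(ring);   /* iov may now go out of scope */
    }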
2019-12-02  io_uring: add general async offload context  [Jens Axboe, 1 file, -24/+32]
Right now we just copy the sqe for async offload, but we want to store more context across an async punt. In preparation for doing so, put the sqe copy inside a structure that we can expand. With this pointer added, we can get rid of REQ_F_FREE_SQE, as that is now indicated by whether req->io is NULL or not. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02  io_uring: transform send/recvmsg() -ERESTARTSYS to -EINTR  [Jens Axboe, 1 file, -0/+2]
We should never return -ERESTARTSYS to userspace; transform it into -EINTR. Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02  io_uring: use current task creds instead of allocating a new one  [Jens Axboe, 1 file, -2/+2]
syzbot reports:

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault: 0000 [#1] PREEMPT SMP KASAN
  CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
  RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
  RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
  Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
  RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
  RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
  RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
  R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
  R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
   kthread+0x361/0x430 kernel/kthread.c:255
   ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
  Modules linked in:
  ---[ end trace f2e1a4307fbe2245 ]---
  RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
  RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
  RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
  Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
  RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
  RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
  RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
  RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
  R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
  R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

which is caused by slab fault injection triggering a failure in prepare_creds(). We don't actually need to create a copy of the creds as we're not modifying it, we just need a reference on the current task creds. This avoids the failure case as well, and propagates the const throughout the stack. Fixes: 181e448d8709 ("io_uring: async workers should inherit the user creds") Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
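The replacement is the stock cred API; a sketch of the idea (function names illustrative):

    #include <linux/cred.h>

    /* Sketch: take a counted reference on the current task's credentials
     * instead of allocating a modifiable copy that can fail. */
    static const struct cred *example_grab_creds(void)
    {
            return get_current_cred();      /* cannot fail, just grabs a reference */
    }

    static void example_drop_creds(const struct cred *creds)
    {
            put_cred(creds);                /* release on context teardown */
    }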
2019-12-01  Merge tag 'for-linus-20191129' of git://git.kernel.dk/linux-block  [Linus Torvalds, 1 file, -6/+52]
Pull block fixes from Jens Axboe:
 "I wasn't going to send this one off so soon, but unfortunately one of the fixes from the previous pull broke the build on some archs. So I'm sending this sooner rather than later.

  This contains:
   - Add highmem.h include for io_uring, because of the kmap() additions from last round. For some reason the build bot didn't spot this even though it sat for days.
   - Three minor ';' removals
   - Add support for the Beurer CD-on-a-chip device
   - Make io_uring work on MMU-less archs"

* tag 'for-linus-20191129' of git://git.kernel.dk/linux-block:
  io_uring: fix missing kmap() declaration on powerpc
  ataflop: Remove unneeded semicolon
  block: sunvdc: Remove unneeded semicolon
  drbd: Remove unneeded semicolon
  io_uring: add mapping support for NOMMU archs
  sr_vendor: support Beurer GL50 evo CD-on-a-chip devices.
  cdrom: respect device capabilities during opening action
2019-11-29  io_uring: fix missing kmap() declaration on powerpc  [Jens Axboe, 1 file, -0/+1]
Christophe reports that current master fails building on powerpc with this error:

    CC      fs/io_uring.o
  fs/io_uring.c: In function ‘loop_rw_iter’:
  fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’ [-Werror=implicit-function-declaration]
     iovec.iov_base = kmap(iter->bvec->bv_page)
                      ^
  fs/io_uring.c:1628:19: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     iovec.iov_base = kmap(iter->bvec->bv_page)
                    ^
  fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’ [-Werror=implicit-function-declaration]
      kunmap(iter->bvec->bv_page);
      ^

which is caused by a missing highmem.h include. Fix it by including it. Fixes: 311ae9e159d8 ("io_uring: fix dead-hung for non-iter fixed rw") Reported-by: Christophe Leroy <christophe.leroy@c-s.fr> Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-28  io_uring: add mapping support for NOMMU archs  [Roman Penyaev, 1 file, -6/+51]
That is a bit of a weird scenario, but I find it interesting to run fio loads using LKL Linux, where the MMU is disabled. Other real archs which run uClinux can probably also benefit from this patch. Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26  io_uring: make poll->wait dynamically allocated  [Jens Axboe, 1 file, -12/+17]
In the quest to bring io_kiocb down to 3 cachelines, this one does the trick. Make the wait_queue_entry for the poll command come out of kmalloc instead of embedding it in struct io_poll_iocb, as the latter is the largest member of io_kiocb. Once we trim this down a bit, we're back at a healthy 192 bytes for struct io_kiocb. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26  io_uring: cleanup io_import_fixed()  [Pavel Begunkov, 1 file, -7/+5]
Clean up the io_import_fixed() call site and make it return a proper type. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26  io_uring: inline struct sqe_submit  [Pavel Begunkov, 1 file, -91/+78]
There is no point left in keeping struct sqe_submit. Inline it into struct io_kiocb, so any req->submit.field is now just req->field. This also moves initialisation of ring_file into io_get_req() and removes the duplicated req->sequence. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26  io_uring: store timeout's sqe->off in proper place  [Pavel Begunkov, 1 file, -4/+5]
Timeouts' sequence offset (i.e. sqe->off) is stored in req->submit.sequence under a false name. Keep it in timeout.data instead. The unused space for sequence will be reclaimed in the following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>