linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2021-10-24	iov_iter: Introduce nofault flag to disable page faults	Andreas Gruenbacher	1	-5/+15
	Introduce a new nofault flag to indicate to iov_iter_get_pages not to fault in user pages. This is implemented by passing the FOLL_NOFAULT flag to get_user_pages, which causes get_user_pages to fail when it would otherwise fault in a page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting in pages when page faults are not allowed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-20	iov_iter: Introduce fault_in_iov_iter_writeable	Andreas Gruenbacher	1	-0/+39
	Introduce a new fault_in_iov_iter_writeable helper for safely faulting in an iterator for writing. Uses get_user_pages() to fault in the pages without actually writing to them, which would be destructive. We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that the iterator passed to .read_iter isn't in memory. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18	iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable	Andreas Gruenbacher	1	-12/+21
	Turn iov_iter_fault_in_readable into a function that returns the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make sure this change doesn't silently break things. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-10-18	gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable}	Andreas Gruenbacher	1	-6/+4
	Turn fault_in_pages_{readable,writeable} into versions that return the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename the functions to fault_in_{readable,writeable} to make sure this change doesn't silently break things. Neither of these functions is entirely trivial and it doesn't seem useful to inline them, so move them to mm/gup.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
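The return convention described in the two commits above can be modelled in user space. This is a minimal sketch and not kernel code: `model_fault_in()`, `need_whole_range()`, `usable_prefix()` and the `faultable` parameter are hypothetical names, with `faultable` standing in for however many leading bytes can actually be made resident. It illustrates how a single "bytes not faulted in" return value serves both kinds of caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the new convention: return the number
 * of bytes NOT faulted in (0 means full success), similar to copy_to_user().
 * `faultable` stands in for how many leading bytes can be made resident.
 */
static size_t model_fault_in(size_t size, size_t faultable)
{
	size_t done = size < faultable ? size : faultable;

	return size - done;
}

/* A caller that requires the whole range: any remainder is an error. */
static int need_whole_range(size_t size, size_t faultable)
{
	return model_fault_in(size, faultable) ? -EFAULT : 0;
}

/* A caller that accepts partial progress: use the resident prefix. */
static size_t usable_prefix(size_t size, size_t faultable)
{
	size_t left = model_fault_in(size, faultable);

	assert(left <= size);	/* the remainder never exceeds the request */
	return size - left;
}
```

Under the old convention (non-zero on any failure) the second kind of caller could not tell how much of the range was usable.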
2021-10-12	iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value	Andreas Gruenbacher	1	-2/+3
	Both iov_iter_get_pages and iov_iter_get_pages_alloc return the number of bytes of the iovec they could get the pages for. When they cannot get any pages, they're supposed to return 0, but when the start of the iovec isn't page aligned, the calculation goes wrong and they return a negative value. Fix both functions. In addition, change iov_iter_get_pages_alloc to return NULL in that case to prevent resource leaks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
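The arithmetic behind this fix can be sketched in isolation. The functions below are a simplified user-space model with hypothetical names, not the code in lib/iov_iter.c: `start` is the offset of the data within its first page, `len` is the requested length plus `start`, `npages` is the number of pages asked for, and `pinned` is how many were actually obtained.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096L

/* Model of the old calculation: subtracts `start` even when no page was pinned. */
static long bytes_covered_buggy(long pinned, long npages, long len, long start)
{
	return (pinned == npages ? len : pinned * MODEL_PAGE_SIZE) - start;
}

/* Model of the fixed calculation: bail out before the subtraction. */
static long bytes_covered_fixed(long pinned, long npages, long len, long start)
{
	assert(start >= 0 && start < MODEL_PAGE_SIZE);
	if (pinned <= 0)
		return pinned;	/* no pages: report 0 (or the error) unchanged */
	return (pinned == npages ? len : pinned * MODEL_PAGE_SIZE) - start;
}
```

With a page-aligned start the old form happens to return 0 when nothing is pinned, which is why the problem only shows up for unaligned iovecs.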
2021-09-14	iov_iter: add helper to save iov_iter state	Jens Axboe	1	-0/+36
	In an ideal world, when someone is passed an iov_iter and returns X bytes, then X bytes would have been consumed/advanced from the iov_iter. But we have use cases that always consume the entire iterator; a few examples of that are iomap and bdev O_DIRECT. This means we cannot rely on the state of the iov_iter once we've called ->read_iter() or ->write_iter(). This would be easier if we didn't always have to deal with truncate of the iov_iter, as rewinding would be trivial without that. We recently added a commit to track the truncate state, but that grew the iov_iter by 8 bytes and wasn't the best solution. Implement a helper to save enough of the iov_iter state to sanely restore it after we've called the read/write iterator helpers. This currently only works for IOVEC/BVEC/KVEC as that's all we need; support for other iterator types is left as an exercise for the reader. Link: https://lore.kernel.org/linux-fsdevel/CAHk-=wiacKV4Gh-MYjteU0LwNBSGpWrK-Ov25HdqB1ewinrFPg@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
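The save/restore idea can be illustrated with a small user-space model. Everything here is an assumption made for illustration: `struct model_iter`, `struct model_iter_state` and the `model_*()` functions are hypothetical and only mimic a segment-array iterator; they are not the kernel structures. The point is that three saved fields are enough to rewind, because the segment pointer can be recomputed from how many segments were consumed.

```c
#include <assert.h>
#include <stddef.h>

struct seg { size_t len; };

struct model_iter {
	const struct seg *seg;	/* current segment */
	size_t nr_segs;		/* segments left, including the current one */
	size_t seg_offset;	/* offset into the current segment */
	size_t count;		/* bytes left in the iterator */
};

/* The saved state deliberately omits the segment pointer. */
struct model_iter_state {
	size_t seg_offset;
	size_t count;
	size_t nr_segs;
};

static void model_save(const struct model_iter *i, struct model_iter_state *s)
{
	s->seg_offset = i->seg_offset;
	s->count = i->count;
	s->nr_segs = i->nr_segs;
}

static void model_restore(struct model_iter *i, const struct model_iter_state *s)
{
	assert(s->nr_segs >= i->nr_segs);
	/* Step the pointer back by the number of segments consumed since the save. */
	i->seg -= s->nr_segs - i->nr_segs;
	i->nr_segs = s->nr_segs;
	i->seg_offset = s->seg_offset;
	i->count = s->count;
}

/* Consume up to n bytes; asking for more than is left moves to the very end. */
static void model_advance(struct model_iter *i, size_t n)
{
	if (n > i->count)
		n = i->count;
	i->count -= n;
	n += i->seg_offset;
	while (i->nr_segs && n >= i->seg->len) {
		n -= i->seg->len;
		i->seg++;
		i->nr_segs--;
	}
	i->seg_offset = n;
}
```

A caller would save the state before handing the iterator to something that may consume all of it, then restore and advance by the number of bytes actually transferred.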
2021-09-08	lib/iov_iter.c: fix kernel-doc warnings	Randy Dunlap	1	-2/+6
	Fix all kernel-doc warnings in lib/iov_iter.c:
		lib/iov_iter.c:695: warning: Function parameter or member 'i' not described in '_copy_mc_to_iter'
		lib/iov_iter.c:695: warning: Excess function parameter 'iter' description in '_copy_mc_to_iter'
		lib/iov_iter.c:695: warning: No description found for return value of '_copy_mc_to_iter'
		lib/iov_iter.c:758: warning: Function parameter or member 'i' not described in '_copy_from_iter_flushcache'
		lib/iov_iter.c:758: warning: Excess function parameter 'iter' description in '_copy_from_iter_flushcache'
		lib/iov_iter.c:758: warning: No description found for return value of '_copy_from_iter_flushcache'
	Link: https://lkml.kernel.org/r/20210809051053.6531-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-04	iov_iter: remove uaccess_kernel() warning from iov_iter_init()	Linus Torvalds	1	-1/+0
	This warning was there to catch any architectures that still use CONFIG_SET_FS, and that would mis-use iov_iter_init() for anything that wasn't a proper user space pointer. So that
		WARN_ON_ONCE(uaccess_kernel());
	makes perfect conceptual sense: you really shouldn't use a kernel pointer with set_fs(KERNEL_DS) and then pass it to iov_iter_init().
	HOWEVER. Guenter Roeck reports that this warning actually triggers in no-mmu configurations of both ARM and m68k. And the reason isn't that they pass in a kernel pointer under set_fs(KERNEL_DS) at all: the reason is that in those configurations, "uaccess_kernel()" is simply not reliable.
	Those no-mmu setups set USER_DS and KERNEL_DS to the same values, so you can't test for the difference. In particular, the no-mmu case for ARM does
		#define USER_DS KERNEL_DS
		#define uaccess_kernel() (true)
	so USER_DS and KERNEL_DS have the same value, and uaccess_kernel() is always trivially true.
	The m68k case is slightly different and not quite as obvious. It does (spread out over multiple header files just to be extra exciting: asm/processor.h, asm/segment.h and asm-generic/uaccess.h):
		#define TASK_SIZE (0xFFFFFFFFUL)
		#define USER_DS MAKE_MM_SEG(TASK_SIZE)
		#define KERNEL_DS MAKE_MM_SEG(~0UL)
		#define get_fs() (current_thread_info()->addr_limit)
		#define uaccess_kernel() (get_fs().seg == KERNEL_DS.seg)
	but the end result is the same: uaccess_kernel() will always be true, because USER_DS and KERNEL_DS end up having the same value, even if that value is defined differently.
	This is very arguably a misfeature in those implementations, but in the end we don't really care. All modern architectures have gotten rid of set_fs() already, and generic kernel code never uses it. And while the sanity check was a nice idea, an architecture would have to go the extra mile to actually break this. So this well-intentioned warning isn't really all that likely to find anything but these known false positives, and as such just isn't worth maintaining.
Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 8cd54c1c8480 ("iov_iter: separate direction from flavour") Cc: Matthew Wilcox <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-10	csum_and_copy_to_pipe_iter(): leave handling of csum_state to caller	Al Viro	1	-23/+18
	... since all the logic is already there for use by iovec/kvec/etc. cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	clean up copy_mc_pipe_to_iter()	Al Viro	1	-24/+9
	... and we don't need kmap_atomic() there - kmap_local_page() is fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	pipe_zero(): we don't need no stinkin' kmap_atomic()...	Al Viro	1	-1/+3
	FWIW, memcpy_to_page() itself almost certainly ought to use kmap_local_page()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: clean csum_and_copy_...() primitives up a bit	Al Viro	1	-6/+4
	1) kmap_atomic() is not needed here, kmap_local_page() is enough.
	2) No need to make
		sum = csum_block_add(sum, next, off);
	conditional upon next != 0 - adding 0 is a no-op as far as csum_block_add() is concerned. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	copy_page_from_iter(): don't need kmap_atomic() for kvec/bvec cases	Al Viro	1	-2/+2
	kmap_local_page() is enough. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	copy_page_to_iter(): don't bother with kmap_atomic() for bvec/kvec cases	Al Viro	1	-3/+3
	kmap_local_page() is enough there. Moreover, we can use _copy_to_iter() for actual copying in those cases - no useful extra checks on the address we are copying from in that call. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iterate_xarray(): only of the first iteration we might get offset != 0	Al Viro	1	-3/+3
	recalculating offset on each iteration is pointless - on all subsequent passes through the loop it will be zero anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	pull handling of ->iov_offset into iterate_{iovec,bvec,xarray}	Al Viro	1	-12/+14
	fewer arguments (by one, but still...) for iterate_...() macros Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: make iterator callbacks use base and len instead of iovec	Al Viro	1	-91/+91
	Iterator macros used to provide the arguments for step callbacks in a structure matching the flavour - iovec for ITER_IOVEC, kvec for ITER_KVEC and bio_vec for ITER_BVEC. That already broke down for ITER_XARRAY (bio_vec there); now that we are using kvec callback for bvec and xarray cases, we are always passing a pointer + length (void __user * + size_t for ITER_IOVEC callback, void * + size_t for everything else). Note that the original reason for bio_vec (page + offset + len) in case of ITER_BVEC used to be that we did not want to kmap a page when all we wanted was e.g. to find the alignment of its subrange. Now all such users are gone and the ones that are left want the page mapped anyway for actually copying the data. So in all cases we have pointer + length, and there's no good reason for keeping those in struct iovec or struct kvec - we can just pass them to callback separately. Again, less boilerplate in callbacks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: make the amount already copied available to iterator callbacks	Al Viro	1	-70/+50
	Making iterator macros keep track of the amount of data copied is pretty easy and it has several benefits:
	1) we no longer need the mess like (from += v.iov_len) - v.iov_len in the callbacks - initial value + total amount copied so far would do just fine.
	2) less obviously, we no longer need to remember the initial amount of data we wanted to copy; the loops in iterator macros are along the lines of
		wanted = bytes;
		while (bytes) {
			copy some
			bytes -= copied
			if short copy
				break
		}
		bytes = wanted - bytes;
	Replacement is
		offs = 0;
		while (bytes) {
			copy some
			offs += copied
			bytes -= copied
			if short copy
				break
		}
		bytes = offs;
	That wouldn't be a win per se, but unlike the initial value of bytes, the amount copied so far is useful in callbacks.
	3) in some cases (csum_and_copy_..._iter()) we already had offs manually maintained by the callbacks. With that change we can drop that.
	Less boilerplate and more readable code... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
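The two loop shapes from this commit message can be written out as compilable C to check that they report the same total. This is a user-space sketch under stated assumptions: `copy_some()`, `walk_old()`, `walk_new()` and the `limit` parameter (the position at which copies start coming up short) are hypothetical and stand in for the iterator macros and their step callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step: copy up to `want` bytes at `pos`, nothing at or past `limit`. */
static size_t copy_some(size_t pos, size_t want, size_t limit)
{
	if (pos >= limit)
		return 0;
	return want < limit - pos ? want : limit - pos;
}

/* Old shape: remember the initial amount and derive the total at the end. */
static size_t walk_old(size_t bytes, size_t chunk, size_t limit)
{
	size_t wanted = bytes;

	assert(chunk > 0);
	while (bytes) {
		size_t want = bytes < chunk ? bytes : chunk;
		size_t copied = copy_some(wanted - bytes, want, limit);

		bytes -= copied;
		if (copied < want)	/* short copy */
			break;
	}
	return wanted - bytes;
}

/* New shape: keep a running offset, which the step can use directly. */
static size_t walk_new(size_t bytes, size_t chunk, size_t limit)
{
	size_t offs = 0;

	assert(chunk > 0);
	while (bytes) {
		size_t want = bytes < chunk ? bytes : chunk;
		size_t copied = copy_some(offs, want, limit);

		offs += copied;
		bytes -= copied;
		if (copied < want)	/* short copy */
			break;
	}
	return offs;
}
```

In the old shape the step has to reconstruct its position as `wanted - bytes`; in the new shape `offs` is already that position, which is the benefit the message describes for the callbacks.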
2021-06-10	iov_iter: get rid of separate bvec and xarray callbacks	Al Viro	1	-82/+30
	After the previous commit we have
	* xarray and bvec callbacks identical in all cases
	* both equivalent to kvec callback wrapped into kmap_local_page()/kunmap_local() pair.
	So we can pass only two (iovec and kvec) callbacks to iterate_and_advance() and let iterate_{bvec,xarray} wrap it into kmap_local_page()/kunmap_local(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: teach iterate_{bvec,xarray}() about possible short copies	Al Viro	1	-41/+24
	... and now we finally can sort out the mess in _copy_mc_to_iter(). Provide a variant of iterate_and_advance() that does NOT ignore the return values of bvec, xarray and kvec callbacks, use that in _copy_mc_to_iter(). That gets rid of magic in those callbacks - we used to need it so we'd get at least the right return value in case of failure halfway through. As a bonus, now iterator is advanced by the amount actually copied for all flavours. That's what the callers expect and it used to do that correctly in iovec and xarray cases. However, in kvec and bvec cases the iterator had not been advanced on such failures, breaking the users. Fixed now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iterate_bvec(): expand bvec.h macro forest, massage a bit	Al Viro	1	-13/+20
	... incidentally, using pointer instead of index in an array (the only change here) trims half-kilobyte of .text... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: unify iterate_iovec and iterate_kvec	Al Viro	1	-23/+5
	The differences between iterate_iovec and iterate_kvec are minor:
	* kvec callback is treated as if it returned 0
	* initialization of __p is with i->iov and i->kvec resp.
	which is trivially dealt with. No code generation changes - compiler is quite capable of turning
		left = ((void)(STEP), 0);
		__v.iov_len -= left;
	(with no accesses to left downstream) and
		(void)(STEP);
	into the same code. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: massage iterate_iovec and iterate_kvec to logics similar to iterate_bvec	Al Viro	1	-55/+36
	Premature optimization is the root of all evil... Trying to unroll the first pass through the loop makes it harder to follow and not just for readers - compiler ends up generating worse code than it would on a "non-optimized" loop. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iterate_and_advance(): get rid of magic in case when n is 0	Al Viro	1	-1/+1
	iov_iter_advance() needs to do some non-trivial work when it's given 0 as argument (skip all empty iovecs, mostly). We used to implement it via iterate_and_advance(); we no longer do so and for all other users of iterate_and_advance() zero length is a no-op. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	csum_and_copy_to_iter(): massage into form closer to csum_and_copy_from_iter()	Al Viro	1	-4/+4
	Namely, have off counted starting from 0 rather than from csstate->off. To compensate we need to shift the initial value (csstate->sum) (rotate by 8 bits, as usual for csum) and do the same after we are finished adding the pieces up. What we get out of that is a bit more redundancy in our variables - from is always equal to addr + off, which will be useful several commits down the road. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant	Al Viro	1	-26/+4
	Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do not need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	[xarray] iov_iter_npages(): just use DIV_ROUND_UP()	Al Viro	1	-14/+2
	Compiler is capable of recognizing division by power of 2 and turning it into shifts. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
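As a worked example of the calculation this commit refers to, the sketch below counts how many pages a byte range touches. It is a user-space illustration: `MODEL_PAGE_SIZE` and `pages_spanned()` are hypothetical names, and `DIV_ROUND_UP` is redefined locally with the conventional round-up-division form rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u
/* Conventional round-up division; the divisor here is a power of two. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of pages touched by `count` bytes starting `offset_in_page` bytes into a page. */
static size_t pages_spanned(size_t offset_in_page, size_t count)
{
	assert(offset_in_page < MODEL_PAGE_SIZE);
	return DIV_ROUND_UP(offset_in_page + count, MODEL_PAGE_SIZE);
}
```

Because the divisor is a compile-time power of two, an optimizing compiler can lower the division to a shift, which is the observation the commit message relies on.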
2021-06-10	iov_iter_npages(): don't bother with iterate_all_kinds()	Al Viro	1	-34/+54
	note that in bvec case pages can be compound ones - we can't just assume that each segment is covered by one (sub)page Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	get rid of iterate_all_kinds() in iov_iter_get_pages()/iov_iter_get_pages_alloc()	Al Viro	1	-56/+91
	Here iterate_all_kinds() is used just to find the first (non-empty, in case of iovec) segment. Which can be easily done explicitly. Note that in bvec case we now can get more than PAGE_SIZE worth of them, in case when we have a compound page in bvec and a range that crosses a subpage boundary. Older behaviour had been to stop on that boundary; we used to get the right first page (for_each_bvec() took care of that), but that was all we'd got. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter_gap_alignment(): get rid of iterate_all_kinds()	Al Viro	1	-13/+14
	For one thing, it's only used for iovec (and makes sense only for those). For another, here we don't care about iov_offset, since the beginning of the first segment and the end of the last one are ignored. So it makes a lot more sense to just walk through the iovec array... We need to deal with the case of truncated iov_iter, but unlike the situation with iov_iter_alignment() we don't care where the last segment ends - just which segment is the last one. [fixed a braino spotted by Qian Cai <quic_qiancai@quicinc.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter_alignment(): don't bother with iterate_all_kinds()	Al Viro	1	-10/+53
	It's easier to go over the array manually. We need to watch out for truncated iov_iter, though - iovec array might cover more than i->count. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	sanitize iov_iter_fault_in_readable()	Al Viro	1	-10/+16
	1) constify iov_iter argument; we are not advancing it in this primitive.
	2) cap the amount requested by the amount of data in iov_iter. All existing callers should've been safe, but the check is really cheap and doing it here makes for easier analysis, as well as more consistent semantics among the primitives.
	3) don't bother with iterate_iovec(). Explicit loop is not any harder to follow, and we get rid of standalone iterate_iovec() users - it's only used by iterate_and_advance() and (soon to be gone) iterate_all_kinds().
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: optimize iov_iter_advance() for iovec and kvec	Al Viro	1	-14/+28
	We can do better than generic iterate_and_advance() for this one; inspired by bvec_iter_advance() (and massaged into that form by equivalent transformations). [fixed a braino caught by kernel test robot <oliver.sang@intel.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: separate direction from flavour	Al Viro	1	-37/+48
	Instead of having them mixed in iter->type, use separate ->iter_type and ->data_source (u8 and bool resp.) And don't bother with (pseudo-) bitmap for the former - microoptimizations from being able to check if the flavour is one of two values are not worth the confusion for optimizer. It can't prove that we never get e.g. ITER_IOVEC | ITER_PIPE, so we end up with extra headache. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter_advance(): don't modify ->iov_offset for ITER_DISCARD	Al Viro	1	-2/+0
	the field is not used for that flavour Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: reorder handling of flavours in primitives	Al Viro	1	-46/+45
	iovec is the most common one; test it first and test explicitly, rather than "not anything else". Replace all flavour checks with use of iov_iter_is_...() helpers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-10	iov_iter: switch ..._full() variants of primitives to use of iov_iter_revert()	Al Viro	1	-104/+0
	Use corresponding plain variants, revert on short copy. That's the way it should've been done from the very beginning, except that we didn't have iov_iter_revert() back then... [fixed another braino caught by Qian Cai <quic_qiancai@quicinc.com>] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
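The "plain variant, then revert on short copy" pattern can be sketched with a toy iterator. This is a user-space model under stated assumptions: `struct model_iter`, its `readable` field (the point at which reads start failing) and the `model_*()` functions are hypothetical and only mirror the shape of the pattern, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_iter {
	const char *data;
	size_t pos;		/* bytes consumed so far */
	size_t count;		/* bytes left */
	size_t readable;	/* hypothetical: bytes readable before a "fault" */
};

/* Plain variant: copies what it can, advances by that much, returns the amount. */
static size_t model_copy_from_iter(void *dst, size_t bytes, struct model_iter *i)
{
	size_t avail = i->readable > i->pos ? i->readable - i->pos : 0;
	size_t n = bytes < i->count ? bytes : i->count;

	if (n > avail)
		n = avail;
	memcpy(dst, i->data + i->pos, n);
	i->pos += n;
	i->count -= n;
	return n;
}

/* Undo the last n bytes of advancement. */
static void model_revert(struct model_iter *i, size_t n)
{
	assert(n <= i->pos);
	i->pos -= n;
	i->count += n;
}

/* All-or-nothing variant built on the plain one: revert on a short copy. */
static bool model_copy_from_iter_full(void *dst, size_t bytes, struct model_iter *i)
{
	size_t copied = model_copy_from_iter(dst, bytes, i);

	if (copied == bytes)
		return true;
	model_revert(i, copied);
	return false;
}
```

The all-or-nothing variant needs no copy logic of its own, which is consistent with this commit removing 104 lines and adding none.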
2021-06-03	iov_iter_advance(): use consistent semantics for move past the end	Al Viro	1	-3/+2
	asking to advance by more than we have left in the iov_iter should move to the very end; it should not leave negative i->count and it should not spew into syslog, etc. - it's a legitimate operation. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03	[xarray] iov_iter_fault_in_readable() should do nothing in xarray case	Al Viro	1	-1/+1
	... and actually should just check it's given an iovec-backed iterator in the first place. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03	copy_page_to_iter(): fix ITER_DISCARD case	Al Viro	1	-2/+5
	we need to advance the iterator... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03	teach copy_page_to_iter() to handle compound pages	Al Viro	1	-3/+25
	When copy_page_to_iter() is given a compound page, the current code only works on systems without CONFIG_HIGHMEM. Those are the majority of real-world setups, or we would've drowned in bug reports by now. Still needs fixing. The current variant works for a solitary page; rename that to __copy_page_to_iter() and turn the handling of compound pages into a loop over subpages. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-06-03	iov_iter: Remove iov_iter_for_each_range()	David Howells	1	-27/+0
	Remove iov_iter_for_each_range() as it's no longer used with the removal of lustre. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-05-05	iov_iter: lift memzero_page() to highmem.h	Ira Weiny	1	-7/+1
	Patch series "btrfs: Convert kmap/memset/kunmap to memzero_user()". Lift memzero_page(), convert it to kmap_local_page() and then use it in btrfs. This patch (of 3): memzero_page() can replace the kmap/memset/kunmap pattern in other places in the code. While zero_user() has the same interface, it is not the same call and its use should be limited; some of those calls may be better converted from zero_user() to memzero_page().[1] But that is not addressed in this series. Lift memzero_page() to highmem.h. [1] https://lore.kernel.org/lkml/CAHk-=wijdojzo56FzYqE5TOYw2Vws7ik3LEMGj9SPQaJJ+Z73Q@mail.gmail.com/ Link: https://lkml.kernel.org/r/20210309212137.2610186-1-ira.weiny@intel.com Link: https://lkml.kernel.org/r/20210309212137.2610186-2-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: David Sterba <dsterba@suse.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-26	iov_iter: Four fixes for ITER_XARRAY	David Howells	1	-0/+5
	Fix four things[1] in the patch that adds ITER_XARRAY[2]:
	(1) Remove the address_space struct predeclaration. This is a holdover from when it was ITER_MAPPING.
	(2) Fix _copy_mc_to_iter() so that the xarray segment updates count and iov_offset in the iterator before returning.
	(3) Fix iov_iter_alignment() to not loop in the xarray case. Because the middle pages are all whole pages, only the end pages need be considered - and this can be reduced to just looking at the start position in the xarray and the iteration size.
	(4) Fix iov_iter_advance() to limit the size of the advance to no more than the remaining iteration size.
	Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Jeff Layton <jlayton@redhat.com> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Link: https://lore.kernel.org/r/YIVrJT8GwLI0Wlgx@zeniv-ca.linux.org.uk [1] Link: https://lore.kernel.org/r/161918448151.3145707.11541538916600921083.stgit@warthog.procyon.org.uk [2]
2021-04-23	iov_iter: Add ITER_XARRAY	David Howells	1	-23/+290
	Add an iterator, ITER_XARRAY, that walks through a set of pages attached to an xarray, starting at a given page and offset and walking for the specified number of bytes. The iterator supports transparent huge pages. The iterate_xarray() macro calls the helper function with the RCU read lock held. I think that this is only a problem for iov_iter_for_each_range() - and that returns an error for ITER_XARRAY (also, this function does not appear to be called). The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed. This is useful for copying data from socket buffers to inodes in network filesystems and for transferring data between those inodes and the cache using direct I/O. Whilst it is true that ITER_BVEC could be used instead, that would require a bio_vec array to be allocated to refer to all the pages - which should be redundant if inode->i_pages also points to all these pages. Note that older versions of this patch implemented an ITER_MAPPING instead, which was almost the same. Changes: v7: - Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1].
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dave Wysochanski <dwysocha@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Christoph Hellwig <hch@lst.de> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653786033.2770958.14154191921867463240.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789064740.6155.11932541175173658065.stgit@warthog.procyon.org.uk/ # v6 Link: https://lore.kernel.org/r/27c369a8f42bb8a617672b2dc0126a5c6df5a050.camel@kernel.org [1]
2021-03-01	Merge branch 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux	Linus Torvalds	1	-14/+0
	Pull kmap conversion updates from David Sterba:
	"This contains changes regarding kmap API use and e.g. conversion from kmap_atomic to kmap_local_page. The API belongs to memory management but to save cross-tree dependency headaches we've agreed to take it through the btrfs tree because there are some trivial conversions possible, while the rest will need some time and getting the easy cases out of the way would be convenient.
	The changes can be grouped:
	- function exports, new helpers
	- new VM_BUG_ON for additional verification; it's been discussed if it should be VM_BUG_ON or BUG_ON, the former was chosen due to performance reasons
	- code replaced by relevant helpers"
	[ This is an updated version of a request that originally came in during the merge window, but I asked for some updates: https://lore.kernel.org/lkml/cover.1614090658.git.dsterba@suse.com/ which is why this got merged after the merge window closed. - Linus ]
	* 'kmap-conversion-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
		btrfs: use copy_highpage() instead of 2 kmaps()
		btrfs: use memcpy_[to|from]_page() and kmap_local_page()
		mm/highmem: Add VM_BUG_ON() to mem*_page() calls
		mm/highmem: Introduce memcpy_page(), memmove_page(), and memset_page()
		mm/highmem: Convert memcpy_[to|from]_page() to kmap_local_page()
		mm/highmem: Lift memcpy_[to|from]_page to core
2021-02-21	Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block	Linus Torvalds	1	-2/+19
	Pull core block updates from Jens Axboe:
	"Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains:
	- Two series of BFQ improvements (Paolo, Jan, Jia)
	- Block iov_iter improvements (Pavel)
	- bsg error path fix (Pan)
	- blk-mq scheduler improvements (Jan)
	- -EBUSY discard fix (Jan)
	- bvec allocation improvements (Ming, Christoph)
	- bio allocation and init improvements (Christoph)
	- Store bdev pointer in bio instead of gendisk + partno (Christoph)
	- Block trace point cleanups (Christoph)
	- hard read-only vs read-only split (Christoph)
	- Block based swap cleanups (Christoph)
	- Zoned write granularity support (Damien)
	- Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"
	* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
		mm: simplify swapdev_block
		sd_zbc: clear zone resources for non-zoned case
		block: introduce blk_queue_clear_zone_settings()
		zonefs: use zone write granularity as block size
		block: introduce zone_write_granularity limit
		block: use blk_queue_set_zoned in add_partition()
		nullb: use blk_queue_set_zoned() to setup zoned devices
		nvme: cleanup zone information initialization
		block: document zone_append_max_bytes attribute
		block: use bi_max_vecs to find the bvec pool
		md/raid10: remove dead code in reshape_request
		block: mark the bio as cloned in bio_iov_bvec_set
		block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
		block: remove a layer of indentation in bio_iov_iter_get_pages
		block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
		block: remove the 1 and 4 vec bvec_slabs entries
		block: streamline bvec_alloc
		block: factor out a bvec_alloc_gfp helper
		block: move struct biovec_slab to bio.c
		block: reuse BIO_INLINE_VECS for integrity bvecs
		...
2021-02-11	mm/highmem: Lift memcpy_[to|from]_page to core	Ira Weiny	1	-14/+0
	Working through a conversion to call kmap_local_page() instead of kmap() revealed many places where the pattern kmap/memcpy/kunmap occurred. Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al Viro all suggested putting this code into helper functions. Al Viro further pointed out that these functions already existed in the iov_iter code.[1] Various locations for the lifted functions were considered. Headers like mm.h or string.h seem ok but don't really portray the functionality well. pagemap.h made some sense but is for page cache functionality.[2] Another alternative would be to create a new header for the promoted memcpy functions, but it masks the fact that these are designed to copy to/from pages using the kernel direct mappings and complicates matters with a new header. Placing these functions in 'highmem.h' is suboptimal especially with the changes being proposed in the functionality of kmap. From a caller perspective including/using 'highmem.h' implies that the functions defined in that header are only required when highmem is in use which is increasingly not the case with modern processors. However, highmem.h is where all the current functions like this reside (zero_user(), clear_highpage(), clear_user_highpage(), copy_user_highpage(), and copy_highpage()). So it makes the most sense even though it is distasteful for some.[3] Lift memcpy_to_page() and memcpy_from_page() to highmem.h.
[1] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/ https://lore.kernel.org/lkml/20201013112544.GA5249@infradead.org/ [2] https://lore.kernel.org/lkml/20201208122316.GH7338@casper.infradead.org/ [3] https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/#t https://lore.kernel.org/lkml/20201208163814.GN1563847@iweiny-DESK2.sc.intel.com/ Cc: Boris Pismenny <borisp@mellanox.com> Cc: Or Gerlitz <gerlitz.or@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-04	udp: fix skb_copy_and_csum_datagram with odd segment sizes	Willem de Bruijn	1	-10/+14
	When iteratively computing a checksum with csum_block_add, track the offset "pos" to correctly rotate in csum_block_add when offset is odd. The open coded implementation of skb_copy_and_csum_datagram did this. With the switch to __skb_datagram_iter calling csum_and_copy_to_iter, pos was reinitialized to 0 on each call. Bring back the pos by passing it along with the csum to the callback. Changes v1->v2 - pass csum value, instead of csump pointer (Alexander Duyck) Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/ Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Reported-by: Oliver Graute <oliver.graute@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
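Why the offset has to be carried between calls can be seen in a small user-space model of a 16-bit ones-complement checksum. The names `csum_model()`, `csum_add16()` and `csum_block_add_model()` are hypothetical, and the model works on folded 16-bit sums (byte swap for an odd offset) rather than the kernel's 32-bit `__wsum`; it is a sketch of the principle, not of the kernel routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries back in until the sum fits in 16 bits. */
static uint16_t fold(uint32_t s)
{
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Checksum a block as if it started at an even offset (big-endian 16-bit words). */
static uint16_t csum_model(const uint8_t *p, size_t len)
{
	uint32_t s = 0;

	assert(p != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		s += (i & 1) ? p[i] : (uint32_t)p[i] << 8;
	return fold(s);
}

static uint16_t csum_add16(uint16_t a, uint16_t b)
{
	return fold((uint32_t)a + b);
}

/*
 * Add the checksum of a block that sits at `offset` within the whole buffer.
 * At an odd offset the block's bytes fall into the opposite halves of each
 * 16-bit word, so its sum is byte-swapped before being added.
 */
static uint16_t csum_block_add_model(uint16_t sum, uint16_t block, size_t offset)
{
	if (offset & 1)
		block = (uint16_t)((block << 8) | (block >> 8));
	return csum_add16(sum, block);
}
```

If the position is reset to 0 for every piece, a piece that begins at an odd offset is added without the swap and the combined checksum no longer matches the checksum of the whole buffer, which is the failure mode this commit describes for odd segment sizes.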
2021-01-25	iov_iter: optimise bvec iov_iter_advance()	Pavel Begunkov	1	-0/+19
	iov_iter_advance() is heavily used, but implemented through generic means. For bvecs there is a specifically crafted function for that, so use bvec_iter_advance() instead, it's faster and slimmer. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>