Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
It's possible for a request to invalidate a fscache_cookie will come in
while we're already processing an invalidation. If that happens we
currently take an extra access reference that will leak. Only call
__fscache_begin_cookie_access if the FSCACHE_COOKIE_DO_INVALIDATE bit
was previously clear.
Also, ensure that we attempt to clear the bit when the cookie is
"FAILED" and put the reference to avoid an access leak.
Fixes: 85e4ea1049c7 ("fscache: Fix invalidation/lookup race")
Suggested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
had been broken for ITER_BVEC et.al. since ever (OK, v3.17 when
ITER_BVEC had first appeared)...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... since April 2021
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... just shove it into one pipe_buffer.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
now that we are advancing the iterator, there's no need to
treat the first page separately - just call append_pipe()
in a loop.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
mechanical change; will be further massaged in subsequent commits
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
here nothing even looks at the iov_iter after the call, so we couldn't
care less whether it advances or not.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
that one is somewhat clumsier than usual and needs serious testing.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and adjust the callers
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and untangle the cleanup on failure to add into pipe.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... doing revert if we end up not using some pages
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.
Provide inline wrappers doing that, convert trivial open-coded
uses of those.
BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
All call sites of get_pages_array() are essenitally identical now.
Replace with common helper...
Returns number of slots available in resulting array or 0 on OOM;
it's up to the caller to make sure it doesn't ask to zero-entry
array (i.e. neither maxpages nor size are allowed to be zero).
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
... and don't mangle maxsize there - turn the loop into counting
one instead. Easier to see that we won't run out of array that
way. Note that special treatment of the partial buffer in that
thing is an artifact of the non-advancing semantics of
iov_iter_get_pages() - if not for that, it would be append_pipe(),
same as the body of the loop that follows it. IOW, once we make
iov_iter_get_pages() advancing, the whole thing will turn into
calculate how many pages do we want
allocate an array (if needed)
call append_pipe() that many times.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same as for pipes and xarrays; after that iov_iter_get_pages() becomes
a wrapper for __iov_iter_get_pages_alloc().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
same as for pipes
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The differences between those two are
* pipe_get_pages() gets a non-NULL struct page ** value pointing to
preallocated array + array size.
* pipe_get_pages_alloc() gets an address of struct page ** variable that
contains NULL, allocates the array and (on success) stores its address in
that variable.
Not hard to combine - always pass struct page ***, have
the previous pipe_get_pages_alloc() caller pass ~0U as cap for
array size.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
zero maxpages is bogus, but best treated as "just return 0";
NULL pages, OTOH, should be treated as a hard bug.
get rid of now completely useless checks in xarray_get_pages{,_alloc}().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Incidentally, ITER_XARRAY did *not* free the sucker in case when
iter_xarray_populate_pages() returned 0...
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
All their callers are next to each other; all of them
want the total amount of pages and, possibly, the
offset in the partial final buffer.
Combine into a new helper (pipe_npages()), fix the
bogosity in pipe_space_for_user(), while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We often need to find whether the last buffer is anon or not, and
currently it's rather clumsy:
check if ->iov_offset is non-zero (i.e. that pipe is not empty)
if so, get the corresponding pipe_buffer and check its ->ops
if it's &default_pipe_buf_ops, we have an anon buffer.
Let's replace the use of ->iov_offset (which is nowhere near similar to
its role for other flavours) with signed field (->last_offset), with
the following rules:
empty, no buffers occupied: 0
anon, with bytes up to N-1 filled: N
zero-copy, with bytes up to N-1 filled: -N
That way abs(i->last_offset) is equal to what used to be in i->iov_offset
and empty vs. anon vs. zero-copy can be distinguished by the sign of
i->last_offset.
Checks for "should we extend the last buffer or should we start
a new one?" become easier to follow that way.
Note that most of the operations can only be done in a sane
state - i.e. when the pipe has nothing past the current position of
iterator. About the only thing that could be done outside of that
state is iov_iter_advance(), which transitions to the sane state by
truncating the pipe. There are only two cases where we leave the
sane state:
1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be
dealt with later, when we make get_pages advancing - the callers are
actually happier that way.
2) iov_iter copied, then something is put into the copy. Since
they share the underlying pipe, the original gets behind. When we
decide that we are done with the copy (original is not usable until then)
we advance the original. direct_io used to be done that way; nowadays
it operates on the original and we do iov_iter_revert() to discard
the excessive data. At the moment there's nothing in the kernel that
could do that to ITER_PIPE iterators, so this reason for insane state
is theoretical right now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Fold pipe_truncate() into it, clean up. We can release buffers
in the same loop where we walk backwards to the iterator beginning
looking for the place where the new position will be.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
instead of setting ->iov_offset for new position and calling
pipe_truncate() to adjust ->len of the last buffer and discard
everything after it, adjust ->len at the same time we set ->iov_offset
and use pipe_discard_from() to deal with buffers past that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
it's only used to get to the partial buffer we can add to,
and that's always the last one, i.e. pipe->head - 1.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Expand the only remaining call of push_pipe() (in
__pipe_get_pages()), combine it with the page-collecting loop there.
Note that the only reason it's not a loop doing append_pipe() is
that append_pipe() is advancing, while iov_iter_get_pages() is not.
As soon as it switches to saner semantics, this thing will switch
to using append_pipe().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
New helper: append_pipe(). Extends the last buffer if possible,
allocates a new one otherwise. Returns page and offset in it
on success, NULL on failure. iov_iter is advanced past the
data we've got.
Use that instead of push_pipe() in copy-to-pipe primitives;
they get simpler that way. Handling of short copy (in "mc" one)
is done simply by iov_iter_revert() - iov_iter is in consistent
state after that one, so we can use that.
[Fix for braino caught by Liu Xinpeng <liuxp11@chinatelecom.cn> folded in]
[another braino fix, this time in copy_pipe_to_iter() and pipe_zero();
caught by testcase from Hugh Dickins]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
There are only two kinds of pipe_buffer in the area used by ITER_PIPE.
1) anonymous - copy_to_iter() et.al. end up creating those and copying
data there. They have zero ->offset, and their ->ops points to
default_pipe_page_ops.
2) zero-copy ones - those come from copy_page_to_iter(), and page
comes from caller. ->offset is also caller-supplied - it might be
non-zero. ->ops points to page_cache_pipe_buf_ops.
Move creation and insertion of those into helpers - push_anon(pipe, size)
and push_page(pipe, page, offset, size) resp., separating them from
the "could we avoid creating a new buffer by merging with the current
head?" logics.
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
pipe_buffer instances of a pipe are organized as a ring buffer,
with power-of-2 size. Indices are kept *not* reduced modulo ring
size, so the buffer refered to by index N is
pipe->bufs[N & (pipe->ring_size - 1)].
Ring size can change over the lifetime of a pipe, but not while
the pipe is locked. So for any iov_iter primitives it's a constant.
Original conversion of pipes to this layout went overboard trying
to microoptimize that - calculating pipe->ring_size - 1, storing
it in a local variable and using through the function. In some
cases it might be warranted, but most of the times it only
obfuscates what's going on in there.
Introduce a helper (pipe_buf(pipe, N)) that would encapsulate
that and use it in the obvious cases. More will follow...
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Use pipe_discard_from() explicitly in generic_file_read_iter(); don't bother
with rather non-obvious use of iov_iter_advance() in there.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.
We are going to expose the things like ->write_iter() et.al. to those
in subsequent commits.
New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use that for
checking that pages we modify after getting them from iov_iter_get_pages()
would need to be dirtied.
DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If something manages to set the maximum file size to MAX_OFFSET+1, this
can cause the xfs and ext4 filesystems at least to become corrupt.
Ordinarily, the kernel protects against userspace trying this by
checking the value early in the truncate() and ftruncate() system calls
calls - but there are at least two places that this check is bypassed:
(1) Cachefiles will round up the EOF of the backing file to DIO block
size so as to allow DIO on the final block - but this might push
the offset negative. It then calls notify_change(), but this
inadvertently bypasses the checking. This can be triggered if
someone puts an 8EiB-1 file on a server for someone else to try and
access by, say, nfs.
(2) ksmbd doesn't check the value it is given in set_end_of_file_info()
and then calls vfs_truncate() directly - which also bypasses the
check.
In both cases, it is potentially possible for a network filesystem to
cause a disk filesystem to be corrupted: cachefiles in the client's
cache filesystem; ksmbd in the server's filesystem.
nfsd is okay as it checks the value, but we can then remove this check
too.
Fix this by adding a check to inode_newsize_ok(), as called from
setattr_prepare(), thereby catching the issue as filesystems set up to
perform the truncate with minimal opportunity for bypassing the new
check.
Fixes: 1f08c925e7a3 ("cachefiles: Implement backing file wrangling")
Fixes: f44158485826 ("cifsd: add file operations")
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: stable@kernel.org
Acked-by: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Steve French <sfrench@samba.org>
cc: Hyunchul Lee <hyc.lee@gmail.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch removes the trailing white space in kernel/sysysctl.c
Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
[mcgrof: fix commit message subject]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
This patch fixes two coding style issues:
1. Clean up indentation, replace spaces with tab
2. Add space after ','
Signed-off-by: Fanjun Kong <bh1scw@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
There are two adjacent sysctl entries protected by the same
CONFIG_TREE_RCU config symbol. Merge them into a single block to
improve readability.
Use the more common "#ifdef" form while at it.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
devm_regulator_get_optional() API will return -ENODEV if the regulator was
not found. For the optional supplies CX, PX we should not fail in that case
but rather continue. So let's catch that error and continue silently if
those regulators are not found.
The commit 3f52d118f992 ("remoteproc: qcom_q6v5_pas: Deal silently with
optional px and cx regulators") was supposed to do the same but it missed
the fact that devm_regulator_get_optional() API returns -ENODEV when the
regulator was not found.
Cc: Abel Vesa <abel.vesa@linaro.org>
Fixes: 3f52d118f992 ("remoteproc: qcom_q6v5_pas: Deal silently with optional px and cx regulators")
Reported-by: Steev Klimaszewski <steev@kali.org>
Reviewed-by: Abel Vesa <abel.vesa@linaro.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20220801053939.12556-1-manivannan.sadhasivam@linaro.org
|
|
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
|
|
The various functions contain a NULL check starting in v5.15.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
|
|
This reverts commit 4bf7fda4dce22214c70c49960b1b6438e6260b67.
It turns out that it was hopelessly naive to think that this would work,
considering that we've always done this. The first machine I actually
tested this on broke at bootup, getting to
Reached target cryptsetup.target - Local Encrypted Volumes.
and then hanging. It's unclear what actually fails, since there's a lot
else going on around that time (eg amdgpu probing also happens around
that same time, but it could be some other random init thing that didn't
complete earlier and just caused the boot to hang at that point).
The expectations that we should default to some unsafe and untested mode
seems entirely unfounded, and the belief that this wouldn't affect
modern systems is clearly entirely false. The machine in question is
about two years old, so it's not exactly shiny, but it's also not some
dusty old museum piece PDP-11 in a closet.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Garry <john.garry@huawei.com>
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add the apparmor.net email address that the project is transitioning
emails to.
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
This reverts commit 6f5c672d17f583b081e283927f5040f726c54598.
This breaks normal crash dump when CPU0 is offline.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
This reverts commit 7d06fed77b7d8fc9f6cc41b4e3f2823d32532ad8.
This introduced vmem_mutex locking from vmem_map_4k_page()
function called from smp_reinit_ipl_cpu() with interrupts
disabled. While it is a pre-SMP early initcall no other CPUs
running in parallel nor other code taking vmem_mutex on this
boot stage - it still needs to be fixed.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
This reverts commit e409b7f19172a3c154de62de4baf32a2c25a375a.
Commit 7d06fed77b7d ("s390/smp: rework absolute lowcore access")
introduced mutex lock with interrupts disabled. This commit is
a follow-up that needs to be reverted as well.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Vishal notes that when attempting to define a second pmem region on a
device the DPA allocation fails with a message of the form:
decoder11.1: failed to reserve skipped space
Recall that the skip setting is used when there is a pmem allocation in
the presence of free ram DPA space. The first pmem allocation skips over
the free ram and subsequent pmem allocations do not require a skip. The
bug is that a skip is still attempted and the DPA reservation code
flags the double skip allocation conflict.
Fixes: cf880423b6a0 ("cxl/hdm: Add support for allocating DPA to an endpoint decoder")
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165973754730.1558392.15466392461645857658.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The endpoint decode granularity must be <= the window granularity
otherwise capacity in the endpoints is lost in the decode. Consider an
attempt to have a region granularity of 512 with 4 devices within a
window that maps 2 host bridges at a granularity of 256 bytes:
HPA DPA Offset HB Port EP
0x0 0x0 0 0 0
0x100 0x0 1 0 2
0x200 0x100 0 0 0
0x300 0x100 1 0 2
0x400 0x200 0 1 1
0x500 0x200 1 1 3
0x600 0x300 0 1 1
0x700 0x300 1 1 3
0x800 0x400 0 0 0
0x900 0x400 1 0 2
0xA00 0x500 0 0 0
0xB00 0x500 1 0 2
Notice how endpoint0 maps HPA 0x0 and 0x200 correctly, but then at HPA
0x800 it results in DPA 0x200-0x400 on being skipped.
Fix this by restricing the region granularity to be equal to the window
granularity resulting in the following for a x4 region under a x2 window
at a granularity of 256.
HPA DPA Offset HB Port EP
0x0 0x0 0 0 0
0x100 0x0 1 0 2
0x200 0x0 0 1 1
0x300 0x0 1 1 3
0x400 0x100 0 0 0
0x500 0x100 1 0 2
0x600 0x100 0 1 1
0x700 0x100 1 1 3
Not that it ever made practical sense to support region granularity >
window granularity. The window rotates host bridges causing endpoints to
never see a consecutive stream of requests at the desired granularity
without breaks to issue cycles to the other host bridge.
Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes")
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165973127171.1526540.9923273539049172976.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
In cases where the decode fans out as it traverses downstream, the
interleave granularity needs to increment to identify the port selector
bits out of the remaining address bits. For example, recall that with an
x2 parent port intereleave (IW == 1), the downstream decode for children
of those ports will either see address bit IG+8 always set, or address
bit IG+8 always clear. So if the child port needs to select a downstream
port it can only use address bits starting at IG+9 (where IG and IW are
the CXL encoded values for interleave granularity (ilog2(ig) - 8) and
ways (ilog2(iw))).
When the parent port interleave is x1 no such masking occurs and the
child port can maintain the granularity that was routed to the parent
port.
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165973126583.1526540.657948655360009242.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A recent bug fix added the setup of the endpoint decoder interleave
geometry settings to cxl_region_attach(). Move the HPA setup there as
well to keep all endpoint decoder parameter setting in a central
location.
For symmetry, move endpoint HPA teardown to cxl_region_detach(), and for
switches move HPA setup / teardown to cxl_port_{setup,reset}_targets().
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165973126020.1526540.14701949254436069807.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|