Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
These sub-routines set the attributes specified in @args.
We will use this later for setting parent pointers as a deferred
attribute operation.
[dgc: remove attr fork init code from xfs_attr_set_args().]
[dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
[dgc: correct sf add error handling.]
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
This patch adds a subroutine xfs_attr_try_sf_addname
used by xfs_attr_set. This subrotine will attempt to
add the attribute name specified in args in shortform,
as well and perform error handling previously done in
xfs_attr_set.
This patch helps to pre-simplify xfs_attr_set for reviewing
purposes and reduce indentation. New function will be added
in the next patch.
[dgc: moved commit to helper function, too.]
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
since xfs_attr.c is in libxfs. We will need these later in
xfsprogs.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The kernel only issues a log message that it's been shut down when
the filesystem triggers a shutdown itself. Hence there is no trace
in the log when a shutdown is triggered manually from userspace.
This can make it hard to see sequence of events in the log when
things go wrong, so make sure we always log a message when a
shutdown is run.
While there, clean up the logic flow so we don't have to continually
check if the shutdown trigger was user initiated before logging
shutdown messages.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
We don't handle buffer state properly in online repair's findroot
routine. If a buffer already has b_ops set, we don't ever want to touch
that, and we don't want to call the read verifiers on a buffer that
could be dirty (CRCs are only recomputed during log checkpoints).
Therefore, be more careful about what we do with a buffer -- if someone
else already attached ops that are not the ones for this btree type,
just ignore the buffer. We only attach our btree type's buf ops if it
matches the magic/uuid and structure checks.
We also modify xfs_buf_read_map to allow callers to set buffer ops on a
DONE buffer with NULL ops so that repair doesn't leave behind buffers
which won't have buffers attached to them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
If a caller supplies buffer ops when trying to read a buffer and the
buffer doesn't already have buf ops assigned, ensure that the ops are
assigned to the buffer and the verifier is run on that buffer.
Note that current XFS code is careful to assign buffer ops after a
xfs_{trans_,}buf_read call in which ops were not supplied. However, we
should apply ops defensively in case there is ever a coding mistake; and
an upcoming repair patch will need to be able to read a buffer without
assigning buf ops.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
In xrep_findroot_block, if we find a candidate root block with sibling
pointers or sibling blocks on the same tree level, we should not return
that block as a tree root because root blocks cannot have siblings.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Needed by userspace programs that call fstatfs().
It'd be natural to publish XFS_SB_MAGIC in uapi, but while these two
have identical values, they have different semantic meaning: one is
an enum cookie meant for statfs, the other a signature of the
on-disk format.
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Instead of just asserting that we have no delalloc space dangling
in an inode that gets freed print the actual offenders for debug
mode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
We should want to write directly into the data fork for blocks that don't
have an extent in the COW fork covering them yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
We only need to allocate blocks for zeroing for reflink inodes,
and for we currently have a special case for reflink files in
the otherwise direct I/O path that I'd like to get rid of.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The option to enable unwritten extents was made default in 2003,
removed from mkfs in 2007, and cannot be disabled in v5. We also
rely on it for a lot of common functionality, so filesystems without
it will run a completely untested and buggy code path. Enabling the
support also is a simple bit flip using xfs_db, so legacy file
systems can still be brought forward.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The invalid state isn't any different from a hole, so merge the two
states. Use the more descriptive hole name, but keep it as the first
value of the enum to catch uninitialized fields.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When reflinking sub-file ranges, a data corruption can occur when
the source file range includes a partial EOF block. This shares the
unknown data beyond EOF into the second file at a position inside
EOF, exposing stale data in the second file.
XFS only supports whole block sharing, but we still need to
support whole file reflink correctly. Hence if the reflink
request includes the last block of the souce file, only proceed with
the reflink operation if it lands at or past the destination file's
current EOF. If it lands within the destination file EOF, reject the
entire request with -EINVAL and make the caller go the hard way.
This avoids the data corruption vector, but also avoids disruption
of returning EINVAL to userspace for the common case of whole file
cloning.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
A deduplication data corruption is Exposed by fstests generic/505 on
XFS. It is caused by extending the block match range to include the
partial EOF block, but then allowing unknown data beyond EOF to be
considered a "match" to data in the destination file because the
comparison is only made to the end of the source file. This corrupts
the destination file when the source extent is shared with it.
XFS only supports whole block dedupe, but we still need to appear to
support whole file dedupe correctly. Hence if the dedupe request
includes the last block of the souce file, don't include it in the
actual XFS dedupe operation. If the rest of the range dedupes
successfully, then report the partial last block as deduped, too, so
that userspace sees it as a successful dedupe rather than return
EINVAL because we can't dedupe unaligned blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Before cloning into a file, update the ctime and remove sensitive
attributes like suid, just like we'd do for a regular file write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When we're reflinking between two files and the destination file range
is well beyond the destination file's EOF marker, zero any posteof
speculative preallocations in the destination file so that we don't
expose stale disk contents. The previous strategy of trying to clear
the preallocations does not work if the destination file has the
PREALLOC flag set.
Uncovered by shared/010.
Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Refactor all the reflink preparation steps into a separate helper
that we'll use to land all the upcoming fixes for insufficient input
checks.
This rework also moves the invalidation of the destination range to
the prep function so that it is done before the range is remapped.
This ensures that nobody can access the data in range being remapped
until the remap is complete.
[dgc: fix xfs_reflink_remap_prep() return value and caller check to
handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to
do". ]
[dgc: make sure length changed by vfs_clone_file_prep_inodes() gets
propagated back to XFS code that does the remapping. ]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Commit 01239d77b9dd ("xfs: fix a null pointer dereference in
xfs_bmap_extents_to_btree") attempted to fix a null pointer
dreference when a fuzzing corruption of some kind was found.
This fix was flawed, resulting in assert failures like:
XFS: Assertion failed: ifp->if_broot == NULL, file: fs/xfs/libxfs/xfs_bmap.c, line: 715
.....
Call Trace:
xfs_bmap_extents_to_btree+0x6b9/0x7b0
__xfs_bunmapi+0xae7/0xf00
? xfs_log_reserve+0x1c8/0x290
xfs_reflink_remap_extent+0x20b/0x620
xfs_reflink_remap_blocks+0x7e/0x290
xfs_reflink_remap_range+0x311/0x530
vfs_dedupe_file_range_one+0xd7/0xe0
vfs_dedupe_file_range+0x15b/0x1a0
do_vfs_ioctl+0x267/0x6c0
The problem is that the error handling code now asserts that the
inode fork is not in btree format before the error handling code
undoes the modifications that put the fork back in extent format.
Fix this by moving the assert back to after the xfs_iroot_realloc()
call that returns the fork to extent format, and clean up the jump
labels to be meaningful.
Also, returning ENOSPC when xfs_btree_get_bufl() fails to
instantiate the buffer that was allocated (the actual fix in the
commit mentioned above) is incorrect. This is a fatal error - only
an invalid block address or a filesystem shutdown can result in
failing to get a buffer here.
Hence change this to EFSCORRUPTED so that the higher layer knows
this was a corruption related failure and should not treat it as an
ENOSPC error. This should result in a shutdown (via cancelling a
dirty transaction) which is necessary as we do not attempt to clean
up the (invalid) block that we have already allocated.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The iomap page fault mechanism currently dirties the associated page
after the full block range of the page has been allocated. This
leaves the page susceptible to delayed allocations without ever
being set dirty on sub-page block sized filesystems.
For example, consider a page fault on a page with one preexisting
real (non-delalloc) block allocated in the middle of the page. The
first iomap_apply() iteration performs delayed allocation on the
range up to the preexisting block, the next iteration finds the
preexisting block, and the last iteration attempts to perform
delayed allocation on the range after the prexisting block to the
end of the page. If the first allocation succeeds and the final
allocation fails with -ENOSPC, iomap_apply() returns the error and
iomap_page_mkwrite() fails to dirty the page having already
performed partial delayed allocation. This eventually results in the
page being invalidated without ever converting the delayed
allocation to real blocks.
This problem is reliably reproduced by generic/083 on XFS on ppc64
systems (64k page size, 4k block size). It results in leaked
delalloc blocks on inode reclaim, which triggers an assert failure
in xfs_fs_destroy_inode() and filesystem accounting inconsistency.
Move the set_page_dirty() call from iomap_page_mkwrite() to the
actor callback, similar to how the buffer head implementation works.
The actor callback is called iff ->iomap_begin() returns success, so
ensures the page is dirtied as soon as possible after an allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
One of the first steps of log recovery is to check for the special
case of a zeroed log. If the first cycle in the log is zero or the
tail portion of the log is zeroed, the head is set to the first
instance of cycle 0. xlog_find_zeroed() includes a sanity check that
enforces that the first cycle in the log must be 1 if the last cycle
is 0. While this is true in most cases, the check is not totally
valid because it doesn't consider the case where the filesystem
crashed after a partial/out of order log buffer completion that
wraps around the end of the physical log.
For example, consider a filesystem that has completed most of the
first cycle of the log, reaches the end of the physical log and
splits the next single log buffer write into two in order to wrap
around the end of the log. If these I/Os are reordered, the second
(wrapped) I/O completes and the first happens to fail, the log is
left in a state where the last cycle of the log is 0 and the first
cycle is 2. This causes the xlog_find_zeroed() sanity check to fail
and prevents the filesystem from mounting. This situation has been
reproduced on particular systems via repeated runs of generic/475.
This is an expected state that log recovery already knows how to
deal with, however. Since the log is still partially zeroed, the
head is detected correctly and points to a valid tail. The
subsequent stale block detection clears blocks beyond the head up to
the tail (within a maximum range), with the express purpose of
clearing such out of order writes. As expected, this removes the out
of order cycle 2 blocks at the physical start of the log.
In other words, the only thing that prevents a clean mount and
recovery of the filesystem in this scenario is the specific (last ==
0 && first != 1) sanity check in xlog_find_zeroed(). Since the log
head/tail are now independently validated via cycle, log record and
CRC checks, this highly specific first cycle check is of dubious
value. Remove it and rely on the higher level validation to
determine whether log content is sane and recoverable.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Verify the inode di_forkoff, lifted from xfs_repair's
process_check_inode_forkoff().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The iomap direct I/O code issues a single ->end_io call for the whole
I/O request, and if some of the extents cowered needed a COW operation
it will call xfs_reflink_end_cow over the whole range.
When we do AIO writes we drop the iolock after doing the initial setup,
but before the I/O completion. Between dropping the lock and completing
the I/O we can have a racing buffered write create new delalloc COW fork
extents in the region covered by the outstanding direct I/O write, and
thus see delalloc COW fork extents in xfs_reflink_end_cow. As
concurrent writes are fundamentally racy and no guarantees are given we
can simply skip those.
This can be easily reproduced with xfstests generic/208 in always_cow
mode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xchk_inode_flags2() currently treats any di_flags2 values that the
running kernel doesn't recognize as corruption, and calls
xchk_ino_set_corrupt() if they are set. However, it's entirely possible
that these flags were set in some newer kernel and are quite valid,
but ignored in this kernel.
(Validators don't care one bit about unknown di_flags2.)
Call xchk_ino_set_warning instead, because this may or may not actually
indicate a problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Remove duplicated include xfs_alloc.h
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
This function is only used to punch out delayed allocations on I/O
failure, which means we need to have read the extents earlier.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
When xfs_reflink_allocate_cow() allocates a transaction, it drops
the ILOCK to perform the operation. This Introduces a race condition
where another thread modifying the file can perform the COW
allocation operation underneath us. This result in the retry loop
finding an allocated block and jumping straight to the conversion
code. It does not, however, cancel the transaction it holds and so
this gets leaked. This results in a lockdep warning:
================================================
WARNING: lock held when returning to user space!
4.18.5 #1 Not tainted
------------------------------------------------
worker/6123 is leaving the kernel with locks still held!
1 lock held by worker/6123:
#0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
And eventually the filesystem deadlocks because it runs out of log
space that is reserved by the leaked transaction and never gets
released.
The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
gotos - it's no surprise that it has bug where the flow through
several goto jumps then fails to clean up context from a non-obvious
logic path. CLean up the logic flow and make sure every path does
the right thing.
Reported-by: Alexander Y. Fomichev <git.user@gmail.com>
Tested-by: Alexander Y. Fomichev <git.user@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: slight refactor]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
We've had a few reports of lockdep tripping over memory reclaim
context vs filesystem freeze "deadlocks". They all have looked
to be false positives on analysis, but it seems that they are
being tripped because we take freeze references before we run
a GFP_KERNEL allocation for the struct xfs_trans.
We can avoid this false positive vector just by re-ordering the
operations in xfs_trans_alloc(). That is. we need allocate the
structure before we take the freeze reference and enter the GFP_NOFS
allocation context that follows the xfs_trans around. This prevents
lockdep from seeing the GFP_KERNEL allocation inside the transaction
context, and that prevents it from triggering the freeze level vs
alloc context vs reclaim warnings.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The xfs_buf_log_item structure has a reference counter with slightly
tricky semantics. In the common case, a buffer is logged and
committed in a transaction, committed to the on-disk log (added to
the AIL) and then finally written back and removed from the AIL. The
bli refcount covers two potentially overlapping timeframes:
1. the bli is held in an active transaction
2. the bli is pinned by the log
The caveat to this approach is that the reference counter does not
purely dictate the lifetime of the bli. IOW, when a dirty buffer is
physically logged and unpinned, the bli refcount may go to zero as
the log item is inserted into the AIL. Only once the buffer is
written back can the bli finally be freed.
The above semantics means that it is not enough for the various
refcount decrementing contexts to release the bli on decrement to
zero. xfs_trans_brelse(), transaction commit (->iop_unlock()) and
unpin (->iop_unpin()) must all drop the associated reference and
make additional checks to determine if the current context is
responsible for freeing the item.
For example, if a transaction holds but does not dirty a particular
bli, the commit may drop the refcount to zero. If the bli itself is
clean, it is also not AIL resident and must be freed at this time.
The same is true for xfs_trans_brelse(). If the transaction dirties
a bli and then aborts or an unpin results in an abort due to a log
I/O error, the last reference count holder is expected to explicitly
remove the item from the AIL and release it (since an abort means
filesystem shutdown and metadata writeback will never occur).
This leads to fairly complex checks being replicated in a few
different places. Since ->iop_unlock() and xfs_trans_brelse() are
nearly identical, refactor the logic into a common helper that
implements and documents the semantics in one place. This patch does
not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfs_trans_brelse() is a bit of a historical mess, similar to
xfs_buf_item_unlock(). It is unnecessarily verbose, has snippets of
commented out code, inconsistency with regard to stale items, etc.
Clean up xfs_trans_brelse() to use similar logic and flow as
xfs_buf_item_unlock() with regard to bli reference count handling.
This patch makes no functional changes, but facilitates further
refactoring of the common bli reference count handling code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
xfstests generic/388,475 occasionally reproduce assertion failures
in xfs_buf_item_unpin() when the final bli reference is dropped on
an invalidated buffer and the buffer is not locked as it is expected
to be. Invalidated buffers should remain locked on transaction
commit until the final unpin, at which point the buffer is removed
from the AIL and the bli is freed since stale buffers are not
written back.
The assert failures are associated with filesystem shutdown,
typically due to log I/O errors injected by the test. The
problematic situation can occur if the shutdown happens to cause a
race between an active transaction that has invalidated a particular
buffer and an I/O error on a log buffer that contains the bli
associated with the same (now stale) buffer.
Both transaction and log contexts acquire a bli reference. If the
transaction has already invalidated the buffer by the time the I/O
error occurs and ends up aborting due to shutdown, the transaction
and log hold the last two references to a stale bli. If the
transaction cancel occurs first, it treats the buffer as non-stale
due to the aborted state: the bli reference is dropped and the
buffer is released/unlocked. The log buffer I/O error handling
eventually calls into xfs_buf_item_unpin(), drops the final
reference to the bli and treats it as stale. The buffer wasn't left
locked by xfs_buf_item_unlock(), however, so the assert fails and
the buffer is double unlocked. The latter problem is mitigated by
the fact that the fs is shutdown and no further damage is possible.
->iop_unlock() of an invalidated buffer should behave consistently
with respect to the bli refcount, regardless of aborted state. If
the refcount remains elevated on commit, we know the bli is awaiting
an unpin (since it can't be in another transaction) and will be
handled appropriately on log buffer completion. If the final bli
reference of an invalidated buffer is dropped in ->iop_unlock(), we
can assume the transaction has aborted because invalidation implies
a dirty transaction. In the non-abort case, the log would have
acquired a bli reference in ->iop_pin() and prevented bli release at
->iop_unlock() time. In the abort case the item must be freed and
buffer unlocked because it wasn't pinned by the log.
Rework xfs_buf_item_unlock() to simplify the currently circuitous
and duplicate logic and leave invalidated buffers locked based on
bli refcount, regardless of aborted state. This ensures that a
pinned, stale buffer is always found locked when eventually
unpinned.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
Now that deferred operations are completely managed via
transactions, it's no longer necessary to cancel the dfops in error
paths that already cancel the associated transaction. There are a
few such calls lingering throughout the codebase.
Remove all remaining unnecessary calls to xfs_defer_cancel(). This
leaves xfs_defer_cancel() calls in two places. The first is the call
in the transaction cancel path itself, which facilitates this patch.
The second is made via the xfs_defer_finish() error path to provide
consistent error semantics with transaction commit. For example,
xfs_trans_commit() expects an xfs_defer_finish() failure to clean up
the dfops structure before it returns.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
The VFS routine that calls ->get_link blindly copies whatever's returned
into the user's buffer. If we return a NULL pointer, the vfs will
crash on the null pointer. Therefore, return -EFSCORRUPTED instead of
blowing up the kernel.
[dgc: clean up with hch's suggestions]
Reported-by: wen.xu@gatech.edu
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|
|
|
This is not normally noticeable, but repeated forks are unnecessarily
expensive because they repeatedly dirty the parent page tables during
the page table copy operation.
It's trivial to just avoid write protecting the page table entry if it
was already not writable.
This patch was inspired by
https://bugzilla.kernel.org/show_bug.cgi?id=200447
which points to an ancient "waste time re-doing fork" issue in the
presence of lots of signals.
That bug was fixed by Eric Biederman's signal handling series
culminating in commit c3ad2c3b02e9 ("signal: Don't restart fork when
signals come in"), but the unnecessary work for repeated forks is still
work just fixing, particularly since the fix is trivial.
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
At the point where r is being checked for different values, r is always
going to be equal to 2 as the previous if statements jump to end or end1
if r is not 2. Hence the assignment to err can be simplified to just
err an assignment without any checks on the value or r.
Detected by CoverityScan, CID#1226737 ("Logically dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tejun Heo wrote:
>
> I asked Jens whether he could take care of the libata tree and he
> thankfully agreed, so, from now on, Jens will be the libata
> maintainer.
>
> Thanks a lot!
Thanks for your work in this area. I still remember the first linux
storage summit we did in Vancouver 2001, Tejun was invited to talk about
his libata error handling work. Before that, it was basically a crap
shoot if we recovered properly or not... A lot of water has flown under
the bridge since then!
Here's an "official" patch. Linus, can you apply it?
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Enabling the interrupt early, before power has been applied to the
device, can result in an interrupt being delivered too early if:
- the IOMMU shares an interrupt with a VOP
- the VOP has a pending interrupt (after a kexec, for example)
In these conditions, we end-up taking the interrupt without
the IOMMU being ready to handle the interrupt (not powered on).
Moving the interrupt request past the pm_runtime_enable() call
makes sure we can at least access the IOMMU registers. Note that
this is only a partial fix, and that the VOP interrupt will still
be screaming until the VOP driver kicks in, which advocates for
a more synchronized interrupt enabling/disabling approach.
Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
pm_runtime_get_if_in_use can fail: either PM has been disabled
altogether (-EINVAL), or the device hasn't been enabled yet (0).
Sadly, the Rockchip IOMMU driver tends to conflate the two things
by considering a non-zero return value as successful.
This has the consequence of hiding other bugs, so let's handle this
case throughout the driver, with a WARN_ON_ONCE so that we can try
and work out what happened.
Fixes: 0f181d3cf7d98 ("iommu/rockchip: Add runtime PM support")
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
A number of the Rockchip-specific drivers (IOMMU, display controllers)
are now assuming that CONFIG_PM is set, and may completely misbehave
if that's not the case.
Since there is hardly any reason for this configuration option not
to be selected anyway, let's require it (in the same way Tegra already
does).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
A number of the Rockchip-specific drivers (IOMMU, display controllers)
are now assuming that CONFIG_PM is set, and may completely misbehave
if that's not the case.
Since there is hardly any reason for this configuration option not
to be selected anyway, let's require it (in the same way Tegra already
does).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
The idle-states binding documentation[1] mentions that the
'entry-method' property is required on 64-bit platforms and must be
set to "psci".
commit a13f18f59d26 ("Documentation: arm: Fix typo in the idle-states
bindings examples") attempted to fix this earlier but clearly more is
needed.
Fix the cpu-capacity.txt documentation that uses the incorrect value so
we don't get copy-paste errors like these. Clarify the language in
idle-states.txt by removing the reference to the psci bindings that
might be causing this confusion.
Finally, fix devicetrees of various boards to reflect current
documentation.
[1] Documentation/devicetree/bindings/arm/idle-states.txt (see
idle-states node)
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
|
|
This can be dropped with commit 771c035372a036f83353eef46dbb829780330234
("deprecate the '__deprecated' attribute warnings entirely and for good")
now in upstream.
And we got rid of the last __deprecated use, too.
Signed-off-by: Sedat Dilek <sedat.dilek@credativ.de>
[wsa: shortened commit message to reflect the current situation]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective.
Make the warning more helpful by suggesting the proper mem=X kernel boot
parameter to make it effective and a link to the L1TF document to help
decide if the mitigation is worth the unusable RAM.
[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
|
|
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
The old @sunsite.dk address is no longer active, so update the references.
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
There aren't any users left. Remove this callback from the 2.4 times.
Phew, finally, that took years to reach...
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
As we now have deferred probing, we can use a custom mechanism and
finally get rid of the legacy interface from the i2c core.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
|