path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
Age  Commit message  Author  Files  Lines
2024-04-22xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positiveChristoph Hellwig1-14/+24
And to make that more clear, rearrange the code a bit and add asserts and a comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: move RT inode locking out of __xfs_bunmapiChristoph Hellwig2-7/+11
__xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary inodes given that it is very high level code. While this only looks ugly right now, it will become a problem when supporting delayed allocations for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents and thus never unlock the rt inodes. Move the locking into xfs_bmap_del_extent_real just before the call to xfs_rtfree_blocks instead and use a new flag in the transaction to ensure that the locking happens only once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: free RT extents after updating the bmap btreeChristoph Hellwig1-17/+9
Currently xfs_bmap_del_extent_real frees RT extents before updating the bmap btree, while it frees regular blocks after performing the bmap btree update for convoluted historic reasons. Switch to freeing the RT blocks in the same place as the regular data blocks instead to simplify the code and fix a very theoretical bug. A short history of this code researched by Dave Chinner below: The truncate for data device extents was originally a two-phase operation. First it removed the bmapbt record, but because this can free BMBT extents, it can use up all the free space tree reservation space. So the transaction gets rolled to commit the BMBT change and the xfs_bmap_finish() call that frees the data extent runs with a new transaction reservation that allows different free space btrees to be logged without overrun. However, on crash, this could lose the free space because there was nothing to tell recovery about the extents removed from the BMBT, hence EFIs were introduced. They tie the extent free operation to the bmapbt record removal commit for recovery of the second phase of the extent removal process. Then RT extents came along. RT extent freeing does not require a free space btree reservation because the free space metadata is static and transaction size is bound. Hence we don't need to care if the BMBT record removal modifies the per-ag free space trees and we don't need a two-phase extent remove transaction. The only thing we have to care about is not losing space on crash. Hence instead of recording the extent for freeing in the bmap list for xfs_bmap_finish() to process in a new transaction, it simply freed the rtextent directly. So the original code (from 1994) simply replaced the "free AG extent later" queueing with a direct free.
This code was originally at the start of xfs_dmap_del_extent(), but the xfs_bmap_add_free() got moved to the end of the function via the "do_fx" flag (the current code logic) in 1997 (commit c4fac74eaa58 in the historic xfs-import tree) because there was a shutdown occurring because of a case where splitting the extent record failed because the BMBT split and the filesystem didn't have enough space for the split to be done. (FWIW, I'm not sure this can happen anymore.) The commit backed out the BMBT change on ENOSPC error, and in doing so I think this actually breaks RT free space tracking. However, it then returns an ENOSPC error, and we have a dirty transaction in the RT case so this will shut down the filesystem when the transaction is cancelled. Hence the corrupted "bmbt now points at freed rt dev space" condition never makes it to disk, but it's still the wrong way to handle the issue. IOWs, this proposed change fixes that "shutdown at ENOSPC on rt devices" situation that was introduced by the above commit back in 1997. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: refactor realtime inode lockingChristoph Hellwig7-23/+87
Create helper functions to deal with locking realtime metadata inodes. This enables us to maintain correct locking order once we start adding the realtime rmap and refcount btree inodes. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitionsChristoph Hellwig1-2/+1
Commit bb7b1c9c5dd3 ("xfs: tag transactions that contain intent done items") switched the XFS_TRANS_ definitions to be bit based, and using comments above the definitions. As XFS_TRANS_LOWMODE was last and has a big fat comment it was missed. Switch it to the same style. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: compile out v4 support if disabledChristoph Hellwig2-18/+44
Add a few strategic IS_ENABLED statements to let the compiler eliminate unused code when CONFIG_XFS_SUPPORT_V4 is disabled. This saves multiple kilobytes of .text in my .config: $ size xfs.o.* text data bss dec hex filename 1363633 294836 592 1659061 1950b5 xfs.o.new 1371453 294868 592 1666913 196f61 xfs.o.old Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
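The savings quoted in the `size` output above can be checked with a few lines of arithmetic. This is only a worked calculation on the figures from the commit message; the dictionary names are illustrative and not part of any tool.

```python
# Section sizes copied from the `size` output quoted in the commit message.
old = {"text": 1371453, "data": 294868, "bss": 592}   # xfs.o.old
new = {"text": 1363633, "data": 294836, "bss": 592}   # xfs.o.new

# Per-section savings in bytes after compiling out v4 support.
saved = {section: old[section] - new[section] for section in old}
total_saved = sum(saved.values())

print(saved)
print(total_saved)
```

The text segment accounts for almost all of the reduction, which matches the "multiple kilobytes of .text" claim.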
2024-04-22xfs: remove the unused xfs_extent_busy_enomem trace eventChristoph Hellwig1-1/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: unwind xfs_extent_busy_clearChristoph Hellwig1-34/+25
The current structure of xfs_extent_busy_clear that locks the first busy extent in each AG and unlocks when switching to a new AG makes sparse unhappy as the lock critical section tracking can't cope with taking the lock conditionally and inside a loop. Rewrite xfs_extent_busy_clear so that it has an outer loop only advancing when moving to a new AG, and an inner loop that consumes busy extents for the given AG to make life easier for sparse and to also make this logic more obvious to humans. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
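The loop shape described above can be sketched in Python. This is a simplified model under stated assumptions, not the kernel code: `ag_lock`, `lock_events`, and the `(agno, start, length)` tuples are hypothetical stand-ins, and the input is assumed to be sorted by AG number. The point is that the lock is taken and released unconditionally around the inner loop, which is the structure a lock-context checker can follow.

```python
from contextlib import contextmanager

lock_events = []  # hypothetical trace of lock/unlock calls, for illustration


@contextmanager
def ag_lock(agno):
    """Stand-in for taking and dropping a per-AG lock."""
    lock_events.append(("lock", agno))
    try:
        yield
    finally:
        lock_events.append(("unlock", agno))


def clear_busy_extents(extents):
    """Consume (agno, start, length) tuples that are sorted by agno."""
    cleared = []
    i = 0
    # Outer loop: advances only when we move on to a new AG.
    while i < len(extents):
        agno = extents[i][0]
        with ag_lock(agno):
            # Inner loop: consume every busy extent belonging to this AG.
            while i < len(extents) and extents[i][0] == agno:
                cleared.append(extents[i])
                i += 1
    return cleared
```

Each AG is locked exactly once, and no lock is held across the transition from one AG to the next.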
2024-04-22xfs: move more logic into xfs_extent_busy_clear_oneChristoph Hellwig1-11/+12
Move the handling of discarded entries into xfs_extent_busy_clear_one to reuse the length check and tidy up the logic in the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: Remove unused function is_rt_data_forkJiapeng Chong1-8/+0
The function is defined in the rmap_repair.c file, but not called elsewhere, so delete the unused function. fs/xfs/scrub/rmap_repair.c:436:1: warning: unused function 'is_rt_data_fork'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8425 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: small cleanup in xrep_update_qflags()Dan Carpenter1-1/+1
The "mp" pointer is the same as "sc->mp" so this change doesn't affect runtime at all. However, it's nicer to use same name for both the lock and the unlock. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: Fix typo in commentThorsten Blum1-1/+1
s/somethign/something/ Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: fix sparse warnings about unused interval tree functionsDave Chinner1-10/+12
Sparse throws warnings about the interval tree functions that are defined and then not used in the scrub bitmap code: fs/xfs/scrub/bitmap.c:57:1: warning: unused function 'xbitmap64_tree_iter_next' [-Wunused-function] INTERVAL_TREE_DEFINE(struct xbitmap64_node, bn_rbnode, uint64_t, ^ ./include/linux/interval_tree_generic.h:151:33: note: expanded from macro 'INTERVAL_TREE_DEFINE' ITSTATIC ITSTRUCT * \ ^ <scratch space>:3:1: note: expanded from here xbitmap64_tree_iter_next ^ fs/xfs/scrub/bitmap.c:331:1: warning: unused function 'xbitmap32_tree_iter_next' [-Wunused-function] INTERVAL_TREE_DEFINE(struct xbitmap32_node, bn_rbnode, uint32_t, ^ ./include/linux/interval_tree_generic.h:151:33: note: expanded from macro 'INTERVAL_TREE_DEFINE' ITSTATIC ITSTRUCT * \ ^ <scratch space>:59:1: note: expanded from here xbitmap32_tree_iter_next Fix these by marking the functions created by the interval tree creation macro as __maybe_unused to suppress this warning. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-22xfs: silence sparse warning when checking version numberDave Chinner1-2/+1
Scrub checks the superblock version number against the known good feature bits that can be set in the version mask. It calculates the version mask to compare like so: vernum_mask = cpu_to_be16(~XFS_SB_VERSION_OKBITS | XFS_SB_VERSION_NUMBITS | XFS_SB_VERSION_ALIGNBIT | XFS_SB_VERSION_DALIGNBIT | XFS_SB_VERSION_SHAREDBIT | XFS_SB_VERSION_LOGV2BIT | XFS_SB_VERSION_SECTORBIT | XFS_SB_VERSION_EXTFLGBIT | XFS_SB_VERSION_DIRV2BIT); This generates a sparse warning: fs/xfs/scrub/agheader.c:168:23: warning: cast truncates bits from constant value (ffff3f8f becomes 3f8f) This is because '~XFS_SB_VERSION_OKBITS' is considered a 32 bit constant, even though its value is always under 16 bits. This is a kinda silly thing to do, because: /* * Supported feature bit list is just all bits in the versionnum field because * we've used them all up and understand them all. Except, of course, for the * shared superblock bit, which nobody knows what it does and so is unsupported. */ #define XFS_SB_VERSION_OKBITS \ ((XFS_SB_VERSION_NUMBITS | XFS_SB_VERSION_ALLFBITS) & \ ~XFS_SB_VERSION_SHAREDBIT) #define XFS_SB_VERSION_NUMBITS 0x000f #define XFS_SB_VERSION_ALLFBITS 0xfff0 #define XFS_SB_VERSION_SHAREDBIT 0x0200 XFS_SB_VERSION_OKBITS has a value of 0xfdff, and so ~XFS_SB_VERSION_OKBITS == XFS_SB_VERSION_SHAREDBIT. The calculated mask already sets XFS_SB_VERSION_SHAREDBIT, so starting with ~XFS_SB_VERSION_OKBITS is completely redundant.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
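The bit arithmetic in this message can be reproduced as a short worked example. The three constants are the `#define` values quoted above; the helpers `as_u32` and `as_u16` are hypothetical names that model fixed-width integers in Python, and the two hexadecimal values in `warned_32` and its truncation come from the sparse warning text. Nothing here is kernel code.

```python
# Values copied from the #define lines quoted in the commit message.
XFS_SB_VERSION_NUMBITS = 0x000F
XFS_SB_VERSION_ALLFBITS = 0xFFF0
XFS_SB_VERSION_SHAREDBIT = 0x0200
XFS_SB_VERSION_OKBITS = (
    (XFS_SB_VERSION_NUMBITS | XFS_SB_VERSION_ALLFBITS)
    & ~XFS_SB_VERSION_SHAREDBIT
)


def as_u32(value):
    """Model a 32-bit unsigned integer (helper name is illustrative)."""
    return value & 0xFFFFFFFF


def as_u16(value):
    """Model truncation to 16 bits, as a 16-bit cast would do."""
    return value & 0xFFFF


# As a 32-bit constant the complement has its upper 16 bits set...
not_ok_32 = as_u32(~XFS_SB_VERSION_OKBITS)
# ...but once truncated to 16 bits it is just the shared bit.
not_ok_16 = as_u16(~XFS_SB_VERSION_OKBITS)

# The constant reported by sparse, and what survives the 16-bit cast.
warned_32 = 0xFFFF3F8F
warned_16 = as_u16(warned_32)

print(hex(XFS_SB_VERSION_OKBITS), hex(not_ok_32), hex(not_ok_16), hex(warned_16))
```

Because the truncated mask already contains `XFS_SB_VERSION_SHAREDBIT`, dropping the `~XFS_SB_VERSION_OKBITS` term leaves the 16-bit result unchanged while removing the upper bits that triggered the warning.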
2024-04-20xfs: fix CIL sparse lock context warningsDave Chinner2-2/+3
Sparse reports: fs/xfs/xfs_log_cil.c:1127:1: warning: context imbalance in 'xlog_cil_push_work' - different lock contexts for basic block fs/xfs/xfs_log_cil.c:1380:1: warning: context imbalance in 'xlog_cil_push_background' - wrong count at exit fs/xfs/xfs_log_cil.c:1623:9: warning: context imbalance in 'xlog_cil_commit' - unexpected unlock xlog_cil_push_background() has a locking annotation for an rw_sem. Sparse does not track lock contexts for rw_sems, so the annotation generates false warnings. Remove the annotation. xlog_wait_on_iclog() drops the log->l_ic_loglock. The function has a sparse annotation, but the prototype in xfs_log_priv.h does not. Hence the warning from xlog_cil_push_work() which calls xlog_wait_on_iclog(). Add the missing annotation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-15xfs: unlock new repair tempfiles after creationDarrick J. Wong1-0/+2
After creation, drop the ILOCK on temporary files that have been created to stage a repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: don't pick up IOLOCK during rmapbt repair scanDarrick J. Wong1-15/+1
Now that we've fixed the directory operations to hold the ILOCK until they're finished with rmapbt updates for directory shape changes, we no longer need to take this lock when scanning directories for rmapbt records. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: Hold inode locks in xfs_renameAllison Henderson1-12/+33
Modify xfs_rename to hold all inode locks across a rename operation. We will need this later when we add parent pointers. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: Hold inode locks in xfs_trans_alloc_dirAllison Henderson2-4/+19
Modify xfs_trans_alloc_dir to hold locks after return. Caller will be responsible for manual unlock. We will need this later to hold locks across parent pointer operations. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: Hold inode locks in xfs_iallocAllison Henderson3-6/+16
Modify xfs_ialloc to hold locks after return. Caller will be responsible for manual unlock. We will need this later to hold locks across parent pointer operations. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com> [djwong: hold the parent ilocked across transaction rolls too] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15docs: describe xfs directory tree online fsckDarrick J. Wong1-0/+124
I've added a scrubber that checks the directory tree structure and fixes it; describe this in the design documentation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: Increase XFS_QM_TRANS_MAXDQS to 5Allison Henderson4-6/+53
With parent pointers enabled, a rename operation can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip and wip. This causes their dquots to be attached to the transaction chain, so we need to increase XFS_QM_TRANS_MAXDQS. This patch also adds a helper function xfs_dqlockn to lock an arbitrary number of dquots. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15docs: update offline parent pointer repair strategyDarrick J. Wong1-21/+60
Now update how xfs_repair checks and repairs parent pointer info. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: Increase XFS_DEFER_OPS_NR_INODES to 5Allison Henderson4-11/+32
Renames that generate parent pointer updates can join up to 5 inodes locked in sorted order. So we need to increase the number of defer ops inodes and relock them in the same way. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com> [djwong: have one sorting function] Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
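Locking in sorted order is a general way to avoid deadlocks when several tasks lock overlapping sets of objects. The sketch below illustrates the idea only; it is not the kernel's sorting function. `MAX_INODES` mirrors the limit of 5 named in the commit, while `lock_order` and the use of plain integers as inode numbers are assumptions made for this example. Duplicates are dropped on the assumption that the same inode can appear twice, for instance when source and target directory are the same.

```python
MAX_INODES = 5  # mirrors the new limit named in the commit


def lock_order(inode_numbers):
    """Return the order in which to lock a set of inodes.

    Every caller sorts by inode number, so two tasks that need an
    overlapping set always acquire the shared locks in the same order.
    """
    unique = sorted(set(inode_numbers))
    if len(unique) > MAX_INODES:
        raise ValueError("too many inodes for one operation")
    return unique


# Hypothetical rename within one directory: src_dp and target_dp coincide.
print(lock_order([128, 128, 4097, 260, 131]))
```

Relocking after a transaction roll would reuse the same ordering, which is why a single shared sorting routine is attractive.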
2024-04-15xfs: fix performance problems when fstrimming a subset of a fragmented AGDarrick J. Wong1-60/+93
On a 10TB filesystem where the free space in each AG is heavily fragmented, I noticed some very high runtimes on a FITRIM call for the entire filesystem. xfs_scrub likes to report progress information on each phase of the scrub, which means that a strace for the entire filesystem: ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839> shows that scrub is uncommunicative for the entire duration. Reducing the size of the FITRIM requests to a single AG at a time produces lower times for each individual call, but even this isn't quite acceptable, because the time between progress reports is still very high: Strace for the first 4x 1TB AGs looks like (2): ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033> ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323> ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226> ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744> I then had the idea to limit the length parameter of each call to a smallish amount (~11GB) so that we could report progress relatively quickly, but much to my surprise, each FITRIM call still took ~68 seconds! Unfortunately, the by-length fstrim implementation handles this poorly because it walks the entire free space by length index (cntbt), which is a very inefficient way to walk a subset of the blocks of an AG. Therefore, create a second implementation that will walk the bnobt and perform the trims in block number order. This implementation avoids the worst problems of the original code, though it lacks the desirable attribute of freeing the biggest chunks first. On the other hand, this second implementation will be much easier to constrain the system call latency, and makes it much easier to report fstrim progress to anyone who's running xfs_scrub. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
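The cost difference between the two walks can be illustrated with a toy model. This is a sketch under stated assumptions, not the XFS implementation: sorted Python lists stand in for the cntbt (by length) and bnobt (by block number) indexes, free extents are non-overlapping `(start, length)` pairs, and the function names are hypothetical. Each function returns the extents overlapping `[lo, hi)` together with the number of records it had to examine.

```python
import bisect


def trim_by_length(extents, lo, hi):
    """Walk a by-length ordering: every record must be examined."""
    visited = 0
    hits = []
    for start, length in sorted(extents, key=lambda e: e[1], reverse=True):
        visited += 1
        if start < hi and start + length > lo:
            hits.append((start, length))
    return hits, visited


def trim_by_block(extents, lo, hi):
    """Walk a by-block ordering: seek to lo, stop once past hi."""
    by_start = sorted(extents)
    starts = [start for start, _ in by_start]
    # Last extent starting at or before lo; it may still overlap the range.
    first = max(bisect.bisect_right(starts, lo) - 1, 0)
    visited = 0
    hits = []
    for start, length in by_start[first:]:
        visited += 1
        if start >= hi:
            break  # everything after this starts beyond the range
        if start + length > lo:
            hits.append((start, length))
    return hits, visited


# 100 small free extents; trim only a narrow window of the address space.
free = [(i * 10, 5) for i in range(100)]
print(trim_by_length(free, 200, 250)[1], trim_by_block(free, 200, 250)[1])
```

In this model the by-length walk examines every record no matter how small the requested window is, which is consistent with the observation that shrinking the FITRIM length did not shrink the runtime.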
2024-04-15xfs: create subordinate scrub contexts for xchk_metadata_inode_subtypeDarrick J. Wong4-73/+91
When a file-based metadata structure is being scrubbed in xchk_metadata_inode_subtype, we should create an entirely new scrub context so that each scrubber doesn't trip over another's buffers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15docs: update online directory and parent pointer repair sectionsDarrick J. Wong1-26/+29
Update the case studies of online directory and parent pointer reconstruction to reflect what they actually do in the final version. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15docs: update the parent pointers documentation to the final versionDarrick J. Wong1-41/+53
Now that we've decided on the ondisk format of parent pointers, update the documentation to reflect that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: pin inodes that would otherwise overflow link countDarrick J. Wong5-26/+36
The VFS inc_nlink function does not explicitly check for integer overflows in the i_nlink field. Instead, it checks the link count against s_max_links in the vfs_{link,create,rename} functions. XFS sets the maximum link count to 2.1 billion, so integer overflows should not be a problem. However, it's possible that online repair could find that a file has more than four billion links, particularly if the link count got corrupted while creating hardlinks to the file. The di_nlinkv2 field is not large enough to store a value larger than 2^32, so we ought to define a magic pin value of ~0U which means that the inode never gets deleted. This will prevent a UAF error if the repair finds this situation and users begin deleting links to the file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
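The pinning idea can be modelled as saturating arithmetic on a 32-bit counter. The following is a minimal sketch of that idea, not the kernel code: `NLINK_PINNED` stands for the `~0U` value mentioned above, and `store_nlink`, `add_link`, and `drop_link` are hypothetical helper names chosen for this example.

```python
NLINK_PINNED = 0xFFFFFFFF  # ~0U in a 32-bit field


def store_nlink(observed_links):
    """Clamp an observed link count into the 32-bit on-disk field."""
    return min(observed_links, NLINK_PINNED)


def add_link(nlink):
    """Add a link; a pinned count stays pinned."""
    return min(nlink + 1, NLINK_PINNED)


def drop_link(nlink):
    """Drop a link; a pinned count never decreases, so it never hits zero."""
    if nlink == NLINK_PINNED:
        return nlink
    return nlink - 1


# A repair that counted more links than the field can hold pins the inode.
print(hex(store_nlink(2**32 + 5)), hex(drop_link(NLINK_PINNED)))
```

Since a pinned count cannot reach zero, the inode is never freed while stray links might still refer to it, at the cost of leaking it until a later repair.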
2024-04-15xfs: try to avoid allocating from sick inode clustersDarrick J. Wong1-0/+40
I noticed that xfs/413 and xfs/375 occasionally failed while fuzzing core.mode of an inode. The root cause of these problems is that the field we fuzzed (core.mode or core.magic, typically) causes the entire inode cluster buffer verification to fail, which affects several inodes at once. The repair process tries to create either a /lost+found or a temporary repair file, but regrettably it picks the same inode cluster that we just corrupted, with the result that repair triggers the demise of the filesystem. Try to avoid this by making the inode allocation path detect when the perag health status indicates that someone has found bad inode cluster buffers, and try to read the inode cluster buffer. If the cluster buffer fails the verifiers, try another AG. This isn't foolproof and can result in premature ENOSPC, but that might be better than shutting down. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: check unused nlink fields in the ondisk inodeDarrick J. Wong2-0/+20
v2/v3 inodes use di_nlink and not di_onlink; and v1 inodes use di_onlink and not di_nlink. Whichever field is not in use, make sure its contents are zero, and teach xfs_scrub to fix that if it is. This clears a bunch of missing scrub failure errors in xfs/385 for core.onlink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: repair AGI unlinked inode bucket listsDarrick J. Wong3-4/+1074
Teach the AGI repair code to rebuild the unlinked buckets and lists. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: online repair of symbolic linksDarrick J. Wong7-2/+587
If a symbolic link target looks bad, try to sift through the rubble to find as much of the target buffer as we can, and stage a new target (short or remote format as needed) in a temporary file and use the atomic extent swapping mechanism to commit the results. In the worst case, we replace the target with an overly long filename that cannot possibly resolve. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: hoist AGI repair context to a heap objectDarrick J. Wong1-42/+63
Save ~460 bytes of stack space by moving all the repair context to a heap object. We're going to add even more context data in the next patch, which is why we really need to do this now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: check AGI unlinked inode bucketsDarrick J. Wong3-1/+42
Look for corruptions in the AGI unlinked bucket chains. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: ensure dentry consistency when the orphanage adopts a fileDarrick J. Wong2-0/+133
When the orphanage adopts a file, that file becomes a child of the orphanage. The dentry cache may have entries for the orphanage directory and the name we've chosen, so (1) make sure we abort if the dcache has a positive entry because something's not right; and (2) invalidate and purge negative dentries if the adoption goes through. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: pass the owner to xfs_symlink_write_targetDarrick J. Wong3-6/+6
Require callers of xfs_symlink_write_target to pass the owner number explicitly. This sets us up for online repair to be able to write a remote symlink target to sc->tempip with sc->ip's inumber in the block header. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: move files to orphanage instead of letting nlinks drop to zeroDarrick J. Wong7-19/+163
If we encounter an inode with a nonzero link count but zero observed links, move it to the orphanage. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: expose xfs_bmap_local_to_extents for online repairDarrick J. Wong4-7/+16
Allow online repair to call xfs_bmap_local_to_extents and add a void * argument at the end so that online repair can pass its own context. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: ask the dentry cache if it knows the parent of a directoryDarrick J. Wong5-1/+81
It's possible that the dentry cache can tell us the parent of a directory. Therefore, when repairing directory dot dot entries, query the dcache as a last resort before scanning the entire filesystem. A reviewer asks: "How high is the chance that we actually have a valid dcache entry for a file in a corrupted directory?" There's a decent chance of this actually working. Say you have a 1000-block directory foo, and block 980 gets corrupted. Let's further suppose that block 0 has a correct entry for ".." and "bar". If someone accesses /mnt/foo/bar, that will cause the dcache to create a dentry from /mnt to /mnt/foo whose d_parent points back to /mnt. If you then want to rebuild the directory, XFS can obtain the parent from the dcache without needing to wander into parent pointers or scan the filesystem to find /mnt's connection to foo. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: move orphan files to the orphanageDarrick J. Wong11-20/+844
When we're repairing a directory structure or fixing the dotdot entry of a subdirectory, it's possible that we won't ever find a parent for the subdirectory. When this is the case, move it to the orphanage, aka /lost+found. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: online repair of parent pointersDarrick J. Wong6-1/+238
Teach the online repair code to fix parent pointers for directories. For now, this means correcting the dotdot entry of an existing directory that is otherwise consistent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: scan the filesystem to repair a directory dotdot entryDarrick J. Wong7-24/+528
Teach the online directory repair code to scan the filesystem so that we can set the dotdot entry when we're rebuilding a directory. This involves dropping ILOCK on the directory that we're repairing, which means that the VFS can sneak in and tell us to update dotdot at any time. Deal with these races by using a dirent hook to absorb dotdot updates, and be careful not to check the scan results until after we've retaken the ILOCK. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: update the unlinked list when repairing link countsDarrick J. Wong1-9/+33
When we're repairing the link counts of a file, we must ensure either that the file has zero link count and is on the unlinked list; or that it has nonzero link count and is not on the unlinked list. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: online repair of directoriesDarrick J. Wong15-2/+1563
If a directory looks like it's in bad shape, try to sift through the rubble to find whatever directory entries we can, scan the directory tree for the parent (if needed), stage the new directory contents in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: inactivate directory data blocksDarrick J. Wong1-0/+51
Teach inode inactivation to delete all the incore buffers backing a directory. In normal runtime this should never happen because the VFS forbids rmdir on a non-empty directory. In the next patch, online directory repair stands up a new directory, exchanges it with the broken directory, and then drops the private temporary directory. If we cancel the repair just prior to exchanging the directory contents, the new directory will need to be torn down. Note: If we commit the repair, reaping will take care of all the ondisk space allocations and incore buffers for the old corrupt directory. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: create an xattr iteration function for scrubDarrick J. Wong5-78/+414
Create a streamlined function to walk a file's xattrs, without all the cursor management stuff in the regular listxattr. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: ensure unlinked list state is consistent with nlink during scrubDarrick J. Wong4-4/+67
Now that we have the means to tell if an inode is on an unlinked inode list or not, we can check that an inode with zero link count is on the unlinked list; and an inode that has nonzero link count is not on that list. Make repair clean things up too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
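The invariant being checked reduces to a single equivalence: an inode is on the unlinked list exactly when its link count is zero. A minimal sketch, with a hypothetical function name and plain Python values standing in for the inode state:

```python
def unlinked_state_is_consistent(nlink, on_unlinked_list):
    """True when 'link count is zero' and 'on the unlinked list' agree."""
    return (nlink == 0) == on_unlinked_list


# The two consistent states and the two that scrub would flag.
print(unlinked_state_is_consistent(0, True))    # zero links, on the list
print(unlinked_state_is_consistent(3, False))   # linked, not on the list
print(unlinked_state_is_consistent(0, False))   # zero links but missing
print(unlinked_state_is_consistent(3, True))    # linked but still listed
```

Repair then has to move the inode into one of the two consistent states, either by adding it to the list or by removing it.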
2024-04-15xfs: flag empty xattr leaf blocks for optimizationDarrick J. Wong2-0/+13
Empty xattr leaf blocks at offset zero are a waste of space but otherwise harmless. If we encounter one, flag it as an opportunity for optimization. If we encounter empty attr leaf blocks anywhere else in the attr fork, that's corruption. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
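The rule in this message is a three-way classification. The sketch below restates it in Python; the function name, the string results, and the `(offset, nr_entries)` parameters are assumptions made for illustration and do not correspond to the scrub code's actual interfaces.

```python
def classify_attr_leaf(offset, nr_entries):
    """Classify an xattr leaf block by its fork offset and entry count."""
    if nr_entries > 0:
        return "ok"        # block holds entries, nothing to report here
    if offset == 0:
        return "preen"     # empty at offset zero: wasteful but harmless
    return "corrupt"       # empty anywhere else in the attr fork


print(classify_attr_leaf(0, 0), classify_attr_leaf(7, 0), classify_attr_leaf(7, 4))
```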
2024-04-15xfs: scrub should set preen if attr leaf has holesDarrick J. Wong4-0/+20
If an attr block indicates that it could use compaction, set the preen flag to have the attr fork rebuilt, since the attr fork rebuilder can take care of that for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>