path: root/fs/xfs/libxfs/xfs_ag_resv.c
Age | Commit message | Author | Files | Lines
2022-07-07 | xfs: pass perag to xfs_alloc_read_agf() | Dave Chinner | 1 | -1/+1
xfs_alloc_read_agf() initialises the perag if it hasn't been done yet, so it makes sense to pass it the perag rather than pull a reference from the buffer. This allows callers to be per-ag centric rather than passing mount/agno pairs everywhere. Whilst modifying the xfs_reflink_find_shared() function definition, declare it static and remove the extern declaration as it is an internal function only these days. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-07-07 | xfs: kill xfs_alloc_pagf_init() | Dave Chinner | 1 | -1/+1
Trivial wrapper around xfs_alloc_read_agf(), can be easily replaced by passing a NULL agfbp to xfs_alloc_read_agf(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-10-19 | xfs: compute maximum AG btree height for critical reservation calculation | Darrick J. Wong | 1 | -1/+2
Compute the actual maximum AG btree height for deciding if a per-AG block reservation is critically low. This only affects the sanity check condition, since we /generally/ will trigger on the 10% threshold. This is a long-winded way of saying that we're removing one more usage of XFS_BTREE_MAXLEVELS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
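The condition described in this commit can be sketched as follows. This is only an illustration, not the kernel function: `demo_resv_is_critical`, `avail`, `orig` and `max_height` are hypothetical names, and the sketch assumes the caller has already worked out how many blocks are available and how many were originally requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): decide whether a per-AG
 * reservation is critically low.
 *
 * avail      - blocks currently available to the reservation
 * orig       - blocks originally requested for the reservation
 * max_height - maximum height of any AG btree on this filesystem
 */
bool demo_resv_is_critical(uint32_t avail, uint32_t orig,
			   unsigned int max_height)
{
	/* This sketch assumes availability never exceeds the request. */
	assert(avail <= orig);

	/*
	 * Critical if less than 10% of the request remains, or if fewer
	 * blocks remain than a full-height btree split could consume
	 * (one new block per level).
	 */
	return avail < orig / 10 || avail < max_height;
}
```

With a request of 100 blocks the 10% rule fires first; with a small request such as 20 blocks, the btree-height term is the one that triggers.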
2021-07-02 | Merge tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds | 1 | -6/+5
Pull xfs updates from Darrick Wong:

"Most of the work this cycle has been on refactoring various parts of the codebase. The biggest non-cleanup changes are (1) reducing the number of cache flushes sent when writing the log; (2) a substantial number of log recovery fixes; and (3) I started accepting pull requests from contributors if the commits in their branches match what's been sent to the list.

For a week or so I /had/ staged a major cleanup of the logging code from Dave Chinner, but it exposed so many lurking bugs in other parts of the logging and log recovery code that I decided to defer that patchset until we can address those latent bugs.

Larger cleanups this time include walking the incore inode cache (me) and rework of the extended attribute code (Allison) to prepare it for adding logged xattr updates (and directory tree parent pointers) in future releases.

Summary:
- Refactor the buffer cache to use bulk page allocation
- Convert agnumber-based AG iteration to walk per-AG structures
- Clean up some unit conversions and other code warts
- Reduce spinlock contention in the directio fastpath
- Collapse all the inode cache walks into a single function
- Remove indirect function calls from the inode cache walk code
- Dramatically reduce the number of cache flushes sent when writing log buffers
- Preserve inode sickness reports for longer
- Rename xfs_eofblocks since it controls inode cache walks
- Refactor the extended attribute code to prepare it for the addition of log intent items to make xattrs fully transactional
- A few fixes to earlier large patchsets
- Log recovery fixes so that we don't accidentally mark the log clean when log intent recovery fails
- Fix some latent SOB errors
- Clean up shutdown messages that get logged to dmesg
- Fix a regression in the online shrink code
- Fix a UAF in the buffer logging code if the fs goes offline
- Fix uninitialized error variables
- Fix a UAF in the CIL when committed log item callbacks race with a shutdown
- Fix a bug where the CIL could hang trying to push part of the log ring buffer that hasn't been filled yet"

* tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (102 commits)
  xfs: don't wait on future iclogs when pushing the CIL
  xfs: Fix a CIL UAF by getting get rid of the iclog callback lock
  xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacks
  xfs: don't nest icloglock inside ic_callback_lock
  xfs: Initialize error in xfs_attr_remove_iter
  xfs: fix endianness issue in xfs_ag_shrink_space
  xfs: remove dead stale buf unpin handling code
  xfs: hold buffer across unpin and potential shutdown processing
  xfs: force the log offline when log intent item recovery fails
  xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes
  xfs: shorten the shutdown messages to a single line
  xfs: print name of function causing fs shutdown instead of hex pointer
  xfs: fix type mismatches in the inode reclaim functions
  xfs: separate primary inode selection criteria in xfs_iget_cache_hit
  xfs: refactor the inode recycling code
  xfs: add iclog state trace events
  xfs: xfs_log_force_lsn isn't passed a LSN
  xfs: Fix CIL throttle hang when CIL space used going backwards
  xfs: journal IO cache flush reductions
  xfs: remove need_start_rec parameter from xlog_write()
  ...
2021-06-28 | Merge tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux | Linus Torvalds | 1 | -2/+2
Pull fallthrough fixes from Gustavo Silva: "Fix many fall-through warnings when building with Clang 12.0.0 and '-Wimplicit-fallthrough' so that we at some point will be able to enable that warning by default" * tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits) rxrpc: Fix fall-through warnings for Clang drm/nouveau/clk: Fix fall-through warnings for Clang drm/nouveau/therm: Fix fall-through warnings for Clang drm/nouveau: Fix fall-through warnings for Clang xfs: Fix fall-through warnings for Clang xfrm: Fix fall-through warnings for Clang tipc: Fix fall-through warnings for Clang sctp: Fix fall-through warnings for Clang rds: Fix fall-through warnings for Clang net/packet: Fix fall-through warnings for Clang net: netrom: Fix fall-through warnings for Clang ide: Fix fall-through warnings for Clang hwmon: (max6621) Fix fall-through warnings for Clang hwmon: (corsair-cpro) Fix fall-through warnings for Clang firewire: core: Fix fall-through warnings for Clang braille_console: Fix fall-through warnings for Clang ipv4: Fix fall-through warnings for Clang qlcnic: Fix fall-through warnings for Clang bnxt_en: Fix fall-through warnings for Clang netxen_nic: Fix fall-through warnings for Clang ...
2021-06-08 | Merge tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2 | Darrick J. Wong | 1 | -6/+5
xfs: initial agnumber -> perag conversions for shrink

If we want to use active references to the perag to be able to gate shrink removing AGs and hence perags safely, we've got a fair bit of work to do to actually use perags in all the places we need to. There's a lot of code that iterates ag numbers and then looks up perags from that, often multiple times for the same perag in the one operation. If we want to use reference counted perags for access control, then we need to convert all these uses to perag iterators, not agno iterators.

[Patches 1-4] The first step of this is consolidating all the perag management - init, free, get, put, etc into a common location. This is spread all over the place right now, so move it all into libxfs/xfs_ag.[ch]. This does expose kernel only bits of the perag to libxfs and hence userspace, so the structures and code are rearranged to minimise the number of ifdefs that need to be added to the userspace codebase. The perag iterator in xfs_icache.c is promoted to a first class API and expanded to the needs of the code as required.

[Patches 5-10] These are the first basic perag iterator conversions and changes to pass the perag down the stack from those iterators where appropriate. A lot of this is obvious, simple changes, though in some places we stop passing the perag down the stack because the code enters into an as yet unconverted subsystem that still uses raw AGs.

[Patches 11-16] These replace the agno passed in the btree cursor for per-ag btree operations with a perag that is passed to the cursor init function. The cursor takes its own reference to the perag, and the reference is dropped when the cursor is deleted. Hence we get reference coverage for the entire time the cursor is active, even if the code that initialised the cursor drops its reference before the cursor or any of its children (duplicates) have been deleted. The first patch adds the perag infrastructure for the cursor, the next four patches convert a btree cursor at a time, and the last removes the agno from the cursor once it is unused.

[Patches 17-21] These patches are a demonstration of the simplifications and cleanups that come from plumbing the perag through interfaces that select and then operate on a specific AG. In this case the inode allocation algorithm does up to three walks across all AGs before it either allocates an inode or fails. Two of these walks are purely just to select the AG, and even then it doesn't guarantee inode allocation success so there's a third walk if the selected AG allocation fails. These patches collapse the selection and allocation into a single loop, simplify the error handling because xfs_dir_ialloc() always returns ENOSPC if no AG was selected for inode allocation or we fail to allocate an inode in any AG, get rid of the xfs_dir_ialloc() wrapper, convert inode allocation to run entirely from a single perag instance, and then factor xfs_dialloc() into a much, much simpler loop which is easy to understand. Hence we end up with the same inode allocation logic, but it only needs two complete iterations at worst, makes AG selection and allocation atomic w.r.t. shrink and chops out over 100 lines of code from this hot code path.

[Patch 22] Converts the unlink path to pass perags through it.

There's more conversion work to be done, but this patchset gets through a large chunk of it in one hit. Most of the iterators are converted, so once this is solidified we can move on to converting these to active references for being able to free perags while the fs is still active.

* tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (23 commits)
  xfs: remove xfs_perag_t
  xfs: use perag through unlink processing
  xfs: clean up and simplify xfs_dialloc()
  xfs: inode allocation can use a single perag instance
  xfs: get rid of xfs_dir_ialloc()
  xfs: collapse AG selection for inode allocation
  xfs: simplify xfs_dialloc_select_ag() return values
  xfs: remove agno from btree cursor
  xfs: use perag for ialloc btree cursors
  xfs: convert allocbt cursors to use perags
  xfs: convert refcount btree cursor to use perags
  xfs: convert rmap btree cursor to using a perag
  xfs: add a perag to the btree cursor
  xfs: pass perags around in fsmap data dev functions
  xfs: push perags through the ag reservation callouts
  xfs: pass perags through to the busy extent code
  xfs: convert secondary superblock walk to use perags
  xfs: convert xfs_iwalk to use perag references
  xfs: convert raw ag walks to use for_each_perag
  xfs: make for_each_perag... a first class citizen
  ...
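The cursor reference behaviour described under [Patches 11-16] can be illustrated with a small sketch. The types and functions below (`demo_perag`, `demo_cursor`, `demo_perag_hold`, and so on) are hypothetical stand-ins, not the kernel's perag or btree cursor API; the point is only that the cursor holds its own reference, so the perag stays referenced after the caller drops theirs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted per-AG structure. */
struct demo_perag {
	int refcount;
};

/* Hypothetical btree cursor that pins the perag it operates on. */
struct demo_cursor {
	struct demo_perag *pag;
};

/* Take an additional reference and return the same object. */
struct demo_perag *demo_perag_hold(struct demo_perag *pag)
{
	pag->refcount++;
	return pag;
}

/* Drop one reference; the count must never go negative. */
void demo_perag_put(struct demo_perag *pag)
{
	assert(pag->refcount > 0);
	pag->refcount--;
}

/* The cursor takes its own reference at init time... */
void demo_cursor_init(struct demo_cursor *cur, struct demo_perag *pag)
{
	cur->pag = demo_perag_hold(pag);
}

/* ...and drops it only when the cursor is deleted. */
void demo_cursor_del(struct demo_cursor *cur)
{
	demo_perag_put(cur->pag);
	cur->pag = NULL;
}
```

In this sketch a caller can initialise a cursor, drop its own reference straight away, and the count remains above zero until `demo_cursor_del()` runs.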
2021-06-02 | xfs: push perags through the ag reservation callouts | Dave Chinner | 1 | -5/+4
We currently pass an agno from the AG reservation functions to the individual feature accounting functions, which in future may have to do perag lookups to access per-AG state. Instead, pre-emptively plumb the perag through from the highest AG reservation layer to the feature callouts so they won't have to look it up again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-06-02 | xfs: move xfs_perag_get/put to xfs_ag.[ch] | Dave Chinner | 1 | -1/+1
They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-05-26 | xfs: Fix fall-through warnings for Clang | Gustavo A. R. Silva | 1 | -2/+2
In preparation to enable -Wimplicit-fallthrough for Clang, fix the following warnings by replacing /* fall through */ comments, and their variants, with the new pseudo-keyword macro fallthrough:

fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]

Notice that Clang doesn't recognize /* fall through */ comments as implicit fall-through markings, so in order to globally enable -Wimplicit-fallthrough for Clang, these comments need to be replaced with fallthrough; in the whole codebase.

Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
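As an illustration of the replacement this commit describes, here is a minimal userspace sketch. The kernel supplies its own `fallthrough` macro; the local definition below only mimics it and assumes a GCC- or Clang-style statement attribute, with an empty statement as a fallback. The enum and function names are hypothetical.

```c
#include <assert.h>

/*
 * Assumption: this local macro stands in for the kernel's `fallthrough`
 * pseudo-keyword. It expands to the statement attribute where the
 * compiler supports it, and to an empty statement otherwise.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

/* Hypothetical types, used only to give the switch some cases. */
enum demo_type { DEMO_NONE, DEMO_AGFL, DEMO_METADATA };

/* Returns how many accounting steps run for a given type. */
int demo_steps(enum demo_type type)
{
	int steps = 0;

	switch (type) {
	case DEMO_METADATA:
		steps++;	/* metadata-specific step */
		fallthrough;	/* deliberately continue into the shared step */
	case DEMO_AGFL:
		steps++;	/* shared step */
		break;
	case DEMO_NONE:
		break;
	default:
		assert(0 && "unknown demo_type");
		break;
	}
	return steps;
}
```

Replacing the `fallthrough;` statement with a `/* fall through */` comment would leave the behaviour unchanged, but Clang would report the case as unannotated under -Wimplicit-fallthrough.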
2021-05-24 | xfs: check free AG space when making per-AG reservations | Darrick J. Wong | 1 | -3/+15
The new online shrink code exposed a gap in the per-AG reservation code, which is that we only return ENOSPC to callers if the entire fs doesn't have enough free blocks. Except for debugging mode, the reservation init code doesn't ever check that there's enough free space in that AG to cover the reservation. Not having enough space is not considered an immediate fatal error that requires filesystem offlining because (a) it shouldn't be possible to wind up in that state through normal file operations and (b) even if one did, freeing data blocks would recover the situation. However, online shrink now needs to know if shrinking would not leave enough space so that it can abort the shrink operation. Hence we need to promote this assertion into an actual error return. Observed by running xfs/168 with a 1k block size, though in theory this could happen with any configuration. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
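A rough sketch of the kind of per-AG check this commit promotes from an assertion to an error return. The structure and function are hypothetical and simplified: this version refuses the reservation outright when the AG cannot cover it, which may differ from how the kernel orders the bookkeeping and the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG summary; field names are illustrative only. */
struct demo_ag {
	uint32_t free_blocks;	/* free blocks counted in this AG */
	uint32_t fl_blocks;	/* blocks sitting on the AG free list */
	uint32_t reserved;	/* blocks already reserved in this AG */
};

/*
 * Try to add `ask` blocks to the AG's reservation. Returns 0 on success,
 * or -ENOSPC if this AG cannot cover the total, no matter how much free
 * space the rest of the filesystem has.
 */
int demo_ag_reserve(struct demo_ag *ag, uint32_t ask)
{
	uint64_t avail, want;

	assert(ag != NULL);
	avail = (uint64_t)ag->free_blocks + ag->fl_blocks;
	want = (uint64_t)ag->reserved + ask;

	if (want > avail)
		return -ENOSPC;

	ag->reserved += ask;
	return 0;
}
```

A caller such as a shrink operation could treat the -ENOSPC return as the signal to abort rather than continue with an underfilled reservation.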
2021-04-29 | xfs: unconditionally read all AGFs on mounts with perag reservation | Brian Foster | 1 | -11/+23
perag reservation is enabled at mount time on a per AG basis. The upcoming change to set aside allocbt blocks from block reservation requires a populated allocbt counter as soon as possible after mount to be fully effective against large perag reservations. Therefore as a preparation step, initialize the pagf on all mounts where at least one reservation is active. Note that this already occurs to some degree on most default format filesystems as reservation requirement calculations already depend on the AGF or AGI, depending on the reservation type. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25 | xfs: add error injection for per-AG resv failure | Gao Xiang | 1 | -1/+5
A per-AG resv failure after fixing up freespace is hard to test in an effective way, so add an error injection path directly to verify that this error handling path works as expected. Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2019-11-07 | xfs: fix missing header includes | Darrick J. Wong | 1 | -0/+2
Some of the xfs source files are missing header includes, so add them back. Sparse complains about non-static functions that don't have a forward declaration anywhere. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-06-28 | xfs: remove unused header files | Eric Sandeen | 1 | -8/+0
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-14 | xfs: rename m_inotbt_nores to m_finobt_nores | Darrick J. Wong | 1 | -1/+1
Rename this flag variable to imply more strongly that it's related to the free inode btree (finobt) operation. No functional changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-07-29 | xfs: pass transaction lock while setting up agresv on cyclic metadata | Darrick J. Wong | 1 | -6/+7
Pass a transaction pointer through to all helpers that calculate the per-AG block reservation. Online repair will use this to reinitialize per-ag reservations while it still holds all the AG headers locked to the repair transaction. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-06-24 | xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation | Darrick J. Wong | 1 | -4/+27
In __xfs_ag_resv_init we incorrectly calculate the amount by which to decrease fdblocks when reserving blocks for the rmapbt. Because rmapbt allocations do not decrease fdblocks, we must decrease fdblocks by the entire size of the requested reservation in order to achieve our goal of always having enough free blocks to satisfy an rmapbt expansion. This is in contrast to the refcountbt/finobt, which /do/ subtract from fdblocks whenever they allocate a block. For this allocation type we preserve the existing behavior where we decrease fdblocks only by the requested reservation minus the size of the existing tree. This fixes the problem where the available block counts reported by statfs change across a remount if there had been an rmapbt size change since mount time. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
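The two accounting rules in this commit message can be written out as a small sketch. The enum and function names are hypothetical; only the arithmetic (the whole request for the rmapbt, the request minus the existing tree size for the other types) comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reservation types for this sketch. */
enum demo_resv_type { DEMO_RESV_METADATA, DEMO_RESV_RMAPBT };

/*
 * How many blocks to subtract from the free-block counter (fdblocks) when
 * setting up a reservation of `ask` blocks for a tree that already uses
 * `used` blocks.
 */
uint32_t demo_hidden_space(enum demo_resv_type type, uint32_t ask,
			   uint32_t used)
{
	/* This sketch assumes the request covers the existing tree. */
	assert(used <= ask);

	switch (type) {
	case DEMO_RESV_RMAPBT:
		/* rmapbt allocations never decrease fdblocks: hide it all. */
		return ask;
	case DEMO_RESV_METADATA:
	default:
		/* Blocks already in the tree were subtracted when allocated. */
		return ask - used;
	}
}
```

For a request of 3194 blocks with 238 already in use, the rmapbt case hides 3194 blocks while the refcountbt/finobt case hides 2956.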
2018-06-06 | xfs: convert to SPDX license tags | Dave Chinner | 1 | -15/+1
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/.

This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command:

    for f in `git grep -l "GNU General" fs/xfs/` ; do
        echo $f
        cat $f | awk -f hdr.awk > $f.new
        mv -f $f.new $f
    done

And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows:

    $ cat hdr.awk
    BEGIN {
        hdr = 1.0
        tag = "GPL-2.0"
        str = ""
    }

    /^ \* This program is free software/ {
        hdr = 2.0;
        next
    }

    /any later version./ {
        tag = "GPL-2.0+"
        next
    }

    /^ \*\// {
        if (hdr > 0.0) {
            print "// SPDX-License-Identifier: " tag
            print str
            print $0
            str=""
            hdr = 0.0
            next
        }
        print $0
        next
    }

    /^ \* / {
        if (hdr > 1.0)
            next
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
        next
    }

    /^ \*/ {
        if (hdr > 0.0)
            next
        print $0
        next
    }

    // {
        if (hdr > 0.0) {
            if (str != "")
                str = str "\n"
            str = str $0
            next
        }
        print $0
    }

    END { }
    $

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11 | xfs: account only rmapbt-used blocks against rmapbt perag res | Brian Foster | 1 | -0/+4
The rmapbt perag metadata reservation reserves blocks for the reverse mapping btree (rmapbt). Since the rmapbt uses blocks from the agfl and perag accounting is updated as blocks are allocated from the allocation btrees, the reservation actually accounts blocks as they are allocated to (or freed from) the agfl rather than the rmapbt itself.

While this works for blocks that are eventually used for the rmapbt, not all agfl blocks are destined for the rmapbt. Blocks that are allocated to the agfl (and thus "reserved" for the rmapbt) but then used by another structure leads to a growing inconsistency over time between the runtime tracking of rmapbt usage vs. actual rmapbt usage. Since the runtime tracking thinks all agfl blocks are rmapbt blocks, it essentially believes that less future reservation is required to satisfy the rmapbt than what is actually necessary.

The inconsistency is rectified across mount cycles because the perag reservation is initialized based on the actual rmapbt usage at mount time. The problem, however, is that the excessive drain of the reservation at runtime opens a window to allocate blocks for other purposes that might be required for the rmapbt on a subsequent mount. This problem can be demonstrated by a simple test that runs an allocation workload to consume agfl blocks over time and then observe the difference in the agfl reservation requirement across an unmount/mount cycle:

    mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
    ...
    ... : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
    umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
    mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194

As the above tracepoints show, the reservation requirement reduces from 3194 blocks to 2956 blocks as the workload runs. Without any other changes in the filesystem, the same reservation requirement jumps from 2956 to 3052 blocks over a umount/mount cycle.

To address this divergence, update the RMAPBT reservation to account blocks used for the rmapbt only rather than all blocks filled into the agfl. This patch makes several high-level changes toward that end:

1.) Reintroduce an AGFL reservation type to serve as an accounting no-op for blocks allocated to (or freed from) the AGFL.
2.) Invoke RMAPBT usage accounting from the actual rmapbt block allocation path rather than the AGFL allocation path.

The first change is required because agfl blocks are considered free blocks throughout their lifetime. The perag reservation subsystem is invoked unconditionally by the allocation subsystem, so we need a way to tell the perag subsystem (via the allocation subsystem) to not make any accounting changes for blocks filled into the AGFL. The second change causes the in-core RMAPBT reservation usage accounting to remain consistent with the on-disk state at all times and eliminates the risk of leaving the rmapbt reservation underfilled.

Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
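The two changes listed in this commit can be pictured with a small sketch: an AGFL type that is an accounting no-op, and an RMAPBT type that is charged only when a block actually goes to the rmapbt. All names here are hypothetical, and the counters are a simplification of the in-core reservation state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reservation types, loosely modelled on those named above. */
enum demo_resv_type {
	DEMO_RESV_NONE,
	DEMO_RESV_AGFL,
	DEMO_RESV_METADATA,
	DEMO_RESV_RMAPBT,
};

/* Hypothetical in-core counters: blocks still held in each reservation. */
struct demo_ag {
	uint32_t meta_reserved;
	uint32_t rmapbt_reserved;
};

/* Consume up to `len` blocks from one reservation counter. */
static void demo_consume(uint32_t *reserved, uint32_t len)
{
	*reserved -= (len < *reserved) ? len : *reserved;
}

/* Called for every allocation; the type says who the blocks are for. */
void demo_alloc_extent(struct demo_ag *ag, enum demo_resv_type type,
		       uint32_t len)
{
	assert(ag != NULL);

	switch (type) {
	case DEMO_RESV_AGFL:
		/* AGFL blocks still count as free: no accounting change. */
		break;
	case DEMO_RESV_RMAPBT:
		/* Charged only from the rmapbt block allocation path. */
		demo_consume(&ag->rmapbt_reserved, len);
		break;
	case DEMO_RESV_METADATA:
		demo_consume(&ag->meta_reserved, len);
		break;
	case DEMO_RESV_NONE:
		break;
	}
}
```

Refilling the free list in this sketch leaves both counters untouched, so the rmapbt counter only drifts when the rmapbt itself grows.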
2018-03-11 | xfs: rename agfl perag res type to rmapbt | Brian Foster | 1 | -17/+18
The AGFL perag reservation type accounts all allocations that feed into (or are released from) the allocation group free list (agfl). The purpose of the reservation is to support worst case conditions for the reverse mapping btree (rmapbt). As such, the agfl reservation usage accounting only considers rmapbt usage when the in-core counters are initialized at mount time. This implementation inconsistency leads to divergence of the in-core and on-disk usage accounting over time. In preparation to resolve this inconsistency and adjust the AGFL reservation into an rmapbt specific reservation, rename the AGFL reservation type and associated accounting fields to something more rmapbt-specific. Also fix up a couple tracepoints that incorrectly use the AGFL reservation type to pass the agfl state of the associated extent where the raw reservation type is expected. Note that this patch does not change perag reservation behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-01 | xfs: move error injection tags into their own file | Darrick J. Wong | 1 | -0/+1
Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2017-09-26 | xfs: perag initialization should only touch m_ag_max_usable for AG 0 | Darrick J. Wong | 1 | -2/+10
We call __xfs_ag_resv_init to make a per-AG reservation for each AG. This makes the reservation per-AG, not per-filesystem. Therefore, it is incorrect to adjust m_ag_max_usable for each AG. Adjust it only when we're reserving AG 0's blocks so that we only do it once per fs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
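A minimal sketch of the fix described here, using hypothetical structures in place of the mount and perag: the per-filesystem limit is reduced once, when AG 0 is initialised, while every AG records its own reservation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mount-wide state: largest allocation usable in any AG. */
struct demo_mount {
	uint32_t ag_max_usable;
};

/* Hypothetical per-AG state. */
struct demo_ag {
	uint32_t agno;
	uint32_t reserved;
};

/* Set up the reservation for one AG. */
void demo_resv_init(struct demo_mount *mp, struct demo_ag *ag, uint32_t ask)
{
	ag->reserved = ask;

	/*
	 * The limit is a per-filesystem value, so adjust it only once.
	 * Doing it for every AG would subtract the reservation agcount times.
	 */
	if (ag->agno == 0) {
		assert(mp->ag_max_usable >= ask);
		mp->ag_max_usable -= ask;
	}
}
```

With three AGs each reserving 50 blocks and a starting limit of 1000, the limit ends at 950 rather than 850.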
2017-06-27 | xfs: remove unneeded parameter from XFS_TEST_ERROR | Darrick J. Wong | 1 | -2/+1
Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2017-01-25 | xfs: use per-AG reservations for the finobt | Christoph Hellwig | 1 | -6/+41
Currently we try to rely on the global reserved block pool for block allocations for the free inode btree, but I have customer reports (fairly complex workload, need to find an easier reproducer) where that is not enough as the AG where we free an inode that requires a new finobt block is entirely full. This causes us to cancel a dirty transaction and thus a file system shutdown. I think the right way to guard against this is to treat the finobt the same way as the refcount btree and have a per-AG reservation for the possible worst case size of it, and the patch below implements that. Note that this could increase mount times with large finobt trees. In an ideal world we would have added a field for the number of finobt blocks to the AGI, similar to what we did for the refcount blocks. We should add it next time we rev the AGI or AGF format by adding new fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-01-25 | xfs: only update mount/resv fields on success in __xfs_ag_resv_init | Christoph Hellwig | 1 | -9/+14
Try to reserve the blocks first and only then update the fields in or hanging off the mount structure. This way we can call __xfs_ag_resv_init again after a previous failure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
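The ordering this commit describes (reserve first, publish afterwards) might look like the sketch below. The structures and the function are hypothetical; the point is that a failed attempt leaves every field untouched, so the same call can simply be repeated later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filesystem-wide free block counter. */
struct demo_fs {
	uint64_t fdblocks;
};

/* Hypothetical reservation record hanging off the per-AG structure. */
struct demo_resv {
	uint32_t asked;
	uint32_t reserved;
};

/*
 * Reserve `ask` blocks for a tree that already uses `used` blocks.
 * Nothing is modified unless the reservation succeeds.
 */
int demo_resv_try(struct demo_fs *fs, struct demo_resv *r, uint32_t ask,
		  uint32_t used)
{
	uint32_t need;

	assert(fs != NULL && r != NULL);
	need = (ask > used) ? ask - used : 0;

	/* Step 1: check that the blocks can be had. On failure, bail out. */
	if (fs->fdblocks < need)
		return -ENOSPC;

	/* Step 2: only now update the counters and the reservation. */
	fs->fdblocks -= need;
	r->asked = ask;
	r->reserved = need;
	return 0;
}
```

A first call that fails with -ENOSPC leaves `asked` and `reserved` at their old values, and a later call with a smaller request can succeed without any cleanup in between.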
2017-01-03 | xfs: use the actual AG length when reserving blocks | Darrick J. Wong | 1 | -0/+3
We need to use the actual AG length when making per-AG reservations, since we could otherwise end up reserving more blocks out of the last AG than there are actual blocks. Complained-about-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
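To see why the last AG needs special care, here is a sketch of one way to compute an AG's real length from the filesystem geometry. The function and parameter names are hypothetical, and the kernel may obtain the length differently (for example from on-disk AG headers); the arithmetic only illustrates that the final AG can be shorter than the nominal size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: length in blocks of AG number `agno`.
 *
 * dblocks  - total data blocks in the filesystem
 * agblocks - nominal size of each AG in blocks
 * agcount  - number of AGs
 */
uint32_t demo_ag_length(uint64_t dblocks, uint32_t agblocks,
			uint32_t agcount, uint32_t agno)
{
	assert(agcount > 0 && agno < agcount);

	/* Every AG except the last one is full size. */
	if (agno < agcount - 1)
		return agblocks;

	/* The last AG holds whatever is left over. */
	return (uint32_t)(dblocks - (uint64_t)agno * agblocks);
}
```

For a 2500-block filesystem with 1000-block AGs, the third AG is only 500 blocks long, so sizing its reservation from the nominal 1000 would overstate what it can hold.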
2016-10-05 | xfs: simulate per-AG reservations being critically low | Darrick J. Wong | 1 | -1/+3
Create an error injection point that enables us to simulate being critically low on per-AG block reservations. This should enable us to simulate this specific ENOSPC condition so that we can test falling back to a regular file copy. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-10-05 | xfs: preallocate blocks for worst-case btree expansion | Darrick J. Wong | 1 | -0/+11
To gracefully handle the situation where a CoW operation turns a single refcount extent into a lot of tiny ones and then runs out of space when a tree split has to happen, use the per-AG reserved block pool to pre-allocate all the space we'll ever need for a maximal btree. For a 4K block size, this only costs an overhead of 0.3% of available disk space. When reflink is enabled, we have an unfortunate problem with rmap -- since we can share a block billions of times, this means that the reverse mapping btree can expand basically infinitely. When an AG is so full that there are no free blocks with which to expand the rmapbt, the filesystem will shut down hard. This is rather annoying to the user, so use the AG reservation code to reserve a "reasonable" amount of space for rmap. We'll prevent reflinks and CoW operations if we think we're getting close to exhausting an AG's free space rather than shutting down, but this permanent reservation should be enough for "most" users. Hopefully. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [hch@lst.de: ensure that we invalidate the freed btree buffer] Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-09-19 | xfs: set up per-AG free space reservations | Darrick J. Wong | 1 | -0/+325
One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem. Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails. Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is in contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
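The "selective clamping" idea can be sketched as a pure calculation. This is an illustration under simplifying assumptions, not the kernel's allocator code: `demo_visible_longest` and its parameters are hypothetical, and the sketch ignores free-list refills and the separate cap on maximum allocation length.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the longest free extent a caller is allowed to see.
 *
 * freeblks            - free blocks in the AG
 * longest             - length of the AG's longest free extent
 * reserved_for_others - blocks this caller must leave for reservations it
 *                       does not own (0 when the reservation owner asks)
 */
uint32_t demo_visible_longest(uint32_t freeblks, uint32_t longest,
			      uint32_t reserved_for_others)
{
	uint32_t rest, shortfall;

	assert(longest <= freeblks);

	/* Free space outside the longest extent can back the reservation. */
	rest = freeblks - longest;
	if (rest >= reserved_for_others)
		return longest;

	/* Otherwise the remainder must be carved out of the longest extent. */
	shortfall = reserved_for_others - rest;
	return (longest > shortfall) ? longest - shortfall : 0;
}
```

Nothing on disk changes in this scheme: the extent is still there in full, and the reservation owner, who passes 0 for `reserved_for_others`, sees all of it.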