aboutsummaryrefslogtreecommitdiffstats
path: root/fs/btrfs/space-info.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-09-09btrfs: add enospc debug messages for ticket failureJosef Bacik1-7/+25
When debugging weird enospc problems it's handy to be able to dump the space info when we wake up all tickets, and see what the ticket values are. This helped me figure out cases where we were enospc'ing when we shouldn't have been. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: do not account global reserve in can_overcommitJosef Bacik1-18/+1
We ran into a problem in production where a box with plenty of space was getting wedged doing ENOSPC flushing. These boxes only had 20% of the disk allocated, but their metadata space + global reserve was right at the size of their metadata chunk. In this case can_overcommit should be allowing allocations without problem, but there's logic in can_overcommit that doesn't allow us to overcommit if there's not enough real space to satisfy the global reserve. This is for historical reasons. Before there were only certain places we could allocate chunks. We could go to commit the transaction and not have enough space for our pending delayed refs and such and be unable to allocate a new chunk. This would result in a abort because of ENOSPC. This code was added to solve this problem. However since then we've gained the ability to always be able to allocate a chunk. So we can easily overcommit in these cases without risking a transaction abort because of ENOSPC. Also prior to now the global reserve really would be used because that's the space we relied on for delayed refs. With delayed refs being tracked separately we no longer have to worry about running out of delayed refs space while committing. We are much less likely to exhaust our global reserve space during transaction commit. Fix the can_overcommit code to simply see if our current usage + what we want is less than our current free space plus whatever slack space we have in the disk is. This solves the problem we were seeing in production and keeps us from flushing as aggressively as we approach our actual metadata size usage. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: remove orig_bytes from reserve_ticketJosef Bacik1-8/+0
Now that we do not do partial filling of tickets simply remove orig_bytes, it is no longer needed. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: fix may_commit_transaction to deal with no partial fillingJosef Bacik1-0/+12
Now that we aren't partially filling tickets we may have some slack space left in the space_info. We need to account for this in may_commit_transaction, otherwise we may choose to not commit the transaction despite it actually having enough space to satisfy our ticket. Calculate the free space we have in the space_info, if any, and subtract this from the ticket we have and use that amount to determine if we will need to commit to reclaim enough space. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: rework wake_all_ticketsJosef Bacik1-7/+49
Now that we no longer partially fill tickets we need to rework wake_all_tickets to call btrfs_try_to_wakeup_tickets() in order to see if any subsequent tickets are able to be satisfied. If our tickets_id changes we know something happened and we can keep flushing. Also if we find a ticket that is smaller than the first ticket in our queue then we want to retry the flushing loop again in case may_commit_transaction() decides we could satisfy the ticket by committing the transaction. Rename this to maybe_fail_all_tickets() while we're at it, to better reflect what the function is actually doing. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: refactor the ticket wakeup codeJosef Bacik1-50/+5
Now that btrfs_space_info_add_old_bytes simply checks if we can make the reservation and updates bytes_may_use, there's no reason to have both helpers in place. Factor out the ticket wakeup logic into it's own helper, make btrfs_space_info_add_old_bytes() update bytes_may_use and then call the wakeup helper, and replace all calls to btrfs_space_info_add_new_bytes() with the wakeup helper. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: stop partially refilling tickets when releasing spaceJosef Bacik1-27/+16
btrfs_space_info_add_old_bytes is used when adding the extra space from an existing reservation back into the space_info to be used by any waiting tickets. In order to keep us from overcommitting we check to make sure that we can still use this space for our reserve ticket, and if we cannot we'll simply subtract it from space_info->bytes_may_use. However this is problematic, because it assumes that only changes to bytes_may_use would affect our ability to make reservations. Any changes to bytes_reserved would be missed. If we were unable to make a reservation prior because of reserved space, but that reserved space was free'd due to unlink or truncate and we were allowed to immediately reclaim that metadata space we would still ENOSPC. Consider the example where we create a file with a bunch of extents, using up 2MiB of actual space for the new tree blocks. Then we try to make a reservation of 2MiB but we do not have enough space to make this reservation. The iput() occurs in another thread and we remove this space, and since we did not write the blocks we simply do space_info->bytes_reserved -= 2MiB. We would never see this because we do not check our space info used, we just try to re-use the freed reservations. To fix this problem, and to greatly simplify the wakeup code, do away with this partial refilling nonsense. Use btrfs_space_info_add_old_bytes to subtract the reservation from space_info->bytes_may_use, and then check the ticket against the total used of the space_info the same way we do with the initial reservation attempt. This keeps the reservation logic consistent and solves the problem of early ENOSPC in the case that we free up space in places other than bytes_may_use and bytes_pinned. Thanks, Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: roll tracepoint into btrfs_space_info_update helperJosef Bacik1-10/+0
We duplicate this tracepoint everywhere we call these helpers, so update the helper to have the tracepoint as well. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: do not allow reservations if we have pending ticketsJosef Bacik1-3/+7
If we already have tickets on the list we don't want to steal their reservations. This is a preparation patch for upcoming changes, technically this shouldn't happen today because of the way we add bytes to tickets before adding them to the space_info in most cases. This does not change the FIFO nature of reserve tickets, it simply allows us to enforce it in a different way. Previously it was enforced because any new space would be added to the first ticket on the list, which would result in new reservations getting a reserve ticket. This replaces that mechanism by simply checking to see if we have outstanding reserve tickets and skipping straight to adding a ticket for our reservation. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move math functions to misc.hDavid Sterba1-1/+1
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: rename the btrfs_calc_*_metadata_size helpersJosef Bacik1-1/+1
btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that it doesn't take into account any splitting at the levels, because truncate will never split nodes. However truncate _and_ changing will never split nodes, so rename btrfs_calc_trunc_metadata_size to btrfs_calc_metadata_size. Also btrfs_calc_trans_metadata_size is purely for inserting items, so rename this to btrfs_calc_insert_metadata_size. Making these clearer will help when I start using them differently in upcoming patches. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: introduce an evict flushing stateJosef Bacik1-2/+25
We have this weird space flushing loop inside inode.c for evict where we'll do the normal LIMIT flush, and then commit the transaction and hope we get our space. This is super janky, and in fact there's really nothing stopping us from using FLUSH_ALL except that we run delayed iputs, which means we could deadlock. So introduce a new flush state for eviction that does the normal priority flushing with all of the states that are safe for eviction. The nice side-effect of this is that we'll try harder for evictions. Previously if (for example generic/269) you had a bunch of other operations happening on the fs you could race with those reservations when committing the transaction, and eventually miss getting a reservation for the evict. With this code we'll have our ticket in place through the transaction commit, so any pinned bytes will go to our pending evictions first. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: refactor priority_reclaim_metadata_spaceJosef Bacik1-6/+9
With the eviction flushing stuff we'll want to allow for different states, but still work basically the same way that priority_reclaim_metadata_space works currently. Refactor this to take the flushing states and size as an argument so we can use the same logic for limit flushing and eviction flushing. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: factor out the ticket flush handlingJosef Bacik1-22/+42
We're going to make this logic a little more complicated for evict, so factor the ticket flushing/waiting code out of __reserve_metadata_bytes. This has no functional change. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: unify error handling for ticket flushingJosef Bacik1-21/+11
Currently we handle the cleanup of errored out tickets in both the priority flush path and the normal flushing path. This is the same code in both places, so just refactor so we don't duplicate the cleanup work. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: add a flush step for delayed iputsJosef Bacik1-2/+3
Delayed iputs could very well free up enough space without needing to commit the transaction, so make this step it's own step. This will allow us to skip the step for evictions in a later patch. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: factor out sysfs code for creating space infosDavid Sterba1-23/+2
Move creation of data/metadata/system space info directories to sysfs.c. Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09btrfs: move basic block_group definitions to their own headerJosef Bacik1-0/+1
This is prep work for moving all of the block group cache code into its own file. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor comment updates ] Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: Simplify update of space_info in __reserve_metadata_bytes()Goldwyn Rodrigues1-11/+5
We don't need an if-else-if chain where we can use a simple OR since both conditions are performing the same action. The short-circuit for OR will ensure that if the first condition is true, can_overcommit() is not called. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: unexport can_overcommitJosef Bacik1-12/+11
Now that we've moved all of the users to space-info.c, unexport it and name it back to can_overcommit. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move reserve_metadata_bytes and supporting code to space-info.cJosef Bacik1-0/+698
This moves all of the metadata reservation code into space-info.c. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move dump_space_info to space-info.cJosef Bacik1-0/+55
We'll need this exported so we can use it in all the various was we need to use it. This is prep work to move reserve_metadata_bytes. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move btrfs_space_info_add_*_bytes to space-info.cJosef Bacik1-0/+106
Now that we've moved all the pre-requisite stuff, move these two functions. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move and export can_overcommitJosef Bacik1-0/+68
This is the first piece of moving the space reservation code to space-info.c Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02btrfs: move the space_info handling code to space-info.cJosef Bacik1-0/+174
These are the basic init and lookup functions and some helper functions, fairly straightforward before the bad stuff starts. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>