path: root/fs
Age | Commit message | Author | Files | Lines
2025-09-23 | btrfs: store and use node size in local variable in check_eb_alignment() | Filipe Manana | 1 | -7/+8
Instead of dereferencing fs_info every time we need to access the node size, store in a local variable to make the code less verbose and avoid a line split too. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print key types as human readable strings | Filipe Manana | 1 | -2/+66
Looking at a leaf dump from the kernel's print-tree implementation is not so friendly to analyze since key types are printed as numbers. Improve on this by printing key types as strings that are a diminutive of the macro names for key types, just like we do in btrfs-progs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
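A minimal sketch of the idea, not the code from this commit: a lookup that turns a numeric key type into a short name derived from its macro. The numeric values are taken from the btrfs on-disk format; the helper name `key_type_name()` and the choice to return NULL for unknown types are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of key type values from the btrfs on-disk format. */
#define BTRFS_INODE_ITEM_KEY      1
#define BTRFS_INODE_REF_KEY      12
#define BTRFS_INODE_EXTREF_KEY   13
#define BTRFS_XATTR_ITEM_KEY     24
#define BTRFS_DIR_LOG_INDEX_KEY  72
#define BTRFS_DIR_ITEM_KEY       84
#define BTRFS_DIR_INDEX_KEY      96
#define BTRFS_EXTENT_DATA_KEY   108
#define BTRFS_EXTENT_CSUM_KEY   128

/*
 * Hypothetical helper: map a key type to the macro name without the
 * BTRFS_ prefix and _KEY suffix. Returns NULL for types not in the table,
 * so a caller can fall back to printing the number.
 */
static const char *key_type_name(uint8_t type)
{
	switch (type) {
	case BTRFS_INODE_ITEM_KEY:    return "INODE_ITEM";
	case BTRFS_INODE_REF_KEY:     return "INODE_REF";
	case BTRFS_INODE_EXTREF_KEY:  return "INODE_EXTREF";
	case BTRFS_XATTR_ITEM_KEY:    return "XATTR_ITEM";
	case BTRFS_DIR_LOG_INDEX_KEY: return "DIR_LOG_INDEX";
	case BTRFS_DIR_ITEM_KEY:      return "DIR_ITEM";
	case BTRFS_DIR_INDEX_KEY:     return "DIR_INDEX";
	case BTRFS_EXTENT_DATA_KEY:   return "EXTENT_DATA";
	case BTRFS_EXTENT_CSUM_KEY:   return "EXTENT_CSUM";
	default:                      return NULL;
	}
}
```

A caller would print the returned string when it is non-NULL and the raw number otherwise, which keeps dumps readable even for key types the table does not know about.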
2025-09-23 | btrfs: print-tree: move code for processing file extent item into helper | Filipe Manana | 1 | -23/+29
The code for processing file extent items is quite large and it's better to have it in a dedicated helper rather than in a huge switch statement, just like we do in btrfs-progs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print compression type for file extent items | Filipe Manana | 1 | -2/+5
We are not printing anything about the compression type, so add that useful information in the same format as btrfs-progs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print correct inline extent data size | Filipe Manana | 1 | -2/+4
We are advertising the ram_bytes of an inline extent as its data size, but that is not true for compressed extents. The ram_bytes corresponds to the uncompressed data size while the data size (compressed data) is given by btrfs_file_extent_inline_item_len(). So fix this and print both values in the same format as in btrfs-progs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
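The distinction between the two sizes can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: the stored size is modelled as the item size minus the 21 bytes of file extent item header that precede inline data, and `get_inline_sizes()` and `struct inline_sizes` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Header bytes before inline data: generation(8) + ram_bytes(8) +
 * compression(1) + encryption(1) + other_encoding(2) + type(1). */
#define INLINE_DATA_START 21u

struct inline_sizes {
	uint64_t ram_bytes;   /* uncompressed size, as recorded in the item */
	uint32_t data_bytes;  /* bytes actually stored inline (maybe compressed) */
};

/* Hypothetical helper: derive both sizes for an inline extent item. */
static struct inline_sizes get_inline_sizes(uint32_t item_size, uint64_t ram_bytes)
{
	struct inline_sizes s;

	assert(item_size >= INLINE_DATA_START);
	s.ram_bytes = ram_bytes;
	s.data_bytes = item_size - INLINE_DATA_START;
	return s;
}
```

For an uncompressed inline extent the two values match; for a compressed one `data_bytes` is smaller than `ram_bytes`, which is why printing only `ram_bytes` as the data size was misleading.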
2025-09-23 | btrfs: print-tree: print range information for extent csum items | Filipe Manana | 1 | -0/+15
Currently we don't print anything for extent csum items other than the generic line with the key, item offset and item size. While one can still determine the range the extent csum covers by doing a few simple computations, it makes it more time consuming to analyse a leaf dump. So add a line that prints information about the range covered by the checksum using the same format as btrfs-progs. This is useful when debugging log tree issues since we log extent csum items for new extents. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
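The "few simple computations" mentioned above can be sketched like this, assuming the usual layout where the key offset is the start of the covered range and each checksum in the item covers one sector. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct csum_range {
	uint64_t start;   /* first byte covered (the key's offset) */
	uint64_t end;     /* first byte past the covered range */
	uint64_t length;  /* end - start */
};

/*
 * Hypothetical helper: an item of item_size bytes holds
 * item_size / csum_size checksums, each covering sectorsize bytes.
 */
static struct csum_range csum_item_range(uint64_t key_offset, uint32_t item_size,
					 uint32_t csum_size, uint32_t sectorsize)
{
	struct csum_range r;

	assert(csum_size > 0);
	r.start = key_offset;
	r.length = (uint64_t)(item_size / csum_size) * sectorsize;
	r.end = r.start + r.length;
	return r;
}
```

With 4-byte checksums and 4096-byte sectors, a 32-byte item at key offset 1048576 covers 8 sectors, so the range is 1048576 to 1081344 with length 32768.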
2025-09-23 | btrfs: print-tree: print information about dir log items | Filipe Manana | 1 | -0/+11
We currently don't print information about dir log items (other than the key, item offset and item size), which is useful to look at when debugging problems with a log tree. So print their specific information (currently they only have an end index number) in a format similar to btrfs-progs. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print information about inode extref items | Filipe Manana | 1 | -0/+23
Currently we ignore inode extref items, we just print their key, item offset in the leaf and their size, no information about their content like the index number, parent inode, name length and name. Improve on this by printing the index, parent and name length in the same format as btrfs-progs. Note that we don't print the name, as that would require some processing and escaping like we do in btrfs-progs, and that could expose sensitive information for some users in case they share their dmesg/syslog and it contains a leaf dump. So for now leave names out. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print information about inode ref items | Filipe Manana | 1 | -0/+20
Currently we ignore inode ref items, we just print their key, item offset in the leaf and their size, no information about their content like the index number, name length and name. Improve on this by printing the index and name length in the same format as btrfs-progs. Note that we don't print the name, as that would require some processing and escaping like we do in btrfs-progs, and that could expose sensitive information for some users in case they share their dmesg/syslog and it contains a leaf dump. So for now leave names out. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print dir items for dir index and xattr keys too | Filipe Manana | 1 | -0/+2
Currently we only print the dir items for BTRFS_DIR_ITEM_KEY keys, but we also have dir items for BTRFS_DIR_INDEX_KEY and BTRFS_XATTR_ITEM_KEY keys too. So print them for those keys too. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print more information about dir items | Filipe Manana | 1 | -7/+24
Currently we only print the object id component of the location key from a dir item and the flags. We are missing the whole key, transid and the name and data lengths. We are also ignoring the fact that we can have multiple dir item objects encoded in a single item for a BTRFS_DIR_ITEM_KEY key, so what we print is only for the first item. Improve on this by iterating on all dir items and printing the missing information. This is done with the same format as in btrfs-progs; what we miss is printing the names and data, since not only would that require some processing and escaping like in btrfs-progs, but it would also reveal information that may be sensitive and that users may not want to share in case they get a leaf dumped in dmesg. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: print-tree: print missing fields for inode items | Filipe Manana | 1 | -6/+31
We are not dumping a lot of fields for an inode item which are useful for debugging whenever we dump a leaf (log replay failure for example), so add them and make it as close as possible to the print tree implementation in btrfs-progs (things like converting timespecs to human readable dates and converting flags to strings are missing since they are not so practical to do in the kernel). Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: tree-checker: add inode extref checks | Qu Wenruo | 1 | -0/+37
Like inode refs, inode extrefs have a variable length name, which means we have to do a proper check to make sure no header nor name can exceed the item limits. The check itself is very similar to check_inode_ref(), just a different structure (btrfs_inode_extref vs btrfs_inode_ref). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
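The kind of bounds check described above can be sketched in user space as follows. It is not the tree-checker code: it assumes a packed 18-byte extref header (64-bit parent objectid, 64-bit index, little-endian 16-bit name length) followed by the name, and `check_extref_item()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* u64 parent_objectid + u64 index + u16 name_len, packed. */
#define EXTREF_HDR_SIZE 18u

/*
 * Hypothetical checker: walk every extref packed into one item and make
 * sure neither a header nor a name crosses the end of the item.
 * Returns 0 if the item is well formed, -1 otherwise.
 */
static int check_extref_item(const uint8_t *item, size_t item_size)
{
	size_t off = 0;

	while (off < item_size) {
		size_t remaining = item_size - off;
		uint16_t name_len;

		/* The fixed-size header must fit entirely. */
		if (remaining < EXTREF_HDR_SIZE)
			return -1;

		/* name_len is stored little-endian at byte 16 of the header. */
		name_len = (uint16_t)(item[off + 16] | (item[off + 17] << 8));

		/* The variable-length name must fit as well. */
		if (remaining - EXTREF_HDR_SIZE < name_len)
			return -1;

		off += EXTREF_HDR_SIZE + name_len;
	}
	return 0;
}
```

Comparing against the remaining byte count, rather than adding lengths to a pointer first, avoids overflow when a corrupted `name_len` is large.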
2025-09-23 | btrfs: send: index backref cache by node number instead of by sector number | Filipe Manana | 1 | -2/+2
We now have a nodesize_bits member in fs_info so we can index an extent buffer in the backref cache by node number instead of by sector number. While this allows for a denser index space with the possibility of using fewer maple tree nodes, in practice it's unlikely to hit such benefits since we currently limit the maximum number of keys in the cache to 128, so unless all extent buffers are contiguous we are unlikely to see a memory usage reduction in the backing maple tree due to fewer nodes. Nevertheless it doesn't cost anything to index by node number and it's more logical. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
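To see why the index space gets denser, here is a small sketch that computes both indexes from an extent buffer's logical address. The function names are hypothetical, and the example sizes (4K sectors, 16K nodes) are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Index used before the change: one slot per sector. */
static inline uint64_t index_by_sector(uint64_t bytenr, unsigned int sectorsize_bits)
{
	return bytenr >> sectorsize_bits;
}

/* Index used after the change: one slot per tree node. */
static inline uint64_t index_by_node(uint64_t bytenr, unsigned int nodesize_bits)
{
	return bytenr >> nodesize_bits;
}
```

With 4K sectors (12 bits) and 16K nodes (14 bits), two adjacent nodes at 30408704 and 30425088 map to sector indexes 7424 and 7428 but to node indexes 1856 and 1857, so consecutive nodes get consecutive keys.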
2025-09-23 | btrfs: dump detailed info and specific messages on log replay failures | Filipe Manana | 3 | -86/+349
Currently debugging log replay failures can be harder than needed, since all we do now is abort a transaction, which gives us a line number, a stack trace and an error code. But most of the time that is not enough to give some clue about what went wrong. So add a new helper to abort log replay and provide contextual information:
1) Dump the current leaf of the log tree being processed and print the slot we are currently at and the key at that slot;
2) Dump the current subvolume tree leaf if we have any;
3) Print the current stage of log replay;
4) Print the id of the subvolume root associated with the log tree we are currently processing (as we can have multiple);
5) Print some error message to mention what we were trying to do when we got an error.
Replace all transaction abort calls (btrfs_abort_transaction()) with the new helper btrfs_abort_log_replay(), which besides dumping all that extra information also aborts the current transaction. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
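The shape of such a helper can be sketched in user space. This is a simplified model and not btrfs_abort_log_replay() itself: `struct replay_ctx` is a hypothetical, trimmed-down stand-in for struct walk_control, the leaf dumps are omitted, and "aborting the transaction" is reduced to setting a flag.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

struct replay_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Hypothetical context carried through every log replay function. */
struct replay_ctx {
	int stage;                  /* current log replay stage */
	uint64_t root_id;           /* id of the subvolume root being replayed */
	int log_slot;               /* slot in the current log leaf */
	struct replay_key log_key;  /* key at that slot */
	int aborted;                /* models the aborted transaction */
};

/*
 * Sketch of an abort helper: print the context plus a caller-supplied
 * message describing what was being attempted, mark the (modelled)
 * transaction as aborted and hand the error back to the caller.
 */
static int abort_log_replay(struct replay_ctx *ctx, int error, const char *fmt, ...)
{
	va_list args;

	ctx->aborted = 1;
	fprintf(stderr,
		"log replay failed: error %d, stage %d, root %llu, slot %d, key (%llu %u %llu): ",
		error, ctx->stage, (unsigned long long)ctx->root_id, ctx->log_slot,
		(unsigned long long)ctx->log_key.objectid,
		(unsigned int)ctx->log_key.type,
		(unsigned long long)ctx->log_key.offset);
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	fputc('\n', stderr);
	return error;
}
```

Because all the context lives in one structure, every call site only has to add the error code and a short message, which is what makes passing the structure everywhere worthwhile.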
2025-09-23 | btrfs: abort transaction if we fail to update inode in log replay dir fixup | Filipe Manana | 1 | -0/+2
If we fail to update the inode at link_to_fixup_dir(), we don't abort the transaction and propagate the error up the call chain, which makes it hard to pinpoint the error to the inode update. So abort the transaction if the inode update call fails, so that if it happens we know immediately. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: abort transaction if we fail to find dir item during log replay | Filipe Manana | 1 | -1/+3
At __add_inode_ref() if we get an error when trying to lookup a dir item we don't abort the transaction and propagate the error up the call chain, so that somewhere else up in the call chain the transaction is aborted. This however makes it hard to know that the failure comes from looking up a dir item, so add a transaction abort in case we fail there, so that we immediately pinpoint where the problem comes from during log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: remove pointless inode lookup when processing extrefs during log replay | Filipe Manana | 1 | -13/+1
At unlink_extrefs_not_in_log() we do an inode lookup of the directory but we already have the directory inode accessible as a function argument, so the lookup is redundant. Remove it and use the directory inode passed in as an argument. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: stop passing inode object IDs to __add_inode_ref() in log replay | Filipe Manana | 1 | -21/+15
There's no point in passing the inode and parent inode object IDs to __add_inode_ref() and its helpers because we can get them by calling btrfs_ino() against the inode and the directory inode, and we pass both inodes to __add_inode_ref() and its helpers. So remove the object IDs parameters to reduce arguments passed and to make things less confusing. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: add path for subvolume tree changes to struct walk_control | Filipe Manana | 1 | -167/+156
While replaying log trees we need to do searches and updates to subvolume trees and for that we use a path that we allocate in replay_one_buffer() and then pass it as a parameter to other functions deeper in the log replay call chain. Instead of passing it as a parameter, add it to struct walk_control since we pass a pointer to that structure for every log replay function. This reduces the number of arguments passed to the functions and it will be needed for an upcoming change that improves error reporting for log replay. Also name the new field in struct walk_control 'subvol_path' - while that is longer to type, the naming makes it clear it's used for subvolume tree operations since many of the log replay functions operate both on subvolume and log trees, and for the log tree searches we have struct walk_control::log_leaf to also make it obvious it's an extent buffer for a log tree. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: remove redundant path release when overwriting item during log replay | Filipe Manana | 1 | -1/+0
At overwrite_item() we have a redundant btrfs_release_path() just before failing with -ENOMEM, as the caller who passed in the path will free it and therefore also release any refcounts and locks on the extent buffers of the path. So remove it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: remove redundant path release when processing dentry during log replay | Filipe Manana | 1 | -1/+0
At replay_one_name() we have a redundant btrfs_release_path() just before calling insert_one_name(), as some lines above we have already released the path with another btrfs_release_path() call. So remove it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: avoid unnecessary path allocation when replaying a dir item | Filipe Manana | 1 | -9/+1
There's no need to allocate 'fixup_path' at replay_one_dir_item(), as the path passed as an argument is unused by the time link_to_fixup_dir() is called (replay_one_name() releases the path before it returns). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: avoid path allocations when dropping extents during log replay | Filipe Manana | 1 | -0/+2
We can avoid a path allocation in the btrfs_drop_extents() calls we have at replay_one_extent() and replay_one_buffer() by passing the path we already have in those contexts, as it's unused by the time they call btrfs_drop_extents(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: avoid unnecessary path allocation at fixup_inode_link_count() | Filipe Manana | 1 | -7/+3
There's no need to allocate a path as our single caller already has a path that we can use. So pass the caller's path and use it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: add current log leaf, key and slot to struct walk_control | Filipe Manana | 1 | -129/+126
A lot of the log replay functions get passed the current log leaf being processed as well as the current slot and the key at that slot. Instead of passing them as parameters, add them to struct walk_control so that we reduce the number of parameters. This is also going to be needed for further changes that improve error reporting during log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: use the inode item boolean everywhere in overwrite_item() | Filipe Manana | 1 | -5/+5
We have this boolean 'inode_item' to tell if we are processing an inode item key and we use it in a couple of places while in another two places we open code by checking if the key type matches the inode item type. Make this consistent and use the boolean everywhere. Also rename it from 'inode_item' to 'is_inode_item', which makes it more clear that it's a boolean and not an instance of struct btrfs_inode_item, and make it const too. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: use level argument in log tree walk callback replay_one_buffer() | Filipe Manana | 1 | -5/+3
We already have the extent buffer's level in an argument, there's no need to first ensure the extent buffer's data is loaded (by calling btrfs_read_extent_buffer()) and then call btrfs_header_level() to check the level. So use the level argument and do the check before calling btrfs_read_extent_buffer(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: use level argument in log tree walk callback process_one_buffer() | Filipe Manana | 1 | -2/+1
We already have the extent buffer's level in an argument, there's no need to call btrfs_header_level(). So use the level argument and make the code shorter. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to overwrite_item() | Filipe Manana | 1 | -6/+7
Instead of passing the transaction and subvolume root as arguments to overwrite_item(), pass the walk_control structure as we can grab them from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to drop_one_dir_item() and helpers | Filipe Manana | 1 | -21/+23
Instead of passing the transaction as an argument to drop_one_dir_item() and its helpers (link_to_fixup_dir() and unlink_inode_for_log_replay()), pass the walk_control structure as we can access the transaction from it and the subvolume root. This is going to be needed by an incoming change that improves error reporting for log replay and also reduces the number of arguments passed to link_to_fixup_dir(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to replay_one_dir_item() and replay_one_name() | Filipe Manana | 1 | -8/+8
Instead of passing the transaction and subvolume root and log tree as arguments, pass the walk_control structure as we can grab all of those from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to add_inode_ref() and helpers | Filipe Manana | 1 | -24/+23
Instead of passing the transaction, subvolume root and log tree as arguments to add_inode_ref() and its helpers (__add_inode_ref(), unlink_refs_not_in_log(), unlink_extrefs_not_in_log() and unlink_old_inode_refs()), pass the walk_control structure as we can access all of those from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to replay_one_extent() | Filipe Manana | 1 | -3/+4
Instead of passing the transaction and subvolume root as arguments to replay_one_extent(), pass the walk_control structure as we can grab all of those from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to check_item_in_log() | Filipe Manana | 1 | -8/+8
Instead of passing the transaction and log tree as arguments to check_item_in_log(), pass the walk_control structure as we can grab those from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. Notice that a NULL log root argument to check_item_in_log() makes it unconditionally delete a directory entry, so since the walk_control always has a non-NULL log root, we add an extra boolean to check_item_in_log() to tell it if it should unconditionally delete a directory entry, preserving the behaviour and also making it a bit more clear. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to replay_dir_deletes() | Filipe Manana | 1 | -14/+14
Instead of passing the transaction, subvolume root and log tree as arguments to replay_dir_deletes(), pass the walk_control structure as we can grab all of those from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. This also requires changing fixup_inode_link_counts() and fixup_inode_link_count() to take that structure as an argument since fixup_inode_link_count() makes a call to replay_dir_deletes(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: move up the definition of struct walk_control | Filipe Manana | 1 | -52/+51
In upcoming changes we need to pass struct walk_control as an argument to replay_dir_deletes() and link_to_fixup_dir() so we need to move its definition above the prototypes of those functions. So move it up right below the enum that defines log replay stages and before any functions and function prototypes are declared. Also fix up the comments while moving it so that they comply with the preferred code style (capitalize the first word in a sentence, end sentences with punctuation, make lines wider and closer to the 80 characters limit). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: pass walk_control structure to replay_xattr_deletes() | Filipe Manana | 1 | -6/+7
Instead of passing the transaction, subvolume root and log tree as arguments to replay_xattr_deletes(), pass the walk_control structure as we can grab all of those from the structure. This reduces the number of arguments passed and it's going to be needed by an incoming change that improves error reporting for log replay. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: always drop log root tree reference in btrfs_replay_log() | Filipe Manana | 2 | -2/+1
Currently we have this odd behaviour: 1) At btrfs_replay_log() we drop the reference of the log root tree if the call to btrfs_recover_log_trees() failed; 2) But if the call to btrfs_recover_log_trees() did not fail, we don't drop the reference in btrfs_replay_log() - we expect that btrfs_recover_log_trees() does it in case it returns success. Let's simplify this and make btrfs_replay_log() always drop the reference on the log root tree. Not only does this simplify the code, it's also what makes sense, since it's btrfs_replay_log() that grabbed the reference in the first place. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: stop setting log_root_tree->log_root to NULL in btrfs_recover_log_trees() | Filipe Manana | 1 | -1/+0
There's no point in setting log_root_tree->log_root to NULL as this is already NULL, we never assigned anything to it before and it's meaningless as a log root never has a value other than NULL for the ->log_root field, that can be not NULL only for non log roots. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: stop passing transaction parameter to log tree walk functions | Filipe Manana | 1 | -14/+14
It's unnecessary to pass a transaction parameter since struct walk_control already has a member that points to the transaction, so we can make the functions access the structure. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: deduplicate log root free in error paths from btrfs_recover_log_trees() | Filipe Manana | 1 | -4/+1
Instead of duplicating the dropping of a log tree in case we jump to the 'error' label, move the dropping under the 'error' label and get rid of the unnecessary setting of the log root to NULL since we return immediately after. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: add and use a log root field to struct walk_control | Filipe Manana | 1 | -27/+35
Instead of passing an extra log root parameter for the log tree walk functions and callbacks, add the log tree to struct walk_control and make those functions and callbacks extract the log root from that structure, reducing the number of parameters. This also simplifies further upcoming changes to report log tree replay failures. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: rename root to log in walk_down_log_tree() and walk_up_log_tree() | Filipe Manana | 1 | -6/+5
Everywhere we have a log root we name it 'log' or 'log_root' except in walk_down_log_tree() and walk_up_log_tree() where we name it 'root', which is not only inconsistent but also confusing, since we typically use 'root' when naming variables that refer to a subvolume tree. So for clarity and consistency rename the 'root' argument to 'log'. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: rename replay_dest member of struct walk_control to root | Filipe Manana | 1 | -15/+17
Everywhere else we refer to a subvolume root we are replaying to simply as 'root', so rename from 'replay_dest' to 'root' for consistency and to have a more meaningful and shorter name. While at it also update the comment to be more detailed and comply with the preferred style (first word in a sentence is capitalized and sentence ends with punctuation). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: use booleans in walk control structure for log replay | Filipe Manana | 1 | -9/+14
The 'free' and 'pin' members of struct walk_control, used during log replay and when freeing a log tree, are defined as integers but in practice are used as booleans. Change their type to bool and while at it update their comments to be more detailed and comply with the preferred comment style (first word in a sentence is capitalized, sentences end with punctuation and the comment opening (/*) is on a line of its own). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: cache max and min order inside btrfs_fs_info | Qu Wenruo | 3 | -3/+7
Inside btrfs_fs_info we cache several bit shifts, like sectorsize_bits. Apply this to max and min folio orders so that every time the mapping order needs to be applied we can skip the calculation. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: introduce btrfs_bio_for_each_block_all() helper | Qu Wenruo | 2 | -29/+44
Currently if we want to iterate all blocks inside a bio, we do something like this:

	bio_for_each_segment_all(bvec, bio, iter_all) {
		for (off = 0; off < bvec->bv_len; off += sectorsize) {
			/* Iterate blocks using bv + off */
		}
	}

That's fine for now, but it will not handle future bs > ps: bio_for_each_segment_all() is a single-page iterator, so it will always return a bvec that's no larger than a page. But for bs > ps cases, we need a full folio (which covers at least one block) so that we can work on the block.

To address this problem and handle future bs > ps cases better:

- Introduce a helper btrfs_bio_for_each_block_all()
  This helper creates a local bvec_iter which has the size of the target bio, grabs the physical address of the current location, then advances the iterator by block size.

- Use btrfs_bio_for_each_block_all() to replace existing call sites
  Including:
  * set_bio_pages_uptodate() in raid56
  * verify_bio_data_sectors() in raid56
  Both result in much easier to read code.

Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23 | btrfs: introduce btrfs_bio_for_each_block() helper | Qu Wenruo | 6 | -46/+60
Currently if we want to iterate a bio in block unit, we do something like this:

	while (iter->bi_size) {
		struct bio_vec bv = bio_iter_iovec();

		/* Do something using the bv */

		bio_advance_iter_single(&bbio->bio, iter, sectorsize);
	}

That's fine for now, but it will not handle future bs > ps, as bio_iter_iovec() returns a single-page bvec, meaning the bv_len will not exceed page size. This means the code using that bv can only handle a block if bs <= ps.

To address this problem and handle future bs > ps cases better:

- Introduce a helper btrfs_bio_for_each_block()
  Instead of bio_vec, which has single and multiple page versions, and whose multiple page version has quite some limits, use my favorite way to represent a block, phys_addr_t.
  For bs <= ps cases, nothing is changed, except we add a very small overhead to convert phys_addr_t to a folio, then use the proper folio helpers to handle the possible highmem cases.
  For bs > ps cases, all blocks will be backed by large folios, meaning every folio will cover at least one block. And still use proper folio helpers to handle highmem cases.
  With phys_addr_t, we will handle both large folio and highmem properly. So there is no better single variable to represent a btrfs block than phys_addr_t.

- Extract the data block csum calculation into a helper
  The new helper, btrfs_calculate_block_csum(), will be utilized by btrfs_csum_one_bio().

- Use btrfs_bio_for_each_block() to replace existing call sites
  Including:
  * index_one_bio() from raid56.c
    Very straight-forward.
  * btrfs_check_read_bio()
    Also update repair_one_sector() to grab the folio using phys_addr_t, and do extra checks to make sure the folio covers at least one block. We do not need to bother with bv_len at all now.
  * btrfs_csum_one_bio()
    Now we can move the highmem handling into a dedicated helper, btrfs_calculate_block_csum(), and use the btrfs_bio_for_each_block() helper.

There is one exception in btrfs_decompress_buf2page(), which is copying decompressed data into the original bio, which is not iterating using block size, thus we don't need to bother.

Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
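The iteration pattern described in this commit can be modelled in user space as below. This is a sketch under stated assumptions, not the kernel helper: a bio is reduced to an array of physically contiguous segments, `paddr_t` stands in for phys_addr_t, and every name here (`struct seg`, `struct seg_iter`, `for_each_block`, `collect_blocks()`) is hypothetical. The point it illustrates is that advancing by block size and reporting only the physical address of each block start works whether a block is smaller than, equal to, or larger than a segment.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;  /* stand-in for the kernel's phys_addr_t */

/* One physically contiguous piece of the modelled bio. */
struct seg {
	paddr_t paddr;
	uint32_t len;
};

struct seg_iter {
	const struct seg *segs;
	size_t nr_segs;
	size_t idx;     /* current segment */
	uint32_t done;  /* bytes already consumed in the current segment */
};

static int iter_has_data(const struct seg_iter *it)
{
	return it->idx < it->nr_segs;
}

static paddr_t iter_paddr(const struct seg_iter *it)
{
	return it->segs[it->idx].paddr + it->done;
}

/* Advance by 'bytes', crossing segment boundaries when needed. */
static void iter_advance(struct seg_iter *it, uint32_t bytes)
{
	while (bytes && it->idx < it->nr_segs) {
		uint32_t left = it->segs[it->idx].len - it->done;
		uint32_t step = bytes < left ? bytes : left;

		it->done += step;
		bytes -= step;
		if (it->done == it->segs[it->idx].len) {
			it->idx++;
			it->done = 0;
		}
	}
}

/* Visit the physical address of every block, one block size at a time. */
#define for_each_block(paddr, it, blocksize)				\
	for (; iter_has_data(it) && ((paddr) = iter_paddr(it), 1);	\
	     iter_advance((it), (blocksize)))

/* Collect up to max_out block addresses; returns how many were stored. */
static size_t collect_blocks(const struct seg *segs, size_t nr_segs,
			     uint32_t blocksize, paddr_t *out, size_t max_out)
{
	struct seg_iter it = { segs, nr_segs, 0, 0 };
	paddr_t paddr;
	size_t n = 0;

	for_each_block(paddr, &it, blocksize) {
		if (n == max_out)
			break;
		out[n++] = paddr;
	}
	return n;
}
```

Given two adjacent 4096-byte segments, a block size of 4096 yields two addresses and a block size of 8192 yields one, the start of the first segment. The sketch does not model the folio lookup or highmem mapping that the real helpers perform on top of the address.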
2025-09-23 | btrfs: concentrate highmem handling for data verification | Qu Wenruo | 5 | -31/+57
Currently for btrfs checksum verification, we do it in the following pattern:

	kaddr = kmap_local_*();
	ret = btrfs_check_csum_csum(kaddr);
	kunmap_local(kaddr);

It's OK for now, but it's still not following the patterns of helpers inside linux/highmem.h, which never require a virtual memory address. Those highmem helpers mostly accept a folio and some offset/length inside the folio, and in the implementation they check if the folio needs partial kmap, and do the handling.

Inspired by those formal highmem helpers, enhance the highmem handling of data checksum verification by:

- Rename btrfs_check_sector_csum() to btrfs_check_block_csum()
  To follow the more common term "block" used in all other major filesystems.

- Pass a physical address into btrfs_check_block_csum() and btrfs_data_csum_ok()
  The physical address is always available even for a highmem page, since it's (page frame number << PAGE_SHIFT) + offset in page. And with that physical address, we can grab the folio covering the page, and do extra checks to ensure it covers at least one block.
  This also allows us to do the kmap inside btrfs_check_block_csum(). This means all the extra HIGHMEM handling will be concentrated into btrfs_check_block_csum(), and no callers will need to bother with highmem themselves.

- Properly zero out the block on csum mismatch
  Since btrfs_data_csum_ok() only gets a paddr, we can not and should not use memzero_bvec(), which only accepts a single page bvec. Instead use the paddr to grab the folio and call folio_zero_range().

Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
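The relationship between a physical address, the page frame number and the offset in the page can be spelled out as a small sketch. The 4K page size (`SKETCH_PAGE_SHIFT` of 12) and all function names are assumptions for illustration; the kernel has its own macros and helpers for these conversions.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                      /* assumed 4K pages */
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

/* Compose a physical address from a page frame number and an in-page offset. */
static uint64_t make_paddr(uint64_t pfn, uint32_t offset_in_page)
{
	return (pfn << SKETCH_PAGE_SHIFT) + offset_in_page;
}

/* Recover the page frame number: this is what leads back to the page/folio. */
static uint64_t paddr_to_pfn(uint64_t paddr)
{
	return paddr >> SKETCH_PAGE_SHIFT;
}

/* Recover the offset inside the page. */
static uint32_t paddr_offset_in_page(uint64_t paddr)
{
	return (uint32_t)(paddr & (SKETCH_PAGE_SIZE - 1));
}
```

Because both components can be recovered from the single value, a function that receives only the physical address has enough information to find the backing page and the block's position inside it, which is what lets the mapping be done inside the checksum helper instead of by every caller.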