aboutsummaryrefslogtreecommitdiffstatshomepage
AgeCommit message (Collapse)AuthorFilesLines
2025-05-21bcachefs: print_string_as_lines: avoid printing empty lineKent Overstreet1-0/+13
If the final line in in the message to be printed is blang, don't print it. This happens with indented printbufs - after a newline we emit spaces up to the indent level. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Make various async objs visible in debugfsKent Overstreet7-15/+100
Add async objs list for - promote_op - bch_read_bio - btree_read_bio - btree_write_bio This gets us introspection on in-flight async ops, and because under the hood it uses fast_lists (percpu slot buffer on top of a radix tree), it'll be fast enough to enable in production. This will be very helpful for debugging "something got stuck" issues, which have been cropping up from time to time (in the CI, especially with folio writeback). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Async object debuggingKent Overstreet10-32/+224
Debugging infrastructure for async objs: this lets us easily create fast_lists for various object types so they'll be visible in debugfs. Add new object types to the BCH_ASYNC_OBJS_TYPES() enum, and drop a pretty-printer wrapper in async_objs.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: fast_listKent Overstreet3-0/+198
A fast "list" data structure, which is actually a radix tree, with an IDA for slot allocation and a percpu buffer on top of that. Items cannot be added or moved to the head or tail, only added at some (arbitrary) position and removed. The advantage is that adding, removing and iteration is generally lockless, only hitting the lock in ida when the percpu buffer is full or empty. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_read_bio_to_textKent Overstreet3-3/+52
Pretty printer for struct bch_read_bio. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_bio_to_text()Kent Overstreet2-0/+12
Pretty printer for struct bio, to be used for async object debugging. This is pretty minimal, we'll add more to it as we discover what we need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch_dev.io_ref -> enumerated_refKent Overstreet19-134/+222
Convert device IO refs to enumerated_refs, for easier debugging of refcount issues. Simple conversion: enumerate all users and convert to the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch_fs.writes -> enumerated_refsKent Overstreet18-153/+86
Drop the single-purpose write ref code in bcachefs.h, and convert to enumarated refs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: enumerated_ref.cKent Overstreet4-0/+230
Factor out the debug code for rw filesystem refs into a small library. In release mode an enumerated ref is a normal percpu refcount, but in debug mode all enumerated users of the ref get their own atomic_long_t ref - making it much easier to chase down refcount usage bugs for when a refcount has many users. For debugging, we have enumerated_ref_to_text(), which prints the current value of each different user. Additionally, in debug mode enumerated_ref_stop() has a 10 second timeout, after which it will dump outstanding refcounts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: for_each_rw_member_rcu()Kent Overstreet6-9/+21
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: __bch2_fs_read_write() no longer depends on io_refKent Overstreet1-6/+13
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: for_each_online_member_rcu()Kent Overstreet6-21/+35
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: recalc_capacity() no longer depends on io_refKent Overstreet1-7/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_target_to_text() no longer depends on io_refKent Overstreet1-5/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_check_rebalance_work()Kent Overstreet3-0/+119
Add a pass for checking the rebalance_work btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Kill dead codeAlan Huang1-2/+0
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Fix struct with flex member ABI warningKent Overstreet2-16/+16
This pops up when buliding in userspace. The structs aren't actually variable length, but no way to tell the compiler that... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21docs: bcachefs: idle work scheduling design docKent Overstreet2-0/+85
People have been asking to see the plan for this, so - bcachefs has various background tasks that need to be scheduled to balance efficiency, predictability of performance, etc. The design and philosophy hasn't changed too much since bcache, which was primarily designed for server usage, with sustained load in mind. These days we're seeing more desktop usage - where we really want to let the system idle effictively, to reduce total power usage - while also still balancing previous concerns, we still want to let work accumulate to a degree. This lays out all the requirements and starts to sketch out the algorithm I have in mind. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_move_data_btree() can now walk rootsKent Overstreet1-1/+46
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_move_data_btree() can move btree nodesKent Overstreet2-13/+26
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: plumb btree_id through move_pred_fdKent Overstreet3-11/+13
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Plumb target parameter through btree_node_rewrite_pos()Kent Overstreet4-18/+30
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: export bch2_move_data_phys()Kent Overstreet2-10/+15
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: BCH_MEMBER_RESIZE_ON_MOUNTKent Overstreet7-12/+77
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: BCH_FEATURE_small_imageKent Overstreet7-17/+40
We can't go RW if it's an image file that hasn't been resized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: BCH_FEATURE_no_alloc_infoKent Overstreet8-16/+61
If a filesystem is going to only be used read-only, and will be a deployable image, we can strip out alloc info for a substantial reduction in metadata size - around half, due to backpointers. Alloc info will be regenerated on first read-write mount. Remounting RW is disallowed for now, since we don't yet have check_allocations running in RW mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Print features on startup with -o verboseKent Overstreet1-0/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Shrink superblock downgrade tableKent Overstreet1-0/+3
Don't generate entries for versions that won't be able to mount. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: sb_validate() no longer requires members_v1Kent Overstreet1-3/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Add a recovery pass for making sure root inode is readableKent Overstreet2-1/+17
If the root inode/subvolume is unreadable we can repair automatically - but only if we're still in recovery, so that we can rewind to the appropriate recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Flag for repair on missing subvolumeKent Overstreet1-15/+28
Instead of going emegency read only with a bch2_fs_inconsistent() call, log the error and recovery pass appropriately. If we're still in recovery it'll be repaired immediately, otherwise it'll be repaired on the next mount. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: print_str_as_lines() -> print_str()Kent Overstreet19-43/+44
bch2_print_string_as_lines() is a low level helper that allows messages longer than 1k to be printed without truncation. But we should always be printing with the helpers that take a filesystem object, if we're in fsck they direct output to the userspace process controlling fsck instead of the dmesg log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_dev_missing_bkey()Kent Overstreet3-5/+35
Part of the ongoing project to kill off bch2_(fs|trans)_inconsistent calls - they generally need to be replaced with either - a fsck_err() call that can repair the error, or - logging an error of the appropriate type in the superblock, and flagging the appropriate recovery pass to repair the error Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Simplify bch2_count_fsck_err()Kent Overstreet5-28/+17
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_run_explicit_recovery_pass_printbuf()Kent Overstreet4-19/+57
We prefer helpers that emit log messages to printbufs rather than printing them directly; that way, we can ensure that different log messages from the same event are grouped together and formatted appropriately in the dmesg log. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Incompatible features may now be enabled at runtimeKent Overstreet5-3/+38
version_upgrade is now a runtime option. In the future we'll want to add compatible upgrades at runtime, and call the full check_version_upgrade() when the option changes, but we don't have compatible optional upgrades just yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Clean up option pre/post hooks, small fixesKent Overstreet5-39/+86
The helpers are now: - bch2_opt_hook_pre_set() - bch2_opts_hooks_pre_set() - bch2_opt_hook_post_set Fix a bug where the filesystem discard option would incorrectly be changed when setting the device option, and don't trigger rebalance scans unnecessarily (when options aren't changing). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Use drop_locks_do() in bch2_inode_hash_find()Kent Overstreet1-3/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Single device modeKent Overstreet8-11/+44
Single device filesystems are now identified by the block device name, not the UUID - and single device filesystems with the same UUID can be mounted simultaneously, without any special options. This allocates a new bit in the superblock, BCH_SB_MULTI_DEVICE, which indicates whether a filesystem has ever been multi device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Initialize c->name earlier on single dev filesystemsKent Overstreet1-15/+18
On single device filesystems, c->name contains the block device name, not the UUID. Initialize this earlier, so that single device mode can use it for initializing sysfs/debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Simplify logicAlan Huang1-6/+2
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Remove spurious +1/-1 operationAlan Huang1-2/+2
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Kill bch2_trans_unlock_noassertAlan Huang3-9/+1
Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Clean up duplicated code in bch2_journal_halt()Kent Overstreet2-11/+8
It's now a wrapper around bch2_journal_halt_locked(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_dev_allocator_set_rw()Kent Overstreet2-7/+12
Add a helper that lets us change bch_member.data_allowed at runtime. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: bch2_dev_journal_alloc() now respects data_allowedKent Overstreet1-0/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Improve bch2_btree_cache_to_text()Kent Overstreet2-3/+9
Make the output slightly clearer, and include a counter for "nodes we couldn't free because we would have gone under our reserve". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: __btree_node_reclaim_checks()Kent Overstreet1-66/+69
Factor out a helper so we're not duplicating checks after locking the btree node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: kill BTREE_CACHE_NOT_FREED_INCREMENT()Kent Overstreet1-26/+20
Small cleanup, just always increment the counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-05-21bcachefs: Improve opts.degradedKent Overstreet5-33/+76
Kill 'opts.very_degraded', and make 'opts.degraded' a persistent option, stored in the superblock. It's now an enum, with available choices ask/yes/very/no. "ask" mode will be handled by the mount helper, for prompting the user (on a machine used interactively) for whether to do a degraded mount. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>