aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/fs/bcachefs/btree_key_cache.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-04-02bcachefs: Kill btree_iter.transKent Overstreet1-4/+4
This was planned to be done ages ago, now finally completed; there are places where we have quite a few btree_trans objects on the stack, so this reduces stack usage somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-02bcachefs: do_trace_key_cache_fill()Kent Overstreet1-9/+15
Reducing stack frame usage; this moves the printbuf out of the main stack frame. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-26bcachefs: Fix deadlockAlan Huang1-1/+1
This fixes two deadlocks: 1.pcpu_alloc_mutex involved one as pointed by syzbot[1] 2.recursion deadlock. The root cause is that we hold the bc lock during alloc_percpu, fix it by following the pattern used by __btree_node_mem_alloc(). [1] https://lore.kernel.org/all/66f97d9a.050a0220.6bad9.001d.GAE@google.com/T/ Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com Tested-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-02-06bcachefs: Fix rcu imbalance in bch2_fs_btree_key_cache_exit()Kent Overstreet1-1/+0
Spotted by sparse. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-21bcachefs: Fix btree_trans_peek_key_cache()Kent Overstreet1-1/+3
BTREE_ITER_cached_nofill has some tricky corner cases; it's used internally for iterators that aren't walking the key cache, but need to be coherent with the key cache. It tells traverse to look up and lock the key cache entry if present, but don't create one if it doesn't exist. That means we have to have a BTREE_ITER_UPTODATE path (because after traverse the path has to be UPTODATE, or we pop assertions) that doesn't point to anything (which is the less bad option, taken by the previous fix). The previous fix for this path missed an issue that can happen in bch2_trans_peek_key_cache(): we can't set should_be_locked on a path that doesn't point to anything and doesn't hold locks. Fixes: bd5b09727f3d ("bcachefs: Don't set btree_path to updtodate if we don't fill") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Don't set btree_path to updtodate if we don't fillKent Overstreet1-3/+1
This fixes various locking asserts, and a null ptr deref in bch2_btree_iter_peek_path(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: fix bch2_btree_key_cache_drop()Kent Overstreet1-2/+12
When evicting, we shouldn't leave a pointer to the key cache entry lying around - that screws up btree path asserts we're adding. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-09bcachefs: Write lock btree node in key cache fillsKent Overstreet1-12/+22
this addresses a key cache coherency bug Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-29bcachefs: trace_key_cache_fillKent Overstreet1-0/+10
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: Assert that we're not violating key cache coherency rulesKent Overstreet1-3/+10
We're not allowed to have a dirty key in the key cache if the key doesn't exist at all in the btree - creation has to bypass the key cache, so that iteration over the btree can check if the key is present in the key cache. Things break in subtle ways if cache coherency is broken, so this needs an assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-21bcachefs: Use __GFP_ACCOUNT for reclaimable memoryKent Overstreet1-0/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Fix format specifier in bch2_btree_key_cache_to_text()Nathan Chancellor1-1/+1
When building for a 32-bit architecture, for which 'size_t' is 'unsigned int', there is a compiler warning due to use of '%lu': In file included from fs/bcachefs/vstructs.h:5, from fs/bcachefs/bcachefs_format.h:80, from fs/bcachefs/bcachefs.h:207, from fs/bcachefs/btree_key_cache.c:3: fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text': fs/bcachefs/btree_key_cache.c:795:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 795 | prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending)); | ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/util.h:78:63: note: in definition of macro 'prt_printf' 78 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:795:38: note: format string is defined here 795 | prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending)); | ~~^ | | | long unsigned int | %u cc1: all warnings being treated as errors Use the proper specifier, '%zu', to resolve the warning. Fixes: e447e49977b8 ("bcachefs: key cache can now allocate from pending") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: key cache can now allocate from pendingKent Overstreet1-14/+44
btree_trans objects can hold the btree_trans_barrier srcu read lock for an extended amount of time (they shouldn't, but it's difficult to guarantee). the srcu barrier blocks memory reclaim, so to avoid too many stranded key cache items, this uses the new pending_rcu_items to allocate from pending items - like we did before, but now without a global lock on the key cache. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Rip out freelists from btree key cacheKent Overstreet1-312/+53
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22bcachefs: don't use rht_bucket() in btree_key_cache_scan()Kent Overstreet1-3/+27
rht_bucket() does strange complicated things when a rehash is in progress. Instead, just skip scanning when a rehash is in progress: scanning is going to be more expensive (many more empty slots to cover), and some sort of infinite loop is being observed Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-22bcachefs: clear path->should_be_locked in bch2_btree_key_cache_drop()Kent Overstreet1-0/+1
bch2_btree_key_cache_drop() evicts the key cache entry - it's used when we're doing an update that bypasses the key cache, because for cache coherency reasons a key can't be in the key cache unless it also exists in the btree - i.e. creates have to bypass the cache. After evicting, the path no longer points to a key cache key, and relock() will always fail if should_be_locked is true. Prep for improving path->should_be_locked assertions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()Kent Overstreet1-5/+0
bch2_btree_path_traverse_cached() was previously checking if it could just relock the path, which is a common idiom in path traversal. However, it was using btree_node_relock(), not btree_path_relock(); btree_path_relock() only succeeds if the path was in state BTREE_ITER_NEED_RELOCK. If the path was in state BTREE_ITER_NEED_TRAVERSE a full traversal is needed; this led to a null ptr deref in bch2_btree_path_traverse_cached(). And the short circuit check here isn't needed, since it was already done in the main bch2_btree_path_traverse_one(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Simplify btree key cache fill pathKent Overstreet1-199/+118
Don't allocate the new bkey_cached until after we've done the btree lookup; this means we can kill bkey_cached.valid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: bch2_btree_key_cache_drop() now evictsKent Overstreet1-1/+6
As part of improving btree key cache coherency, the bkey_cached.valid flag is going away. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: btree_path_cached_set()Kent Overstreet1-26/+24
new helper - small refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Leave a buffer in the btree key cache to avoid lock thrashingKent Overstreet1-0/+8
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Fix reporting of freed objects from key cache shrinkerKent Overstreet1-8/+5
We count objects as freed when we move them to the srcu-pending lists because we're doing the equivalent of a kfree_srcu(); the only difference is managing the pending list ourself means we can allocate from the pending list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: increase key cache shrinker batch sizeKent Overstreet1-1/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Enable automatic shrinking for rhashtablesKent Overstreet1-4/+5
Since the key cache shrinker walks the rhashtable, a mostly empty rhashtable leads to really nasty reclaim performance issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-28bcachefs: Fix locking assertKent Overstreet1-5/+5
We now track whether a transaction is locked, and verify that we don't have nodes locked when the transaction isn't locked; reorder relocks to not pop the new assert. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: x-macroize journal flags enumsKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: uninline set_btree_iter_dontneed()Kent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Fix format specifiers in bch2_btree_key_cache_to_text()Nathan Chancellor1-2/+2
When building for a 32-bit target, for which 'size_t' is 'unsigned int', there are two warnings around mismatched format specifiers and argument types: In file included from fs/bcachefs/vstructs.h:5, from fs/bcachefs/bcachefs_format.h:79, from fs/bcachefs/bcachefs.h:207, from fs/bcachefs/btree_key_cache.c:3: fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text': fs/bcachefs/btree_key_cache.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1046:47: note: format string is defined here 1046 | prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); | ~~^ | | | long unsigned int | %u fs/bcachefs/btree_key_cache.c:1047:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ | | | size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) | ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1047:44: note: format string is defined here 1047 | prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); | ~~^ | | | long unsigned int | %u cc1: all warnings being treated as error Use the proper 'size_t' specifier, '%zu', to clear up the warnings for these platforms. Fixes: f2d47ec26af5 ("bcachefs: Btree key cache instrumentation") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Btree key cache instrumentationKent Overstreet1-4/+49
It turns out the btree key cache shrinker wasn't actually reclaiming anything, prior to the previous patch. This adds instrumentation so that if we have further issues we can see what's going on. Specifically, sysfs internal/btree_key_cache is greatly expanded with new counters, and the SRCU sequence numbers of the first 10 entries on each pending freelist, and we also add trigger_btree_key_cache_shrink for testing without having to prune all the system caches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: Use bch2_btree_path_upgrade() in key cache traverseKent Overstreet1-16/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet1-14/+14
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: prt_printf() now respects \r\n\tKent Overstreet1-6/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()Kent Overstreet1-7/+9
Reported-by: syzbot+a35cdb62ec34d44fb062@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-20bcachefs: Tweak btree key cache shrinker so it actually freesKent Overstreet1-15/+4
Freeing key cache items is a multi stage process; we need to wait for an SRCU grace period to elapse, and we handle this ourselves - partially to avoid callback overhead, but primarily so that when allocating we can first allocate from the freed items waiting for an SRCU grace period. Previously, the shrinker was counting the items on the 'waiting for SRCU grace period' lists as items being scanned, but this meant that too many items waiting for an SRCU grace period could prevent it from doing any work at all. After this, we're seeing that items skipped due to the accessed bit are the main cause of the shrinker not making any progress, and we actually want the key cache shrinker to run quite aggressively because reclaimed items will still generally be found (more compactly) in the btree node cache - so we also tweak the shrinker to not count those against nr_to_scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-07bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu listHongbo Li1-0/+2
When allocating bkey_cached from bc->freed_pcpu list, it missed decreasing the count of nr_freed_pcpu which would cause the mismatch between the value of nr_freed_pcpu and the list items. This problem also exists in moving new bkey_cached to bc->freed_pcpu list. If these happened, the bug info may appear in bch2_fs_btree_key_cache_exit by the follow code: BUG_ON(list_count_nodes(&bc->freed_pcpu) != bc->nr_freed_pcpu); BUG_ON(list_count_nodes(&bc->freed_nonpcpu) != bc->nr_freed_nonpcpu); Fixes: c65c13f0eac6 ("bcachefs: Run btree key cache shrinker less aggressively") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-06bcachefs: JOURNAL_SPACE_LOWKent Overstreet1-1/+1
"bcachefs; Fix deadlock in bch2_btree_update_start()" was a significant performance regression (nearly 50%) on multithreaded random writes with fio. The reason is that the journal watermark checks multiple things, including the state of the btree write buffer, and on multithreaded update heavy workloads we're bottleneked on write buffer flushing - we don't want kicknig off btree updates to depend on the state of the write buffer. This isn't strictly correct; the interior btree update path does do write buffer updates, but it's a tiny fraction of total accounting updates and we're more concerned with space in the journal itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet1-1/+1
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Fix btree key cache coherency during replayKent Overstreet1-3/+5
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: trans_for_each_path() no longer uses path->idxKent Overstreet1-1/+2
path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_insert_entry -> btree_path_idx_tKent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree_iter -> btree_path_idx_tKent Overstreet1-5/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: skip journal more often in key cache reclaimKent Overstreet1-1/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs; kill bch2_btree_key_cache_flush()Kent Overstreet1-16/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: delete useless commit_do()Kent Overstreet1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroKent Overstreet1-2/+2
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Don't rejournal keys in key cache flushKent Overstreet1-6/+11
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename BTREE_INSERT flagsKent Overstreet1-3/+3
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-03bcachefs: Don't drop journal pins in exit pathKent Overstreet1-2/+0
There's no need to drop journal pins in our exit paths - the code was trying to have everything cleaned up on any shutdown, but better to just tweak the assertions a bit. This fixes a bug where calling into journal reclaim in the exit path would cass a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-14bcachefs: Kill journal pre-reservationsKent Overstreet1-14/+0
This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Run btree key cache shrinker less aggressivelyKent Overstreet1-4/+19
The btree key cache maintains lists of items that have been freed, but can't yet be reclaimed because a bch2_trans_relock() call might find them - we're waiting for SRCU readers to release. Previously, we wouldn't count these items against the number we're attempting to scan for, which would mean we'd evict more live key cache entries - doing quite a bit of potentially unecessary work. With recent work to make sure we don't hold SRCU locks for too long, it should be safe to count all the items on the freelists against number to scan - even if we can't reclaim them yet, we will be able to soon. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>