<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/drivers/md/bcache, branch linus/master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/drivers/md/bcache?h=linus%2Fmaster</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/drivers/md/bcache?h=linus%2Fmaster'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-06-03T17:25:56Z</updated>
<entry>
<title>Merge tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block</title>
<updated>2022-06-03T17:25:56Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-06-03T17:25:56Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=78c6499c92090d0fd1ddd1684fc3a5dc41d98c92'/>
<id>urn:sha1:78c6499c92090d0fd1ddd1684fc3a5dc41d98c92</id>
<content type='text'>
Pull more block driver updates from Jens Axboe:
 "A collection of stragglers that were late on sending in their changes
  and just followup fixes.

   - NVMe fixes pull request via Christoph:
       - set controller enable bit in a separate write (Niklas Cassel)
       - disable namespace identifiers for the MAXIO MAP1001 (Christoph)
       - fix a comment typo (Julia Lawall)"

   - MD fixes pull request via Song:
       - Remove uses of bdevname (Christoph Hellwig)
       - Bug fixes (Guoqing Jiang, and Xiao Ni)

   - bcache fixes series (Coly)

   - null_blk zoned write fix (Damien)

   - nbd fixes (Yu, Zhang)

   - Fix for loop partition scanning (Christoph)"

* tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block: (23 commits)
  block: null_blk: Fix null_zone_write()
  nvmet: fix typo in comment
  nvme: set controller enable bit in a separate write
  nvme-pci: disable namespace identifiers for the MAXIO MAP1001
  bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
  nbd: use pr_err to output error message
  nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
  nbd: fix io hung while disconnecting device
  nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
  nbd: fix race between nbd_alloc_config() and module removal
  nbd: call genl_unregister_family() first in nbd_cleanup()
  md: bcache: check the return value of kzalloc() in detached_dev_do_request()
  bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()
  block, loop: support partitions without scanning
  bcache: avoid journal no-space deadlock by reserving 1 journal bucket
  bcache: remove incremental dirty sector counting for bch_sectors_dirty_init()
  bcache: improve multithreaded bch_sectors_dirty_init()
  bcache: improve multithreaded bch_btree_check()
  md: fix double free of io_acct_set bioset
  md: Don't set mddev private to NULL in raid0 pers-&gt;free
  ...
</content>
</entry>
<entry>
<title>bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()</title>
<updated>2022-05-28T12:48:26Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-05-28T12:45:50Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=a1a2d8f0162b27e85e7ce0ae6a35c96a490e0559'/>
<id>urn:sha1:a1a2d8f0162b27e85e7ce0ae6a35c96a490e0559</id>
<content type='text'>
The kworker routine update_writeback_rate() is schedued to update the
writeback rate in every 5 seconds by default. Before calling
__update_writeback_rate() to do real job, semaphore dc-&gt;writeback_lock
should be held by the kworker routine.

At the same time, bcache writeback thread routine bch_writeback_thread()
also needs to hold dc-&gt;writeback_lock before flushing dirty data back
into the backing device. If the dirty data set is large, it might be
very long time for bch_writeback_thread() to scan all dirty buckets and
releases dc-&gt;writeback_lock. In such case update_writeback_rate() can be
starved for long enough time so that kernel reports a soft lockup warn-
ing started like:
  watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713]

Such soft lockup condition is unnecessary, because after the writeback
thread finishes its job and releases dc-&gt;writeback_lock, the kworker
update_writeback_rate() may continue to work and everything is fine
indeed.

This patch avoids the unnecessary soft lockup by the following method,
- Add new member to struct cached_dev
  - dc-&gt;rate_update_retry (0 by default)
- In update_writeback_rate() call down_read_trylock(&amp;dc-&gt;writeback_lock)
  firstly, if it fails then lock contention happens.
- If dc-&gt;rate_update_retry &lt;= BCH_WBRATE_UPDATE_MAX_SKIPS (15), doesn't
  acquire the lock and reschedules the kworker for next try.
- If dc-&gt;rate_update_retry &gt; BCH_WBRATE_UPDATE_MAX_SKIPS, no retry
  anymore and call down_read(&amp;dc-&gt;writeback_lock) to wait for the lock.

By the above method, at worst case update_writeback_rate() may retry for
1+ minutes before blocking on dc-&gt;writeback_lock by calling down_read().
For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft
lockup warning message can be avoided.

When retrying to acquire dc-&gt;writeback_lock in update_writeback_rate(),
of course the writeback rate cannot be updated. It is fair, because when
the kworker is blocked on the lock contention of dc-&gt;writeback_lock, the
writeback rate cannot be updated neither.

This change follows Jens Axboe's suggestion to a more clear and simple
version.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20220528124550.32834-2-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>md: bcache: check the return value of kzalloc() in detached_dev_do_request()</title>
<updated>2022-05-27T15:49:48Z</updated>
<author>
<name>Jia-Ju Bai</name>
<email>baijiaju1990@gmail.com</email>
</author>
<published>2022-05-27T15:28:18Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=40f567bbb3b0639d2ec7d1c6ad4b1b018f80cf19'/>
<id>urn:sha1:40f567bbb3b0639d2ec7d1c6ad4b1b018f80cf19</id>
<content type='text'>
The function kzalloc() in detached_dev_do_request() can fail, so its
return value should be checked.

Fixes: bc082a55d25c ("bcache: fix inaccurate io state for detached bcache devices")
Reported-by: TOTE Robot &lt;oslab@tsinghua.edu.cn&gt;
Signed-off-by: Jia-Ju Bai &lt;baijiaju1990@gmail.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20220527152818.27545-4-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()</title>
<updated>2022-05-27T15:49:48Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-05-27T15:28:16Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c'/>
<id>urn:sha1:7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c</id>
<content type='text'>
The local variables check_state (in bch_btree_check()) and state (in
bch_sectors_dirty_init()) should be fully filled by 0, because before
allocating them on stack, they were dynamically allocated by kzalloc().

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: avoid journal no-space deadlock by reserving 1 journal bucket</title>
<updated>2022-05-24T12:19:33Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-05-24T10:23:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=32feee36c30ea06e38ccb8ae6e5c44c6eec790a6'/>
<id>urn:sha1:32feee36c30ea06e38ccb8ae6e5c44c6eec790a6</id>
<content type='text'>
The journal no-space deadlock was reported time to time. Such deadlock
can happen in the following situation.

When all journal buckets are fully filled by active jset with heavy
write I/O load, the cache set registration (after a reboot) will load
all active jsets and inserting them into the btree again (which is
called journal replay). If a journaled bkey is inserted into a btree
node and results btree node split, new journal request might be
triggered. For example, the btree grows one more level after the node
split, then the root node record in cache device super block will be
upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no
space in journal buckets, the journal replay has to wait for new journal
bucket to be reclaimed after at least one journal bucket replayed. This
is one example that how the journal no-space deadlock happens.

The solution to avoid the deadlock is to reserve 1 journal bucket in
run time, and only permit the reserved journal bucket to be used during
cache set registration procedure for things like journal replay. Then
the journal space will never be fully filled, there is no chance for
journal no-space deadlock to happen anymore.

This patch adds a new member "bool do_reserve" in struct journal, it is
inititalized to 0 (false) when struct journal is allocated, and set to
1 (true) by bch_journal_space_reserve() when all initialization done in
run_cache_set(). In the run time when journal_reclaim() tries to
allocate a new journal bucket, free_journal_buckets() is called to check
whether there are enough free journal buckets to use. If there is only
1 free journal bucket and journal-&gt;do_reserve is 1 (true), the last
bucket is reserved and free_journal_buckets() will return 0 to indicate
no free journal bucket. Then journal_reclaim() will give up, and try
next time to see whetheer there is free journal bucket to allocate. By
this method, there is always 1 jouranl bucket reserved in run time.

During the cache set registration, journal-&gt;do_reserve is 0 (false), so
the reserved journal bucket can be used to avoid the no-space deadlock.

Reported-by: Nikhil Kshirsagar &lt;nkshirsagar@gmail.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-5-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: remove incremental dirty sector counting for bch_sectors_dirty_init()</title>
<updated>2022-05-24T12:19:33Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-05-24T10:23:35Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=80db4e4707e78cb22287da7d058d7274bd4cb370'/>
<id>urn:sha1:80db4e4707e78cb22287da7d058d7274bd4cb370</id>
<content type='text'>
After making bch_sectors_dirty_init() being multithreaded, the existing
incremental dirty sector counting in bch_root_node_dirty_init() doesn't
release btree occupation after iterating 500000 (INIT_KEYS_EACH_TIME)
bkeys. Because a read lock is added on btree root node to prevent the
btree to be split during the dirty sectors counting, other I/O requester
has no chance to gain the write lock even restart bcache_btree().

That is to say, the incremental dirty sectors counting is incompatible
to the multhreaded bch_sectors_dirty_init(). We have to choose one and
drop another one.

In my testing, with 512 bytes random writes, I generate 1.2T dirty data
and a btree with 400K nodes. With single thread and incremental dirty
sectors counting, it takes 30+ minites to register the backing device.
And with multithreaded dirty sectors counting, the backing device
registration can be accomplished within 2 minutes.

The 30+ minutes V.S. 2- minutes difference makes me decide to keep
multithreaded bch_sectors_dirty_init() and drop the incremental dirty
sectors counting. This is what this patch does.

But INIT_KEYS_EACH_TIME is kept, in sectors_dirty_init_fn() the CPU
will be released by cond_resched() after every INIT_KEYS_EACH_TIME keys
iterated. This is to avoid the watchdog reports a bogus soft lockup
warning.

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-4-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: improve multithreaded bch_sectors_dirty_init()</title>
<updated>2022-05-24T12:19:33Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-05-24T10:23:34Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=4dc34ae1b45fe26e772a44379f936c72623dd407'/>
<id>urn:sha1:4dc34ae1b45fe26e772a44379f936c72623dd407</id>
<content type='text'>
Commit b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be
multithreaded") makes bch_sectors_dirty_init() to be much faster
when counting dirty sectors by iterating all dirty keys in the btree.
But it isn't in ideal shape yet, still can be improved.

This patch does the following changes to improve current parallel dirty
keys iteration on the btree,
- Add read lock to root node when multiple threads iterating the btree,
  to prevent the root node gets split by I/Os from other registered
  bcache devices.
- Remove local variable "char name[32]" and generate kernel thread name
  string directly when calling kthread_run().
- Allocate "struct bch_dirty_init_state state" directly on stack and
  avoid the unnecessary dynamic memory allocation for it.
- Decrease BCH_DIRTY_INIT_THRD_MAX from 64 to 12 which is enough indeed.
- Increase &amp;state-&gt;started to count created kernel thread after it
  succeeds to create.
- When wait for all dirty key counting threads to finish, use
  wait_event() to replace wait_event_interruptible().

With the above changes, the code is more clear, and some potential error
conditions are avoided.

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-3-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>bcache: improve multithreaded bch_btree_check()</title>
<updated>2022-05-24T12:19:33Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-05-24T10:23:33Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=622536443b6731ec82c563aae7807165adbe9178'/>
<id>urn:sha1:622536443b6731ec82c563aae7807165adbe9178</id>
<content type='text'>
Commit 8e7102273f59 ("bcache: make bch_btree_check() to be
multithreaded") makes bch_btree_check() to be much faster when checking
all btree nodes during cache device registration. But it isn't in ideal
shap yet, still can be improved.

This patch does the following thing to improve current parallel btree
nodes check by multiple threads in bch_btree_check(),
- Add read lock to root node while checking all the btree nodes with
  multiple threads. Although currently it is not mandatory but it is
  good to have a read lock in code logic.
- Remove local variable 'char name[32]', and generate kernel thread name
  string directly when calling kthread_run().
- Allocate local variable "struct btree_check_state check_state" on the
  stack and avoid unnecessary dynamic memory allocation for it.
- Reduce BCH_BTR_CHKTHREAD_MAX from 64 to 12 which is enough indeed.
- Increase check_state-&gt;started to count created kernel thread after it
  succeeds to create.
- When wait for all checking kernel threads to finish, use wait_event()
  to replace wait_event_interruptible().

With this change, the code is more clear, and some potential error
conditions are avoided.

Fixes: 8e7102273f59 ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220524102336.10684-2-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block</title>
<updated>2022-05-23T20:56:39Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-05-23T20:56:39Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=115cd47132d71bd7e4aa1093e15d861a59e73a94'/>
<id>urn:sha1:115cd47132d71bd7e4aa1093e15d861a59e73a94</id>
<content type='text'>
Pull block updates from Jens Axboe:
 "Here are the core block changes for 5.19. This contains:

   - blk-throttle accounting fix (Laibin)

   - Series removing redundant assignments (Michal)

   - Expose bio cache via the bio_set, so that DM can use it (Mike)

   - Finish off the bio allocation interface cleanups by dealing with
     the weirdest member of the family. bio_kmalloc combines a kmalloc
     for the bio and bio_vecs with a hidden bio_init call and magic
     cleanup semantics (Christoph)

   - Clean up the block layer API so that APIs consumed by file systems
     are (almost) only struct block_device based, so that file systems
     don't have to poke into block layer internals like the
     request_queue (Christoph)

   - Clean up the blk_execute_rq* API (Christoph)

   - Clean up various lose end in the blk-cgroup code to make it easier
     to follow in preparation of reworking the blkcg assignment for bios
     (Christoph)

   - Fix use-after-free issues in BFQ when processes with merged queues
     get moved to different cgroups (Jan)

   - BFQ fixes (Jan)

   - Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
     Wolfgang, me)"

* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
  blk-mq: fix typo in comment
  bfq: Remove bfq_requeue_request_body()
  bfq: Remove superfluous conversion from RQ_BIC()
  bfq: Allow current waker to defend against a tentative one
  bfq: Relax waker detection for shared queues
  blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
  blk-throttle: Set BIO_THROTTLED when bio has been throttled
  blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
  blk-cgroup: always terminate io.stat lines
  block, bfq: make bfq_has_work() more accurate
  block, bfq: protect 'bfqd-&gt;queued' by 'bfqd-&gt;lock'
  block: cleanup the VM accounting in submit_bio
  block: Fix the bio.bi_opf comment
  block: reorder the REQ_ flags
  blk-iocost: combine local_stat and desc_stat to stat
  block: improve the error message from bio_check_eod
  block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
  block: remove superfluous calls to blkcg_bio_issue_init
  kthread: unexport kthread_blkcg
  blk-cgroup: cleanup blkcg_maybe_throttle_current
  ...
</content>
</entry>
<entry>
<title>bcache: fix wrong bdev parameter when calling bio_alloc_clone() in do_bio_hook()</title>
<updated>2022-04-19T17:28:17Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2022-04-19T16:04:25Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=9dca4168a37c9cfe182f077f0d2289292e9e3656'/>
<id>urn:sha1:9dca4168a37c9cfe182f077f0d2289292e9e3656</id>
<content type='text'>
Commit abfc426d1b2f ("block: pass a block_device to bio_clone_fast")
calls the modified bio_alloc_clone() in bcache code as:
	bio_init_clone(bio-&gt;bi_bdev, bio, orig_bio, GFP_NOIO);

But the first parameter is wrong, where bio-&gt;bi_bdev should be
orig_bio-&gt;bi_bdev. The wrong bi_bdev panics the kernel when submitting
cache bio.

This patch fixes the wrong bdev parameter usage and avoid the panic.

Fixes: abfc426d1b2f ("block: pass a block_device to bio_clone_fast")
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Link: https://lore.kernel.org/r/20220419160425.4148-3-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
