path: root/include/linux/blk-cgroup.h
Age  Commit message  Author  Files  Lines
2022-09-26  blk-cgroup: pass a gendisk to blkcg_schedule_throttle  (Christoph Hellwig, 1 file, -3/+2)
Pass the gendisk to blkcg_schedule_throttle as part of moving the blk-cgroup infrastructure to be gendisk based. Remove the unused !BLK_CGROUP stub while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
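For context, a sketch of the prototype change this implies (the exact parameter list is assumed from the description):

	/* before: throttling was keyed on the request_queue */
	void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay);

	/* after: callers pass the gendisk as blk-cgroup goes gendisk based */
	void blkcg_schedule_throttle(struct gendisk *disk, bool use_memdelay);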
2022-05-02  blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>  (Christoph Hellwig, 1 file, -10/+5)
Remove all the includes that aren't actually needed from <linux/blk-cgroup.h> and push them to the actual source files where needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02  blk-cgroup: remove pointless CONFIG_BLOCK ifdefs  (Christoph Hellwig, 1 file, -4/+0)
No need to make BLK_CGROUP stubs conditional on CONFIG_BLOCK as they can't be used without that. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02  blk-cgroup: replace bio_blkcg with bio_blkcg_css  (Christoph Hellwig, 1 file, -78/+5)
All callers of bio_blkcg actually want the CSS, so replace it with an interface that does return the CSS. This now allows moving struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a public header. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
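A hedged sketch of what the replacement helper might look like (the exact body is an assumption based on the description; it lives out of line so struct blkcg_gq can stay private):

	/* return the blk-cgroup CSS a bio is associated with, if any */
	struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio)
	{
		if (!bio || !bio->bi_blkg)
			return NULL;
		return &bio->bi_blkg->blkcg->css;
	}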
2022-05-02  blk-cgroup: move struct blkcg to block/blk-cgroup.h  (Christoph Hellwig, 1 file, -31/+1)
There is no real need to expose the blkcg structure to the whole kernel. Move it to the private header and expose a helper to let the writeback code access the cgwb_list member. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02  blk-cgroup: move blkcg_{pin,unpin}_online out of line  (Christoph Hellwig, 1 file, -44/+2)
Move these two functions out of line as there is no good reason to inline them. Also switch to passing a cgroup_subsys_state instead of doing the conversion in the caller to prepare for making the blkcg structure private to blk-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02  blk-cgroup: move blk_cgroup_congested out of line  (Christoph Hellwig, 1 file, -19/+1)
There is no urgent need to inline this function, so move it out of line. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02  blk-cgroup: move blkcg_{get,set}_fc_appid out of line  (Christoph Hellwig, 1 file, -56/+2)
No need to have these helpers inline. Also remove the stubs and just use an IS_ENABLED for the get side (the set side already is only built conditionally). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220420042723.1010598-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
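A minimal sketch of the IS_ENABLED() pattern on the get side (the calling context shown is hypothetical; blkcg_get_fc_appid() is the now out-of-line helper):

	const char *app_id = NULL;

	/* compile-time gate replaces the old stub: the branch is dropped
	 * entirely when CONFIG_BLK_CGROUP_FC_APPID is off */
	if (IS_ENABLED(CONFIG_BLK_CGROUP_FC_APPID))
		app_id = blkcg_get_fc_appid(bio);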
2022-03-22  block: avoid calling blkg_free() in atomic context  (Ming Lei, 1 file, -1/+4)
blkg_free() can currently be called in atomic context, either with a spin lock held or from an RCU callback. Meanwhile, either the request queue's release handler or ->pd_free_fn can sleep. Fix the issue by scheduling a work function for freeing the blkcg_gq instance.
[ 148.553894] BUG: sleeping function called from invalid context at block/blk-sysfs.c:767
[ 148.557381] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/13
[ 148.560741] preempt_count: 101, expected: 0
[ 148.562577] RCU nest depth: 0, expected: 0
[ 148.564379] 1 lock held by swapper/13/0:
[ 148.566127] #0: ffffffff82615f80 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x0/0x1b
[ 148.569640] Preemption disabled at:
[ 148.569642] [<ffffffff8123f9c3>] ___slab_alloc+0x554/0x661
[ 148.573559] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Not tainted 5.17.0_up+ #110
[ 148.576834] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
[ 148.579768] Call Trace:
[ 148.580567] <IRQ>
[ 148.581262] dump_stack_lvl+0x56/0x7c
[ 148.582367] ? ___slab_alloc+0x554/0x661
[ 148.583526] __might_resched+0x1af/0x1c8
[ 148.584678] blk_release_queue+0x24/0x109
[ 148.585861] kobject_cleanup+0xc9/0xfe
[ 148.586979] blkg_free+0x46/0x63
[ 148.587962] rcu_do_batch+0x1c5/0x3db
[ 148.589057] rcu_core+0x14a/0x184
[ 148.590065] __do_softirq+0x14d/0x2c7
[ 148.591167] __irq_exit_rcu+0x7a/0xd4
[ 148.592264] sysvec_apic_timer_interrupt+0x82/0xa5
[ 148.593649] </IRQ>
[ 148.594354] <TASK>
[ 148.595058] asm_sysvec_apic_timer_interrupt+0x12/0x20
Cc: Tejun Heo <tj@kernel.org> Fixes: 0a9a25ca7843 ("block: let blkcg_gq grab request queue's refcnt") Reported-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20220322093322.GA27283@lst.de/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220323011308.2010380-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
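The shape of the fix, as a hedged sketch (the free_work field name is assumed):

	/* runs in process context, where the release handler may sleep */
	static void blkg_free_workfn(struct work_struct *work)
	{
		struct blkcg_gq *blkg = container_of(work, struct blkcg_gq,
						     free_work);

		/* ... free policy data, drop the queue reference ... */
		kfree(blkg);
	}

	static void blkg_free(struct blkcg_gq *blkg)
	{
		if (!blkg)
			return;

		/* may be called under a spinlock or from an RCU callback,
		 * so defer the sleeping work to a workqueue */
		INIT_WORK(&blkg->free_work, blkg_free_workfn);
		schedule_work(&blkg->free_work);
	}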
2022-02-11  block: partition include/linux/blk-cgroup.h  (Ming Lei, 1 file, -454/+5)
Partition include/linux/blk-cgroup.h into two parts: one is the public part, the other is the block layer private part. Suggested by Christoph Hellwig. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220211101149.2368042-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11  block: remove THROTL_IOPS_MAX  (Ming Lei, 1 file, -2/+0)
No one uses THROTL_IOPS_MAX any more, so remove it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220211101149.2368042-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16  blk-cgroup: stop using seq_get_buf  (Christoph Hellwig, 1 file, -2/+2)
seq_get_buf is a crutch that undoes all the memory safety of the seq_file interface. Use the normal seq_printf interfaces instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20210810152623.1796144-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-10  scsi: blkcg: Add app identifier support for blkcg  (Muneendra Kumar, 1 file, -0/+63)
Add a unique application identifier (i.e. the fc_app_id member) in blkcg. This allows identification of traffic belonging to a specific application both on the host and in the fabric infrastructure. As an example, this allows the storage stack to uniquely identify traffic belonging to a particular virtual machine. Link: https://lore.kernel.org/r/20210608043556.274139-3-muneendra.kumar@broadcom.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-12-01  block: simplify bdev/disk lookup in blkdev_get  (Christoph Hellwig, 1 file, -2/+2)
To simplify block device lookup and a few other upcoming areas, make sure that we always have a struct block_device available for each disk and each partition, and only find existing block devices in bdget. The only downside of this is that each device and partition uses a little more memory. The upside will be that a lot of code can be simplified. With that, all we need to look up the block device is to look up the inode and do a few sanity checks on the gendisk, instead of the separate lookup for the gendisk. For blk-cgroup which wants to access a gendisk without opening it, a new blkdev_{get,put}_no_open low-level interface is added to replace the previous get_gendisk use. Note that the change to look up the block device directly instead of the two-step lookup using struct gendisk causes a subtle change in behavior: accessing a non-existing partition on an existing block device can now cause a call to request_module. That call is harmless, and in practice no recent system will access these nodes as they aren't created by udev and static /dev/ setups are unusual. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
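A hedged usage sketch of the new low-level interface (error handling abbreviated):

	struct block_device *bdev;

	/* look up by dev_t without opening; does not call ->open() */
	bdev = blkdev_get_no_open(dev);
	if (!bdev)
		return -ENODEV;

	/* ... inspect bdev and its gendisk, e.g. for blkg setup ... */

	blkdev_put_no_open(bdev);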
2020-07-08  writeback: remove struct bdi_writeback_congested  (Christoph Hellwig, 1 file, -6/+0)
We never set any congested bits in the group writeback instances of it. And for the simpler bdi-wide case a simple scalar field is all that is needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-29  blk-cgroup: remove blkcg_bio_issue_check  (Christoph Hellwig, 1 file, -54/+2)
blkcg_bio_issue_check is a giant inline function that does three entirely different things. Factor out the blk-cgroup related bio initialization into a new helper, and open code the sequence in the only caller, relying on the fact that all the actual functionality is stubbed out for non-cgroup builds. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-29  blk-cgroup: move rcu locking from blkcg_bio_issue_check to blk_throtl_bio  (Christoph Hellwig, 1 file, -2/+0)
The only thing in blkcg_bio_issue_check that needs to be under rcu_read_lock is blk_throtl_bio, so move the locking there. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-29  blk-cgroup: remove the !bio->bi_blkg check in blkcg_bio_issue_check  (Christoph Hellwig, 1 file, -14/+1)
This is purely a sanity check for grave programming errors. Remove it to simplify further work in this area. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-29  block: move the bio cgroup association helpers to blk-cgroup.c  (Christoph Hellwig, 1 file, -30/+0)
Keep the cgroup code together. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-30  blk-iocost: switch to fixed non-auto-decaying use_delay  (Tejun Heo, 1 file, -10/+33)
The use_delay mechanism was introduced by blk-iolatency to hold memory allocators accountable for the reclaim and other shared IOs they cause. The duration of the delay is dynamically balanced between iolatency increasing the value on each target miss and it auto-decaying as time passes and threads get delayed on it. While this works well for iolatency, iocost's control model isn't compatible with it. There are no repeated "violation" events which can be balanced against auto-decaying. iocost instead knows how much a given cgroup is over budget and wants to prevent that cgroup from issuing IOs while over budget. Until now, iocost has been adding the cost of force-issued IOs. However, this doesn't reflect the amount which is already over budget and is simply not enough to counter the auto-decaying, allowing an anon-memory-leaking low-priority cgroup to go over its allotted share of IOs. As auto-decaying doesn't make much sense for iocost, this patch introduces a different mode of operation for use_delay - when blkcg_set_delay() is used instead of blkcg_add/use_delay(), the delay duration is not auto-decayed until it is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the delay duration synchronized to the budget overage amount. With this change, iocost can effectively police cgroups which generate a significant amount of force-issued IOs. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
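A hedged sketch of the two modes from a controller's perspective (variable names are illustrative):

	/* iolatency-style: add on each target miss; auto-decays over time */
	blkcg_add_delay(blkg, now, delay_nsec);

	/* iocost-style: pin the delay to the current budget overage;
	 * nothing decays until blkcg_clear_delay() is called */
	if (overage_nsec)
		blkcg_set_delay(blkg, overage_nsec);
	else
		blkcg_clear_delay(blkg);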
2020-04-29  block: replace BIO_QUEUE_ENTERED with BIO_CGROUP_ACCT  (Christoph Hellwig, 1 file, -4/+6)
BIO_QUEUE_ENTERED is only used for cgroup accounting now, so rename the flag and move setting it into the cgroup code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-01  blkcg: don't offline parent blkcg first  (Tejun Heo, 1 file, -1/+5)
blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs don't get offlined while there are active cgwbs on them. However, it ends up making offlining unordered sometimes causing parents to be offlined before children. Let's fix this by making child blkcgs pin the parents' online states. Note that pin/unpin names are chosen over get/put intentionally because css uses get/put online for something different. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
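A minimal sketch of the ordering fix, assuming the pin is taken when a child comes online and dropped when it goes offline:

	static int blkcg_css_online(struct cgroup_subsys_state *css)
	{
		struct blkcg *parent = blkcg_parent(css_to_blkcg(css));

		/* keep the parent online until this child is offlined */
		if (parent)
			blkcg_pin_online(parent);
		return 0;
	}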
2020-04-01  blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it  (Tejun Heo, 1 file, -26/+13)
blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs don't get offlined while there are active cgwbs on them. However, it ends up making offlining unordered sometimes causing parents to be offlined before children. To fix it, we want child blkcgs to pin the parents' online states turning the refcnt into a more generic online pinning mechanism. In preparation:
* blkcg->cgwb_refcnt -> blkcg->online_pin
* blkcg_cgwb_get/put() -> blkcg_pin/unpin_online()
* take them out of CONFIG_CGROUP_WRITEBACK
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-12  blk-cgroup: remove blkcg_drain_queue  (Guoqing Jiang, 1 file, -2/+0)
Since blk_drain_queue had already been removed, this function is not needed anymore. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-18  blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1  (Tejun Heo, 1 file, -1/+2)
Currently, cgroup rstat is supported only on cgroup2 hierarchy and rstat functions shouldn't be called on cgroup1 cgroups. While converting blk-cgroup core statistics to rstat, f73316482977 ("blk-cgroup: reimplement basic IO stats using cgroup rstat") accidentally ended up calling cgroup_rstat_updated() on cgroup1 cgroups causing crashes. Longer term, we probably should add cgroup1 support to rstat but for now let's mask the call directly. Fixes: f73316482977 ("blk-cgroup: reimplement basic IO stats using cgroup rstat") Tested-by: Faiz Abbas <faiz_abbas@ti.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
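The mask is presumably a cgroup2-only guard around the rstat call; a hedged sketch:

	/* rstat is only supported on the default (cgroup2) hierarchy */
	if (cgroup_subsys_on_dfl(io_cgrp_subsys))
		cgroup_rstat_updated(blkcg->css.cgroup, cpu);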
2019-11-07  blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT  (Tejun Heo, 1 file, -159/+0)
blkg_rwstat is now only used by bfq-iosched and blk-throtl when on cgroup1. Let's move it into its own files and gate it behind a config option. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07  blk-cgroup: reimplement basic IO stats using cgroup rstat  (Tejun Heo, 1 file, -5/+43)
blk-cgroup has been using blkg_rwstat to track basic IO stats. Unfortunately, reading recursive stats scales badly as it involves walking all descendants. On systems with a huge number of cgroups (dead or alive), this can lead to substantial CPU cost when reading IO stats. This patch reimplements basic IO stats using cgroup rstat which uses more memory but makes recursive stat reading O(# descendants which have been active since last reading) instead of O(# descendants).
* blk-cgroup core no longer uses sync/async stats. Introduce new stat enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}.
* Add blkg_iostat[_set] which encapsulates byte and io stats, last values for propagation delta calculation and u64_stats_sync for correctness on 32bit archs.
* Update the new percpu stat counters directly and implement blkcg_rstat_flush() to implement propagation.
* blkg_print_stat() can now bring the stats up to date by calling cgroup_rstat_flush() and print them instead of directly summing up all descendants.
* It now allocates 96 bytes per cpu. It used to be 40 bytes.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Dan Schatzberg <dschatzberg@fb.com> Cc: Daniel Xu <dlxu@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
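The new stat encapsulation, sketched from the bullet points above (exact field layout is an assumption):

	enum {
		BLKG_IOSTAT_READ,
		BLKG_IOSTAT_WRITE,
		BLKG_IOSTAT_DISCARD,
		BLKG_IOSTAT_NR,
	};

	struct blkg_iostat {
		u64	bytes[BLKG_IOSTAT_NR];
		u64	ios[BLKG_IOSTAT_NR];
	};

	struct blkg_iostat_set {
		struct u64_stats_sync	sync;	/* 32-bit correctness */
		struct blkg_iostat	cur;	/* live percpu-updated values */
		struct blkg_iostat	last;	/* for propagation deltas */
	};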
2019-11-07  blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()  (Tejun Heo, 1 file, -5/+0)
These don't have users anymore. Remove them. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28  blkcg: separate blkcg_conf_get_disk() out of blkg_conf_prep()  (Tejun Heo, 1 file, -0/+1)
Separate out blkcg_conf_get_disk() so that it can be used by blkcg policy interface file input parsers before the policy is actually enabled. This doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-28  blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()  (Tejun Heo, 1 file, -1/+2)
Instead of @node, pass in @q and @blkcg so that the alloc function has more context. This doesn't cause any behavior change and will be used by io.weight implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-04  block: Fix spelling in the header above blkg_lookup()  (Bart Van Assche, 1 file, -1/+1)
See also commit 8f4236d9008b ("block: remove QUEUE_FLAG_BYPASS and ->bypass") # v5.0. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-16  blkcg: allow blkcg_policy->pd_stat() to print non-debug info too  (Tejun Heo, 1 file, -0/+1)
Currently, ->pd_stat() is called only when moduleparam blkcg_debug_stats is set which prevents it from printing non-debug policy-specific statistics. Let's move debug testing down so that ->pd_stat() can print non-debug stat too. This patch doesn't cause any visible behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-10  blkcg: implement REQ_CGROUP_PUNT  (Tejun Heo, 1 file, -1/+15)
When a shared kthread needs to issue a bio for a cgroup, doing so synchronously can lead to priority inversions as the kthread can be trapped waiting for that cgroup. This patch implements REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing to a dedicated per-blkcg work item to avoid such priority inversions. This will be used to fix priority inversions in btrfs compression and should be generally useful as we grow filesystem support for comprehensive IO control. Cc: Chris Mason <clm@fb.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
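A hedged usage sketch from the submitter's side:

	/* punt the actual submission to a per-blkcg worker so this
	 * shared kthread can't be blocked on the target cgroup */
	bio->bi_opf |= REQ_CGROUP_PUNT;
	submit_bio(bio);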
2019-06-20  blk-cgroup: move struct blkg_stat to bfq  (Christoph Hellwig, 1 file, -71/+0)
This structure and assorted infrastructure is only used by the bfq I/O scheduler. Move it there instead of bloating the common code. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-20  blk-cgroup: introduce a new struct blkg_rwstat_sample  (Christoph Hellwig, 1 file, -10/+12)
When sampling the blkcg counts we don't need atomics or per-cpu variables. Introduce a new structure just containing plain u64 counters. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-20  blk-cgroup: pass blkg_rwstat structures by reference  (Christoph Hellwig, 1 file, -7/+7)
Returning a structure generates rather bad code, so switch to passing by reference. Also don't require the structure to be zeroed and add to the 0-initialized counters, but actually set the counters to the calculated value. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-20  blk-cgroup: factor out a helper to read rwstat counter  (Christoph Hellwig, 1 file, -0/+7)
Break up the overly dense statements into something readable. Also switch to an unsigned counter as it can't ever turn negative. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
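The factored-out helper, roughly (a sketch; the exact form may differ):

	static inline u64 blkg_rwstat_read_counter(struct blkg_rwstat *rwstat,
						   unsigned int idx)
	{
		/* unsigned: the summed counter can never be negative */
		return atomic64_read(&rwstat->aux_cnt[idx]) +
			percpu_counter_sum_positive(&rwstat->cpu_cnt[idx]);
	}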
2018-12-21  blkcg: clean up blkg_tryget_closest()  (Dennis Zhou, 1 file, -5/+16)
The implementation of blkg_tryget_closest() wasn't super obvious and became a point of suspicion when debugging [1]. So let's clean it up so it's obviously not the problem. Also add missing RCU read locking to bio_clone_blkg_association(), which got exposed by adding the RCU read lock held check in blkg_tryget_closest(). [1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/ Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-12  blkcg: handle dying request_queue when associating a blkg  (Dennis Zhou, 1 file, -1/+1)
Between v3 [1] and v4 [2] of the blkg association series, the association point moved from generic_make_request_checks(), which is called after the request enters the queue, to bio_set_dev(), which is when the bio is formed before submit_bio(). When the request_queue goes away, the blkgs supporting the request_queue are destroyed and then the q->root_blkg is set to %NULL. This patch adds a %NULL check to blkg_tryget_closest() to prevent the NPE caused by the above. It also adds a guard to see if the request_queue is dying when creating a blkg to prevent creating a blkg for a dead request_queue. [1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/ [2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/ Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device") Reported-and-tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: put back rcu lock in blkcg_bio_issue_check()  (Dennis Zhou, 1 file, -0/+3)
I was a little overzealous in removing the rcu_read_lock() call from blkcg_bio_issue_check() and it broke blk-throttle. Put it back. Fixes: e35403a034bf ("blkcg: associate blkg when associating a device") Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: rename blkg_try_get() to blkg_tryget()  (Dennis Zhou, 1 file, -7/+5)
blkg reference counting now uses percpu_ref rather than atomic_t. Let's make this consistent with css_tryget. This renames blkg_try_get to blkg_tryget and now returns a bool rather than the blkg or %NULL. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
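A hedged sketch of the new calling convention, mirroring css_tryget():

	/* old: blkg = blkg_try_get(blkg); returned the blkg or NULL */
	if (blkg_tryget(blkg)) {
		/* reference held; safe to use blkg */
		blkg_put(blkg);
	}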
2018-12-07  blkcg: change blkg reference counting to use percpu_ref  (Dennis Zhou, 1 file, -10/+5)
Every bio is now associated with a blkg putting blkg_get, blkg_try_get, and blkg_put on the hot path. Switch over the refcnt in blkg to use percpu_ref. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: remove additional reference to the css  (Dennis Zhou, 1 file, -41/+0)
The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: remove bio->bi_css and instead use bio->bi_blkg  (Dennis Zhou, 1 file, -4/+4)
Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: consolidate bio_issue_init() to be a part of core  (Dennis Zhou, 1 file, -0/+9)
bio_issue_init among other things initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone). Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
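A hedged sketch of the consolidated call on the core init paths:

	/* stamp issue time and size once, on normal/clone/bounce-clone init */
	bio_issue_init(&bio->bi_issue, bio_sectors(bio));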
2018-12-07  blkcg: associate blkg when associating a device  (Dennis Zhou, 1 file, -8/+10)
Previously, blkg association was handled by controller specific code in blk-throttle and blk-iolatency. However, because a blkg represents a relationship between a blkcg and a request_queue, it makes sense to keep the blkg->q and bio->bi_disk->queue consistent. This patch moves association into the bio_set_dev() macro. This should cover the majority of cases where the device is set/changed, keeping the two pointers consistent. Fallback code is added to blkcg_bio_issue_check() to catch any missing paths. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: convert blkg_lookup_create() to find closest blkg  (Dennis Zhou, 1 file, -9/+15)
There are several scenarios where blkg_lookup_create() can fail such as the blkcg dying, request_queue is dying, or simply being OOM. Most handle this by simply falling back to the q->root_blkg and calling it a day. This patch implements the notion of closest blkg. During blkg_lookup_create(), if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest() is introduced and used during association so a bio is always attached to a blkg. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-12-07  blkcg: update blkg_lookup_create() to do locking  (Dennis Zhou, 1 file, -1/+3)
To know when to create a blkg, the general pattern is to do a blkg_lookup() and if that fails, lock and do the lookup again, and if that fails finally create. It doesn't make much sense for everyone who wants to do creation to write this themselves. This changes blkg_lookup_create() to do locking and implement this pattern. The old blkg_lookup_create() is renamed to __blkg_lookup_create(). If a call site wants to do its own error handling or already owns the queue lock, they can use __blkg_lookup_create(). This will be used in upcoming patches. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
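The pattern the reworked blkg_lookup_create() encapsulates, as a hedged sketch:

	/* fast path: lockless lookup (caller holds the RCU read lock) */
	blkg = blkg_lookup(blkcg, q);
	if (likely(blkg))
		return blkg;

	/* slow path: take the queue lock, then re-look-up or create */
	spin_lock_irqsave(&q->queue_lock, flags);
	blkg = __blkg_lookup_create(blkcg, q);
	spin_unlock_irqrestore(&q->queue_lock, flags);
	return blkg;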
2018-12-07  blkcg: fix ref count issue with bio_blkcg() using task_css  (Dennis Zhou, 1 file, -8/+90)
The bio_blkcg() function turns out to be inconsistent and consequently dangerous to use. The first part returns a blkcg where a reference is owned by the bio meaning it does not need to be rcu protected. However, the third case, the last line, is problematic: return css_to_blkcg(task_css(current, io_cgrp_id)); This can race against task migration and the cgroup dying. It is also semantically different as it must be called rcu protected and is susceptible to failure when trying to get a reference to it. This patch adds association ahead of calling bio_blkcg() rather than after. This makes association a required and explicit step along the code paths for calling bio_blkcg(). In blk-iolatency, association is moved above the bio_blkcg() call to ensure it will not return %NULL. BFQ uses the old bio_blkcg() function, but I do not want to address it in this series due to the complexity. I have created a private version documenting the inconsistency and noting not to use it. Signed-off-by: Dennis Zhou <dennis@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-11-15  block: remove the queue_lock indirection  (Christoph Hellwig, 1 file, -2/+2)
With the legacy request path gone there is no good reason to keep queue_lock as a pointer, we can always use the embedded lock now. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Fixed missing floppy and blk-cgroup conversions and half-done edits. Signed-off-by: Jens Axboe <axboe@kernel.dk>