path: root/block
Age | Commit message | Author | Files | Lines
2012-06-15 | scsi: Silence unnecessary warnings about ioctl to partition | Jan Kara | 1 | -1/+4
Sometimes, warnings about ioctls to partitions happen often enough that they form the majority of the warnings in the kernel log and users complain. In some cases the warnings are about ioctls such as SG_IO, so it's not good to get rid of the warnings completely, as they can ease debugging of userspace problems when an ioctl is refused. Since I have seen warnings from lots of commands, including some proprietary userspace applications, I don't think disallowing the ioctls for processes with CAP_SYS_RAWIO will happen in the near future, if ever. So let's just stop warning for processes with CAP_SYS_RAWIO, for which the ioctl is allowed. CC: Paolo Bonzini <pbonzini@redhat.com> CC: James Bottomley <JBottomley@parallels.com> CC: linux-scsi@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
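The decision described above can be sketched in userspace C. This is only an illustration: `verify_partition_ioctl()`, `warnings_logged` and the `has_rawio` flag (standing in for a `capable(CAP_SYS_RAWIO)` check) are hypothetical names, and `-ENOTTY` is used as a stand-in error code rather than whatever the kernel function actually returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counter standing in for messages written to the kernel log. */
static int warnings_logged;

/*
 * Sketch of the check for an ioctl sent to a partition that is not on
 * the allowed list. has_rawio stands in for capable(CAP_SYS_RAWIO).
 * Returns 0 if the ioctl may proceed, -ENOTTY (assumed stand-in) if not.
 */
static int verify_partition_ioctl(bool has_rawio)
{
    if (has_rawio)
        return 0;           /* allowed, so stay silent */

    warnings_logged++;      /* refused, so the warning is still useful */
    return -ENOTTY;
}
```

With this shape, a privileged caller produces no log noise, while a refused caller still leaves a trace that helps debug the userspace side.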
2012-06-15 | block: Drop dead function blk_abort_queue() | Asias He | 1 | -41/+0
This function was only used by btrfs code in btrfs_abort_devices() (seemingly in a wrong way). btrfs_abort_devices() was removed in commit d07eb9117050c9ed3f78296ebcc06128b52693be, so let's remove the dead code to avoid any confusion. Changes in v2: update commit log, btrfs_abort_devices() was removed already. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-kernel@vger.kernel.org Cc: Chris Mason <chris.mason@oracle.com> Cc: linux-btrfs@vger.kernel.org Cc: David Sterba <dave@jikos.cz> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-15 | block: Mitigate lock unbalance caused by lock switching | Asias He | 1 | -5/+5
Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects the externally supplied queue_lock before blk_drain_queue(). Switching the lock introduces lock unbalance because threads which have taken the external lock might unlock the internal lock during the queue drain. This patch mitigates this by disconnecting the lock after the queue draining, since queue draining makes a lot of request_queue users go away. However, please note that this patch only makes the problem less likely to happen. Anyone who still holds a ref might try to issue a new request on a dead queue after blk_cleanup_queue() finishes draining; the lock unbalance might still happen in this case. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0+ #288 Not tainted ------------------------------------- fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at: [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/17706: #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>] get_request_wait+0x19a/0x250 stack backtrace: Pid: 17706, comm: fio Not tainted 3.4.0+ #288 Call Trace: [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380 [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100 [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330 [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0 [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0 [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0 [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380 [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380 [<ffffffff810e0079>] lock_release+0xd9/0x250 [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40 [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380 [<ffffffff81328faa>] generic_make_request+0xca/0x100 [<ffffffff81329056>] submit_bio+0x76/0xf0 [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60 [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70 [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0 [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80 [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60 [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80 [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60 [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80 [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760 [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0 [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0 [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0 [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0 [<ffffffff811e8250>] ? aio_fsync+0x30/0x30 [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0 [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80 [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff811eae50>] sys_io_submit+0x10/0x20 [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b Changes since v2: Update commit log to explain how the code is still broken even if we delay the lock switching after the drain. Changes since v1: Update commit log as Tejun suggested. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-15 | block: Avoid missed wakeup in request waitqueue | Asias He | 1 | -1/+14
After hot-unplugging a stressed disk, I found that rl->wait[] is not empty while rl->count[] is empty and there are threads still sleeping on get_request after the queue cleanup. With simple debug code, I found there are exactly nr_sleep - nr_wakeup threads in D state. So there are missed wakeups. $ dmesg | grep nr_sleep [ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173 $ vmstat 1 1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0 To quote Tejun: Ah, okay, freed_request() wakes up single waiter with the assumption that after the wakeup there will at least be one successful allocation which in turn will continue the wakeup chain until the wait list is empty - ie. waiter wakeup is dependent on successful request allocation happening after each wakeup. With queue marked dead, any woken up waiter fails the allocation path, so the wakeup chaining is lost and we're left with hung waiters. What we need is wake_up_all() after drain completion. This patch fixes the missed wakeup by waking up all the threads which are sleeping on the wait queue after the queue drain. Changes in v2: Drop waitqueue_active() optimization Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Asias He <asias@redhat.com> Fixed a bug by me, where stacked devices would oops on calling blk_drain_queue() since ->rq.wait[] does not get initialized unless it's a full queue setup. Signed-off-by: Jens Axboe <axboe@kernel.dk>
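The broken wakeup chain can be modelled without any threads. The following is a toy simulation, not kernel code: `struct toy_queue`, `wake_one_chained()` and `wake_all()` are hypothetical names, and a "sleeper" is just a counter.

```c
#include <stdbool.h>

/* Toy model of a request wait queue; all names are illustrative. */
struct toy_queue {
    int  sleepers;  /* tasks sleeping while waiting for a request */
    bool dead;      /* queue has been marked dead */
};

/*
 * Wake a single sleeper and let the chain run. A woken task retries its
 * allocation; if that succeeds, the request it later frees wakes the
 * next sleeper. On a dead queue the allocation fails, so nothing is
 * ever freed and the chain stops after one wakeup.
 */
static void wake_one_chained(struct toy_queue *q)
{
    while (q->sleepers > 0) {
        q->sleepers--;      /* one task wakes up */
        if (q->dead)
            break;          /* allocation fails, nobody wakes the next */
    }
}

/* What the fix amounts to: wake everybody once the drain is complete. */
static void wake_all(struct toy_queue *q)
{
    q->sleepers = 0;
}
```

On a live queue the chain empties the wait list by itself; on a dead queue all but one sleeper stay stuck until something like `wake_all()` runs.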
2012-06-06 | blkcg: drop local variable @q from blkg_destroy() | Tejun Heo | 1 | -2/+1
blkg_destroy() caches @blkg->q in local variable @q. While there are two places which need @blkg->q, only lockdep_assert_held() used the local variable, leading to an unused local variable warning if lockdep is configured out. Drop the local variable and just use @blkg->q directly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rakesh Iyer <rni@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-04 | blkcg: fix blkg_alloc() failure path | Tejun Heo | 1 | -5/+1
When policy data allocation fails in the middle, blkg_alloc() invokes blkg_free() to destroy the half constructed blkg. This ends up calling pd_exit_fn() on policy datas which didn't go through pd_init_fn(). Fix it by making blkg_alloc() call pd_init_fn() immediately after each policy data allocation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
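A minimal userspace sketch of the "init immediately after each allocation" pattern follows. The types and functions (`struct pd`, `struct grp`, `grp_alloc()`, `grp_free()`) and the `fail_at` fault-injection parameter are hypothetical stand-ins, not the blkcg API.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NPOL 3  /* illustrative number of policies */

struct pd  { bool initialized; };
struct grp { struct pd *pd[NPOL]; };

/* Counts exit calls that hit data which never went through init. */
static int exit_on_uninit;

static void pd_init(struct pd *pd) { pd->initialized = true; }

static void pd_exit(struct pd *pd)
{
    if (!pd->initialized)
        exit_on_uninit++;
}

/* Tear down a (possibly half-constructed) group. */
static void grp_free(struct grp *g)
{
    if (!g)
        return;
    for (int i = 0; i < NPOL; i++) {
        if (g->pd[i]) {
            pd_exit(g->pd[i]);
            free(g->pd[i]);
        }
    }
    free(g);
}

/* fail_at: policy index whose allocation is forced to fail, or -1. */
static struct grp *grp_alloc(int fail_at)
{
    struct grp *g = calloc(1, sizeof(*g));

    if (!g)
        return NULL;
    for (int i = 0; i < NPOL; i++) {
        struct pd *pd = (i == fail_at) ? NULL : calloc(1, sizeof(*pd));

        if (!pd) {
            grp_free(g);    /* only sees initialized entries */
            return NULL;
        }
        g->pd[i] = pd;
        pd_init(pd);        /* init right away, before the next alloc */
    }
    return g;
}
```

Because every installed entry is initialized before the next allocation is attempted, the failure path never calls the exit hook on raw data.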
2012-06-04 | block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED | Tejun Heo | 1 | -12/+17
cfq may be built w/ or w/o blkcg support depending on CONFIG_CFQ_GROUP_IOSCHED. If blkcg support is disabled, most of the related code is ifdef'd out but some part is left dangling - blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register() calls are made on it. Feeding zero filled policy to blkcg_policy_register() is incorrect and triggers the following WARN_ON() if CONFIG_BLK_CGROUP && !CONFIG_CFQ_GROUP_IOSCHED. ------------[ cut here ]------------ WARNING: at block/blk-cgroup.c:867 Modules linked in: Modules linked in: CPU: 3 Not tainted 3.4.0-09547-gfb21aff #1 Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8) Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000 000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048 000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8 00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70 Krnl Code: 00000000003d76be: a7280001 lhi %r2,1 00000000003d76c2: a7f4ffdf brc 15,3d7680 #00000000003d76c6: a7f40001 brc 15,3d76c8 >00000000003d76ca: a7c8ffea lhi %r12,-22 00000000003d76ce: a7f4ffce brc 15,3d766a 00000000003d76d2: a7f40001 brc 15,3d76d4 00000000003d76d6: a7c80000 lhi %r12,0 00000000003d76da: a7f4ffc2 brc 15,3d765e Call Trace: ([<0000000000b6f000>] initcall_debug+0x0/0x4) [<0000000000989e8a>] cfq_init+0x62/0xd4 [<00000000001000ba>] do_one_initcall+0x3a/0x170 [<000000000096fb60>] kernel_init+0x214/0x2bc [<0000000000623202>] kernel_thread_starter+0x6/0xc [<00000000006231fc>] kernel_thread_starter+0x0/0xc no locks held by swapper/0/1. Last Breaking-Event-Address: [<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0 ---[ end trace b8ef4903fcbf9dd3 ]--- This patch fixes the problem by ensuring all blkcg support code is inside CONFIG_CFQ_GROUP_IOSCHED.
* blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved inside the first CONFIG_CFQ_GROUP_IOSCHED block. __maybe_unused is dropped from blkcg_policy_cfq decl. * blkcg_deactivate_policy() invocation is moved inside ifdef. This also makes the activation logic match cfq_init_queue(). * All blkcg_policy_[un]register() invocations are moved inside ifdef. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-04 | block: fix return value on cfq_init() failure | Tejun Heo | 1 | -0/+1
cfq_init() would return zero after kmem cache creation failure. Fix so that it returns -ENOMEM. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-31 | block: avoid infinite loop in get_task_io_context() | Eric Dumazet | 1 | -1/+5
Calling get_task_io_context() on an exiting task which isn't %current can loop forever. This triggers at boot time on my dev machine. BUG: soft lockup - CPU#3 stuck for 22s ! [mountall.1603] Fix this by making create_task_io_context() return -EBUSY in this case to break the loop. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
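The retry loop and the way an error return breaks it can be sketched as follows. This is a simplified model under stated assumptions: `struct toy_task`, `create_ctx()` and `get_ctx()` are hypothetical, and the rule "a context is not installed on an exiting task that is not the caller" is inferred from the commit message rather than copied from the kernel source.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task; the field names are illustrative. */
struct toy_task {
    bool exiting;
    bool is_current;
    int *ioc;       /* stands in for the task's io context */
    int  tries;     /* how many times creation was attempted */
};

static int the_ioc; /* dummy context object */

/*
 * Try to install a context. Nothing is installed on an exiting task
 * that is not the caller itself. Reporting that as -EBUSY (instead of
 * 0) is what lets the caller stop retrying.
 */
static int create_ctx(struct toy_task *t)
{
    t->tries++;
    if (!t->ioc && (t->is_current || !t->exiting))
        t->ioc = &the_ioc;
    return t->ioc ? 0 : -EBUSY;
}

/* Retry until a context is visible or creation reports an error. */
static int *get_ctx(struct toy_task *t)
{
    do {
        if (t->ioc)
            return t->ioc;
    } while (!create_ctx(t));

    return NULL;
}
```

If `create_ctx()` returned 0 for the exiting task, the `do`/`while` loop above would spin forever, which matches the soft lockup in the report.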
2012-05-23 | blkcg: tg_stats_alloc_lock is an irq lock | Tejun Heo | 1 | -4/+6
tg_stats_alloc_lock nests inside queue lock and should always be held with irq disabled. throtl_pd_{init|exit}() were using non-irqsafe spinlock ops which triggered inverse lock ordering via irq warning via RCU freeing of blkg invoking throtl_pd_exit() w/o disabling IRQ. Update both functions to use irq safe operations. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> LKML-Reference: <1335339396.16988.80.camel@lappy> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-16 | s390/dasd: re-prioritize partition detection message | Stefan Haberland | 1 | -1/+1
To avoid confusion while formatting a DASD device change the level of the "Expected VOL1 label not found" message from warning to info. Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-15 | block: fix buffer overflow when printing partition UUIDs | Tejun Heo | 1 | -4/+6
6d1d8050b4bc8 "block, partition: add partition_meta_info to hd_struct" added part_unpack_uuid() which assumes that the passed in buffer has enough space for sprintfing "%pU" - 37 characters including '\0'. Unfortunately, b5af921ec0233 "init: add support for root devices specified by partition UUID" supplied 33 bytes buffer to the function leading to the following panic with stackprotector enabled. Kernel panic - not syncing: stack-protector: Kernel stack corrupted in: ffffffff81b14c7e [<ffffffff815e226b>] panic+0xba/0x1c6 [<ffffffff81b14c7e>] ? printk_all_partitions+0x259/0x26xb [<ffffffff810566bb>] __stack_chk_fail+0x1b/0x20 [<ffffffff81b15c7e>] printk_all_paritions+0x259/0x26xb [<ffffffff81aedfe0>] mount_block_root+0x1bc/0x27f [<ffffffff81aee0fa>] mount_root+0x57/0x5b [<ffffffff81aee23b>] prepare_namespace+0x13d/0x176 [<ffffffff8107eec0>] ? release_tgcred.isra.4+0x330/0x30 [<ffffffff81aedd60>] kernel_init+0x155/0x15a [<ffffffff81087b97>] ? schedule_tail+0x27/0xb0 [<ffffffff815f4d24>] kernel_thread_helper+0x5/0x10 [<ffffffff81aedc0b>] ? start_kernel+0x3c5/0x3c5 [<ffffffff815f4d20>] ? gs_change+0x13/0x13 Increase the buffer size, remove the dangerous part_unpack_uuid() and use snprintf() directly from printk_all_partitions(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Szymon Gruszczynski <sz.gruszczynski@googlemail.com> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
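The size arithmetic is easy to check in userspace: a textual UUID is 32 hex digits plus 4 hyphens, so 36 characters, plus the terminating NUL. The sketch below uses standard `snprintf()`; `format_uuid()` and `UUID_STR_LEN` are hypothetical names, and the kernel's `%pU` format is replaced by an explicit `%02x` pattern.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32 hex digits + 4 hyphens + terminating NUL. */
#define UUID_STR_LEN 37

/*
 * Format a 16-byte UUID into buf without ever writing past len bytes.
 * Returns the length the full string needs (36); a return value that
 * is >= len means the output was truncated.
 */
static int format_uuid(const uint8_t u[16], char *buf, size_t len)
{
    return snprintf(buf, len,
                    "%02x%02x%02x%02x-%02x%02x-%02x%02x-"
                    "%02x%02x-%02x%02x%02x%02x%02x%02x",
                    u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
                    u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

With a 33-byte buffer the bounded call truncates the string and still NUL-terminates it, whereas an unbounded `sprintf()` would write 4 bytes past the end.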
2012-04-20 | blkcg: use radix tree to index blkgs from blkcg | Tejun Heo | 2 | -8/+50
blkg lookup is currently performed by traversing linked list anchored at blkcg->blkg_list. This is very unscalable and with blk-throttle enabled and enough request queues on the system, this can get very ugly quickly (blk-throttle performs look up on every bio submission). This patch makes blkcg use radix tree to index blkgs combined with simple last-looked-up hint. This is mostly identical to how icqs are indexed from ioc. Note that because __blkg_lookup() may be invoked without holding queue lock, hint is only updated from __blkg_lookup_create(). Due to cfq's cfqq caching, this makes hint updates overly lazy. This will be improved with scheduled blkcg aware request allocation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
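The "index plus last-looked-up hint" idea can be sketched as below. Note the swap: a plain array indexed by a small queue id stands in for the radix tree, and every name (`struct toy_blkcg`, `toy_lookup()`, `MAX_QUEUES`, the `update_hint` flag) is hypothetical. The flag models the rule that only the lookup-or-create path, which holds the queue lock, may update the hint.

```c
#include <stddef.h>

#define MAX_QUEUES 64   /* illustrative bound */

struct toy_blkg { int q_id; };

struct toy_blkcg {
    struct toy_blkg *by_queue[MAX_QUEUES]; /* stands in for the radix tree */
    struct toy_blkg *hint;                 /* last group looked up */
};

/*
 * Look up the group for q_id. The hint is checked first; on a miss the
 * index is consulted. The hint is only written when update_hint is set,
 * modelling a caller that holds the lock needed to modify it.
 */
static struct toy_blkg *toy_lookup(struct toy_blkcg *cg, int q_id,
                                   int update_hint)
{
    struct toy_blkg *g = cg->hint;

    if (g && g->q_id == q_id)
        return g;                   /* fast path: hint hit */

    if (q_id < 0 || q_id >= MAX_QUEUES)
        return NULL;

    g = cg->by_queue[q_id];
    if (g && update_hint)
        cg->hint = g;
    return g;
}
```

Repeated lookups for the same queue then cost one pointer comparison instead of a list walk.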
2012-04-20 | blkcg: fix blkcg->css ref leak in __blkg_lookup_create() | Tejun Heo | 1 | -10/+9
__blkg_lookup_create() leaked blkcg->css ref if blkg allocation failed. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | block: fix elvpriv allocation failure handling | Tejun Heo | 1 | -17/+36
Request allocation is mempool backed to guarantee forward progress under memory pressure; unfortunately, this property got broken while adding elvpriv data. Failures during elvpriv allocation, including ioc and icq creation failures, currently make get_request() fail as a whole. There's no forward progress guarantee for these allocations - they may fail indefinitely under memory pressure stalling IO and deadlocking the system. This patch updates get_request() such that elvpriv allocation failure doesn't make the whole function fail. If elvpriv allocation fails, the allocation is degraded into !ELVPRIV. This will force the request to ELEVATOR_INSERT_BACK disturbing scheduling but elvpriv alloc failures should be rare (nothing is per-request) and anything is better than deadlocking. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
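The "degrade instead of fail" pattern can be sketched in a few lines. Everything here is a hypothetical model: `struct toy_request`, the `TOY_ELVPRIV` flag, the static `pool_rq` standing in for a mempool-backed request, and the `under_pressure` switch that forces the private-data allocation to fail.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_ELVPRIV 0x1u    /* illustrative "has scheduler data" flag */

struct toy_request {
    unsigned int flags;
    void *elv_priv;         /* optional scheduler-private data */
};

static struct toy_request pool_rq;  /* stands in for the mempool */
static int priv_storage;            /* dummy private data */

/* This allocation has no forward progress guarantee and may fail. */
static void *alloc_elvpriv(bool under_pressure)
{
    return under_pressure ? NULL : &priv_storage;
}

/*
 * The request itself comes from the guaranteed pool. If the optional
 * private data cannot be allocated, the flag is cleared and the request
 * is still returned, rather than failing the whole call.
 */
static struct toy_request *toy_get_request(bool under_pressure)
{
    struct toy_request *rq = &pool_rq;

    rq->flags = TOY_ELVPRIV;
    rq->elv_priv = alloc_elvpriv(under_pressure);
    if (!rq->elv_priv)
        rq->flags &= ~TOY_ELVPRIV;  /* degrade, do not fail */
    return rq;
}
```

The caller always gets a request back; only the scheduling quality of that request is reduced under memory pressure.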
2012-04-20 | block: collapse blk_alloc_request() into get_request() | Tejun Heo | 1 | -29/+17
Allocation failure handling in get_request() is about to be updated. To ease the update, collapse blk_alloc_request() into get_request(). This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: collapse blkcg_policy_ops into blkcg_policy | Tejun Heo | 4 | -28/+24
There's no reason to keep blkcg_policy_ops separate. Collapse it into blkcg_policy. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: embed struct blkg_policy_data in policy specific data | Tejun Heo | 4 | -83/+112
Currently blkg_policy_data carries policy specific data as char flex array instead of being embedded in policy specific data. This was forced by oddities around blkg allocation which are all gone now. This patch makes blkg_policy_data embedded in policy specific data - throtl_grp and cfq_group so that it's more conventional and consistent with how io_cq is handled. * blkcg_policy->pdata_size is renamed to ->pd_size. * Functions which used to take void *pdata now take struct blkg_policy_data *pd. * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg(). * Dummy struct blkg_policy_data definition added. Dummy pdata_to_blkg() definition was unused and inconsistent with the non-dummy version - correct dummy pd_to_blkg() added. * throtl and cfq updated accordingly. * As dummy blkg_to_pd/pd_to_blkg() are provided, blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd. Moved outside ifdef block. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
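Embedding a generic header struct inside the policy-specific struct, and converting back with pointer arithmetic, can be sketched as below. The `toy_` types and `pd_to_tg()` are hypothetical stand-ins, and `toy_container_of` is a simplified userspace version of the usual container-of idiom built on standard `offsetof`.

```c
#include <stddef.h>

struct toy_blkg;    /* opaque owner, only used through a pointer */

/* Generic per-policy header, embedded in each policy's own struct. */
struct toy_pd {
    struct toy_blkg *blkg;
};

/* Policy-specific data with the generic header embedded in it. */
struct toy_throtl_grp {
    struct toy_pd pd;
    unsigned long bps;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Convert the generic header back to the policy-specific struct. */
static struct toy_throtl_grp *pd_to_tg(struct toy_pd *pd)
{
    return pd ? toy_container_of(pd, struct toy_throtl_grp, pd) : NULL;
}
```

Compared with a trailing `char` array, the compiler knows the full type on both sides, so no casts through `void *` are needed at the call sites.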
2012-04-20 | blkcg: mass rename of blkcg API | Tejun Heo | 4 | -233/+228
During the recent blkcg cleanup, most of blkcg API has changed to such extent that mass renaming wouldn't cause any noticeable pain. Take the chance and cleanup the naming. * Rename blkio_cgroup to blkcg. * Drop blkio / blkiocg prefixes and consistently use blkcg. * Rename blkio_group to blkcg_gq, which is consistent with io_cq but keep the blkg prefix / variable name. * Rename policy method type and field names to signify they're dealing with policy data. * Rename blkio_policy_type to blkcg_policy. This patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: style cleanups for blk-cgroup.h | Tejun Heo | 1 | -56/+52
* Update indentation on struct field declarations. * Uniformly don't use "extern" on function declarations. * Merge the two #ifdef CONFIG_BLK_CGROUP blocks. All changes in this patch are cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: remove blkio_group->path[] | Tejun Heo | 4 | -15/+37
blkio_group->path[] stores the path of the associated cgroup and is used only for debug messages. Just format the path from blkg->cgroup when printing debug messages. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: blkg_rwstat_read() was missing inline | Tejun Heo | 1 | -1/+1
blkg_rwstat_read() in blk-cgroup.h was missing inline modifier causing compile warning depending on configuration. Add it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: shoot down blkgs if all policies are deactivated | Tejun Heo | 1 | -3/+8
There's no reason to keep blkgs around if no policy is activated for the queue. This patch moves queue locking out of blkg_destroy_all() and call it from blkg_deactivate_policy() on deactivation of the last policy on the queue. This change was suggested by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: drop stuff unused after per-queue policy activation update | Tejun Heo | 4 | -48/+23
* All_q_list is unused. Drop all_q_{mutex|list}. * @for_root of blkg_lookup_create() is always %false when called from outside blk-cgroup.c proper. Factor out __blkg_lookup_create() so that it doesn't check whether @q is bypassing and use the underscored version for the @for_root callsite. * blkg_destroy_all() is used only from blkcg proper and @destroy_root is always %true. Make it static and drop @destroy_root. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: implement per-queue policy activation | Tejun Heo | 5 | -134/+200
All blkcg policies were assumed to be enabled on all request_queues. Due to various implementation obstacles, during the recent blkcg core updates, this was temporarily implemented as shooting down all !root blkgs on elevator switch and policy [de]registration combined with half-broken in-place root blkg updates. In addition to being buggy and racy, this meant losing all blkcg configurations across those events. Now that blkcg is cleaned up enough, this patch replaces the temporary implementation with proper per-queue policy activation. Each blkcg policy should call the new blkcg_[de]activate_policy() to enable and disable the policy on a specific queue. blkcg_activate_policy() allocates and installs policy data for the policy for all existing blkgs. blkcg_deactivate_policy() does the reverse. If a policy is not enabled for a given queue, blkg printing / config functions skip the respective blkg for the queue. blkcg_activate_policy() also takes care of root blkg creation, and cfq_init_queue() and blk_throtl_init() are updated accordingly. This makes blkcg_bypass_{start|end}() and update_root_blkg_pd() unnecessary. Dropped. v2: cfq_init_queue() was returning uninitialized @ret on root_group alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: add request_queue->root_blkg | Tejun Heo | 2 | -7/+13
With per-queue policy activation, root blkg creation will be moved to blkcg core. Add q->root_blkg in preparation. For blk-throtl, this replaces throtl_data->root_tg; however, cfq needs to keep cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: make request_queue bypassing on allocation | Tejun Heo | 1 | -12/+25
With the previous change to guarantee bypass visibility for RCU read lock regions, entering bypass mode involves non-trivial overhead and future changes are scheduled to make use of bypass mode during init path. Combined it may end up adding noticeable delay during boot. This patch makes request_queue start its life in bypass mode, which is ended on queue init completion at the end of blk_init_allocated_queue(), and updates blk_queue_bypass_start() such that draining and RCU synchronization are performed only when the queue actually enters bypass mode. This avoids unnecessarily switching in and out of bypass mode during init avoiding the overhead and any nasty surprises which may stem from leaving bypass mode on half-initialized queues. The boot time overhead was pointed out by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing | Tejun Heo | 2 | -19/+46
Currently, blkg_lookup() doesn't check @q bypass state. This patch updates blk_queue_bypass_start() to do synchronize_rcu() before returning and updates blkg_lookup() to check blk_queue_bypass() and return %NULL if bypassing. This ensures blkg_lookup() returns %NULL if @q is bypassing. This is to guarantee that nobody is accessing policy data while @q is bypassing, which is necessary to allow replacing blkio_cgroup->pd[] in place on policy [de]activation. v2: Added more comments explaining bypass guarantees as suggested by Vivek. v3: Added more comments explaining why there's no synchronize_rcu() in blk_cleanup_queue() as suggested by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: make blkg_conf_prep() take @pol and return with queue lock held | Tejun Heo | 4 | -10/+14
Add @pol to blkg_conf_prep() and let it return with queue lock held (to be released by blkg_conf_finish()). Note that @pol isn't used yet. This is to prepare for per-queue policy activation and doesn't cause any visible difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: remove static policy ID enums | Tejun Heo | 4 | -40/+63
Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate @pol->plid dynamically on registration. The maximum number of blkcg policies which can be registered at the same time is defined by BLKCG_MAX_POLS constant added to include/linux/blkdev.h. Note that blkio_policy_register() now may fail. Policy init functions updated accordingly and unnecessary ifdefs removed from cfq_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
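Dynamic ID allocation from a fixed-size table, including the new failure case, can be sketched as follows. The names (`toy_policy_register()`, `toy_policy_unregister()`, `TOY_MAX_POLS`) are hypothetical, the table is deliberately tiny, and the `-ENOSPC` error code is an assumption made for the example.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_POLS 2  /* illustrative limit, playing the role of BLKCG_MAX_POLS */

struct toy_policy { int plid; };

static struct toy_policy *toy_policies[TOY_MAX_POLS];

/*
 * Assign the first free slot as the policy ID. Unlike a static enum,
 * this can fail when all slots are taken, so callers must check the
 * return value.
 */
static int toy_policy_register(struct toy_policy *pol)
{
    for (int i = 0; i < TOY_MAX_POLS; i++) {
        if (!toy_policies[i]) {
            pol->plid = i;
            toy_policies[i] = pol;
            return 0;
        }
    }
    return -ENOSPC;     /* assumed error code for "table full" */
}

/* Release the slot so a later registration can reuse the ID. */
static void toy_policy_unregister(struct toy_policy *pol)
{
    if (pol->plid >= 0 && pol->plid < TOY_MAX_POLS &&
        toy_policies[pol->plid] == pol)
        toy_policies[pol->plid] = NULL;
}
```

This is why the policy init functions had to be updated: registration is no longer infallible.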
2012-04-20 | blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs() | Tejun Heo | 4 | -21/+23
The two functions were taking "enum blkio_policy_id plid". Make them take "const struct blkio_policy_type *pol" instead. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | blkcg: kill blkio_list and replace blkio_list_lock with a mutex | Tejun Heo | 2 | -16/+17
With blkio_policy[], blkio_list is redundant and hinders per-queue policy activation. Remove it. Also, replace blkio_list_lock with a mutex blkcg_pol_mutex and let it protect the whole [un]registration. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20 | cfq: fix build breakage & warnings | Tejun Heo | 2 | -11/+10
* CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c compile failure when the config is disabled. Move it outside the ifdef block. * Dummy cfqg_stats_*() definitions were lacking inline modifiers causing unused functions warning if !CONFIG_CFQ_GROUP_IOSCHED. Add them. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-06 | block: make auto block plug flush threshold per-disk based | Shaohua Li | 1 | -1/+2
We do auto block plug flush to reduce latency; the threshold is 16 requests. This works well if the task is accessing one or two drives. The problem is that if the task is accessing a raid 0 device and the number of raid disks is big, say 8 or 16, each disk gets only 16/8 = 2 or 16/16 = 1 requests per flush, and we will have heavy lock contention. This patch makes the threshold per-disk based. The latency should still be ok when accessing one or two drives. A setup where the application accesses a lot of drives at the same time is usually a big machine, where avoiding lock contention is more important, because any contention will actually increase latency. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
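The difference between a global and a per-disk threshold can be sketched by counting only the plugged requests that target the same queue. The names below (`struct toy_rq`, `count_for_queue()`, `should_flush()`, `TOY_MAX_REQUEST_COUNT`) are hypothetical; only the threshold of 16 comes from the commit message.

```c
#include <stddef.h>

#define TOY_MAX_REQUEST_COUNT 16    /* threshold quoted in the text */

struct toy_rq { int queue_id; };    /* which disk a request targets */

/* Count plugged requests that belong to the given queue. */
static size_t count_for_queue(const struct toy_rq *plugged, size_t n,
                              int queue_id)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (plugged[i].queue_id == queue_id)
            count++;
    }
    return count;
}

/* Flush only once this particular disk has reached the threshold. */
static int should_flush(const struct toy_rq *plugged, size_t n,
                        int queue_id)
{
    return count_for_queue(plugged, n, queue_id) >= TOY_MAX_REQUEST_COUNT;
}
```

With 32 plugged requests spread evenly over 8 disks, each disk has only 4 pending, so no flush is triggered yet; a global count of 32 would already have flushed twice.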
2012-04-01 | blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros | Tejun Heo | 4 | -84/+72
Now that all stat handling code lives in policy implementations, there's no need to encode policy ID in cft->private. * Export blkcg_prfill_[rw]stat() from blkcg, remove blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which use hard-coded BLKIO_POLICY_PROP. * Use cft->private for offset of the target field directly and drop BLKCG_STAT_{PRIV|POL|OFF}(). Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: pass around pd->pdata instead of pd itself in prfill functions | Tejun Heo | 4 | -41/+33
Now that all conf and stat fields are moved into policy specific blkio_policy_data->pdata areas, there's no reason to use blkio_policy_data itself in prfill functions. Pass around @pd->pdata instead of @pd. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: move blkio_group_conf->iops and ->bps to blk-throttle | Tejun Heo | 2 | -103/+58
blkio_group_conf->iops and ->bps are owned by blk-throttle and have no reason to be defined in blkcg core. Drop them and let conf setting functions directly manipulate throtl_grp->bps[] and ->iops[]. This makes blkio_group_conf empty. Drop it. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: move blkio_group_conf->weight to cfq | Tejun Heo | 3 | -50/+45
blkio_group_conf->weight is owned by cfq and has no reason to be defined in blkcg core. Replace it with cfq_group->dev_weight and let conf setting functions directly set it. If dev_weight is zero, the cfqg doesn't have device specific weight configured. Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename blkio_cgroup->weight to blkio_cgroup->cfq_weight. We eventually want per-policy storage in blkio_cgroup but just mark the ownership of the field for now. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c | Tejun Heo | 3 | -125/+114
blkio_group_stats_cpu is used only by blk-throtl and has no reason to be defined in blkcg core. * Move blkio_group_stats_cpu to blk-throttle.c and rename it to tg_stats_cpu. * blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu. prfill functions updated accordingly. * All related macros / functions are renamed so that they have tg_ prefix and the unnecessary @pol arguments are dropped. * Per-cpu stats allocation code is also moved from blk-cgroup.c to blk-throttle.c and gets simplified to only deal with BLKIO_POLICY_THROTL. percpu stat free is performed by the exit method throtl_exit_blkio_group(). * throtl_reset_group_stats() implemented for blkio_reset_group_stats_fn method so that tg->stats_cpu can be reset. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: move blkio_group_stats to cfq-iosched.c | Tejun Heo | 3 | -278/+193
blkio_group_stats contains only fields used by cfq and has no reason to be defined in blkcg core. * Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats. * blkg_policy_data->stats is replaced with cfq_group->stats. blkg_prfill_[rw]stat() are updated to use offset against pd->pdata instead. * All related macros / functions are renamed so that they have cfqg_ prefix and the unnecessary @pol arguments are dropped. * All stat functions now take cfq_group * instead of blkio_group *. * lockdep assertion on queue lock dropped. Elevator runs under queue lock by default. There isn't much to be gained by adding lockdep assertions at stat function level. * cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method so that cfqg->stats can be reset. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: add blkio_policy_ops operations for exit and stat reset | Tejun Heo | 2 | -4/+16
Add blkio_policy_ops->blkio_exit_group_fn() and ->blkio_reset_group_stats_fn(). These will be used to further modularize blkcg policy implementation. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: cfq doesn't need per-cpu dispatch stats | Tejun Heo | 4 | -95/+48
blkio_group_stats_cpu is used to count dispatch stats using per-cpu counters. This is used by both blk-throtl and cfq-iosched but the sharing is rather silly. * cfq-iosched doesn't need per-cpu dispatch stats. cfq always updates those stats while holding queue_lock. * blk-throtl needs per-cpu dispatch stats but only service_bytes and serviced. It doesn't make use of sectors. This patch makes cfq add and use global stats for service_bytes, serviced and sectors, removes per-cpu sectors counter and moves per-cpu stat printing code to blk-throttle.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01 | blkcg: move statistics update code to policies | Tejun Heo | 4 | -397/+259
As with conf/stats file handling code, there's no reason for stat update code to live in blkcg core with policies calling in to update them. The current organization is both inflexible and complex. This patch moves stat update code to specific policies. All blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP stats are collapsed into their cfq_blkiocg_update_*_stats() counterparts. blkiocg_update_dispatch_stats() is used by both policies and duplicated as throtl_update_dispatch_stats() and cfq_blkiocg_update_dispatch_stats(). This will be cleaned up later. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01cfq: collapse cfq.h into cfq-iosched.cTejun Heo2-119/+113
block/cfq.h contains some functions which interact with blkcg; however, this is only part of it and cfq-iosched.c already has quite some #ifdef CONFIG_CFQ_GROUP_IOSCHED. With conf/stat handling being moved to specific policies, having these relay functions isolated in cfq.h doesn't make much sense. Collapse cfq.h into cfq-iosched.c for now. Let's split blkcg support properly later if necessary. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move conf/stat file handling code to policiesTejun Heo4-420/+333
blkcg conf/stat handling is convoluted in that details which belong to specific policy implementations are all out in blkcg core and then policies hook into core layer to access and manipulate confs and stats. This sadly achieves both inflexibility (confs/stats can't be modified without messing with blkcg core) and complexity (all the call-ins and call-backs). The previous patches restructured conf and stat handling code such that they can be separated out. This patch relocates the file handling part. All conf/stat file handling code which belongs to BLKIO_POLICY_PROP is moved to cfq-iosched.c and all BLKIO_POLICY_THROTL code to blk-throttle.c. The move is verbatim except for blkio_update_group_{weight|bps|iops}() callbacks which relay conf changes to policies. The configuration settings are handled in policies themselves so the relaying isn't necessary. Conf setting functions are modified to directly call per-policy update functions and the relaying mechanism is dropped. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: implement blkio_policy_type->cftypesTejun Heo2-0/+7
Add blkiop->cftypes which is added and removed together with the policy. This will be used to move conf/stat handling to the policies. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: export conf/stat helpers to prepare for reorganizationTejun Heo2-27/+52
conf/stat handling is about to be moved to policy implementation from blkcg core. Export conf/stat helpers from blkcg core so that blk-throttle and cfq-iosched can use them. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: simplify blkg_conf_prep()Tejun Heo1-54/+10
blkg_conf_prep() implements "MAJ:MIN VAL" parsing manually, which is unnecessary. Just use sscanf("%u:%u %llu"). This might not reject some malformed input (extra input at the end) but we don't care. Signed-off-by: Tejun Heo <tj@kernel.org>
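A minimal userspace sketch of the "MAJ:MIN VAL" parsing described above, using the C library's sscanf() rather than the kernel's. struct conf_input and parse_conf() are hypothetical names chosen for illustration; the real blkg_conf_prep() also looks up the device and the group.

```c
#include <stdio.h>

/* Hypothetical container for the parsed fields; not a kernel structure. */
struct conf_input {
	unsigned int major;
	unsigned int minor;
	unsigned long long val;
};

/*
 * Parse "MAJ:MIN VAL". Returns 0 when all three fields were converted,
 * -1 otherwise. Extra input after VAL is silently accepted, which is
 * the leniency the commit message mentions.
 */
static int parse_conf(const char *input, struct conf_input *out)
{
	if (sscanf(input, "%u:%u %llu",
		   &out->major, &out->minor, &out->val) != 3)
		return -1;
	return 0;
}
```

For example, "8:16 1048576" parses to major 8, minor 16 and value 1048576, while "8:16" alone is rejected because only two fields convert.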
2012-04-01blkcg: restructure blkio_group configuration settingTejun Heo2-140/+147
As part of userland interface restructuring, this patch updates per-blkio_group configuration setting. Instead of funneling everything through a master function which has hard-coded cases for each config file it may handle, the common part is factored into blkg_conf_prep() and blkg_conf_finish() and different configuration setters are implemented using the helpers. While this doesn't result in immediate LOC reduction, this enables further cleanups and more modular implementation. Signed-off-by: Tejun Heo <tj@kernel.org>
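The prep/finish split can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: struct dev_conf, struct conf_ctx, conf_prep(), conf_finish() and the two setters are hypothetical, and the locked flag merely stands in for whatever locking the real helpers perform.

```c
#include <stdio.h>

/* Hypothetical per-device configuration; not the kernel's blkio_group. */
struct dev_conf {
	unsigned int major, minor;
	unsigned long long weight;
	unsigned long long bps;
	int locked;	/* stands in for the locks held between prep and finish */
};

/* Context handed from conf_prep() to the setter and on to conf_finish(). */
struct conf_ctx {
	struct dev_conf *dev;
	unsigned long long val;
};

/* Common part: parse "MAJ:MIN VAL", check the device, "lock" it. */
static int conf_prep(struct dev_conf *dev, const char *input,
		     struct conf_ctx *ctx)
{
	unsigned int maj, min;
	unsigned long long val;

	if (sscanf(input, "%u:%u %llu", &maj, &min, &val) != 3)
		return -1;
	if (maj != dev->major || min != dev->minor)
		return -1;
	dev->locked = 1;
	ctx->dev = dev;
	ctx->val = val;
	return 0;
}

/* Common part: undo whatever conf_prep() set up. */
static void conf_finish(struct conf_ctx *ctx)
{
	ctx->dev->locked = 0;
	ctx->dev = NULL;
}

/* Each setter only contains the logic specific to its own setting. */
static int set_weight(struct dev_conf *dev, const char *input)
{
	struct conf_ctx ctx;

	if (conf_prep(dev, input, &ctx))
		return -1;
	ctx.dev->weight = ctx.val;
	conf_finish(&ctx);
	return 0;
}

static int set_bps(struct dev_conf *dev, const char *input)
{
	struct conf_ctx ctx;

	if (conf_prep(dev, input, &ctx))
		return -1;
	ctx.dev->bps = ctx.val;
	conf_finish(&ctx);
	return 0;
}
```

Adding another setting then means writing one more short setter instead of adding a case to a master function.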
2012-04-01blkcg: restructure configuration printingTejun Heo2-104/+55
Similarly to the previous stat restructuring, this patch restructures conf printing code such that,
* Conf printing uses the same helpers as stat.
* Printing function doesn't require hardcoded switching on the config being printed. Note that this isn't complete yet for throttle confs. The next patch will convert setting for these confs and will complete the transition.
* Printing uses read_seq_string callback (other methods will be phased out).
Note that blkio_group_conf.iops[2] is changed to u64 so that they can be manipulated with the same functions. This is transitional and will go away later.
After this patch, per-device configurations - weight, bps and iops - use __blkg_prfill_u64() for printing which uses white space as delimiter instead of tab.
Signed-off-by: Tejun Heo <tj@kernel.org>