aboutsummaryrefslogtreecommitdiffstats
path: root/block/blk-cgroup.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2012-04-20blkcg: implement per-queue policy activationTejun Heo1-74/+154
All blkcg policies were assumed to be enabled on all request_queues. Due to various implementation obstacles, during the recent blkcg core updates, this was temporarily implemented as shooting down all !root blkgs on elevator switch and policy [de]registration combined with half-broken in-place root blkg updates. In addition to being buggy and racy, this meant losing all blkcg configurations across those events. Now that blkcg is cleaned up enough, this patch replaces the temporary implementation with proper per-queue policy activation. Each blkcg policy should call the new blkcg_[de]activate_policy() to enable and disable the policy on a specific queue. blkcg_activate_policy() allocates and installs policy data for the policy for all existing blkgs. blkcg_deactivate_policy() does the reverse. If a policy is not enabled for a given queue, blkg printing / config functions skip the respective blkg for the queue. blkcg_activate_policy() also takes care of root blkg creation, and cfq_init_queue() and blk_throtl_init() are updated accordingly. This replaces blkcg_bypass_{start|end}() and update_root_blkg_pd() unnecessary. Dropped. v2: cfq_init_queue() was returning uninitialized @ret on root_group alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: add request_queue->root_blkgTejun Heo3-7/+15
With per-queue policy activation, root blkg creation will be moved to blkcg core. Add q->root_blkg in preparation. For blk-throtl, this replaces throtl_data->root_tg; however, cfq needs to keep cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: make request_queue bypassing on allocationTejun Heo1-12/+25
With the previous change to guarantee bypass visiblity for RCU read lock regions, entering bypass mode involves non-trivial overhead and future changes are scheduled to make use of bypass mode during init path. Combined it may end up adding noticeable delay during boot. This patch makes request_queue start its life in bypass mode, which is ended on queue init completion at the end of blk_init_allocated_queue(), and updates blk_queue_bypass_start() such that draining and RCU synchronization are performed only when the queue actually enters bypass mode. This avoids unnecessarily switching in and out of bypass mode during init avoiding the overhead and any nasty surprises which may step from leaving bypass mode on half-initialized queues. The boot time overhead was pointed out by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: make sure blkg_lookup() returns %NULL if @q is bypassingTejun Heo2-19/+46
Currently, blkg_lookup() doesn't check @q bypass state. This patch updates blk_queue_bypass_start() to do synchronize_rcu() before returning and updates blkg_lookup() to check blk_queue_bypass() and return %NULL if bypassing. This ensures blkg_lookup() returns %NULL if @q is bypassing. This is to guarantee that nobody is accessing policy data while @q is bypassing, which is necessary to allow replacing blkio_cgroup->pd[] in place on policy [de]activation. v2: Added more comments explaining bypass guarantees as suggested by Vivek. v3: Added more comments explaining why there's no synchronize_rcu() in blk_cleanup_queue() as suggested by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: make blkg_conf_prep() take @pol and return with queue lock heldTejun Heo4-10/+14
Add @pol to blkg_conf_prep() and let it return with queue lock held (to be released by blkg_conf_finish()). Note that @pol isn't used yet. This is to prepare for per-queue policy activation and doesn't cause any visible difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: remove static policy ID enumsTejun Heo5-41/+69
Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate @pol->plid dynamically on registration. The maximum number of blkcg policies which can be registered at the same time is defined by BLKCG_MAX_POLS constant added to include/linux/blkdev.h. Note that blkio_policy_register() now may fail. Policy init functions updated accordingly and unnecessary ifdefs removed from cfq_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs()Tejun Heo4-21/+23
The two functions were taking "enum blkio_policy_id plid". Make them take "const struct blkio_policy_type *pol" instead. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: kill blkio_list and replace blkio_list_lock with a mutexTejun Heo2-16/+17
With blkio_policy[], blkio_list is redundant and hinders with per-queue policy activation. Remove it. Also, replace blkio_list_lock with a mutex blkcg_pol_mutex and let it protect the whole [un]registration. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20cfq: fix build breakage & warningsTejun Heo2-11/+10
* CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c compile failure when the config is disabled. Move it outside the ifdef block. * Dummy cfqg_stats_*() definitions were lacking inline modifiers causing unused functions warning if !CONFIG_CFQ_GROUP_IOSCHED. Add them. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-01blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macrosTejun Heo4-84/+72
Now that all stat handling code lives in policy implementations, there's no need to encode policy ID in cft->private. * Export blkcg_prfill_[rw]stat() from blkcg, remove blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which use hard-code BLKIO_POLICY_PROP. * Use cft->private for offset of the target field directly and drop BLKCG_STAT_{PRIV|POL|OFF}(). Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: pass around pd->pdata instead of pd itself in prfill functionsTejun Heo4-41/+33
Now that all conf and stat fields are moved into policy specific blkio_policy_data->pdata areas, there's no reason to use blkio_policy_data itself in prfill functions. Pass around @pd->pdata instead of @pd. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move blkio_group_conf->iops and ->bps to blk-throttleTejun Heo2-103/+58
blkio_cgroup_conf->iops and ->bps are owned by blk-throttle and has no reason to be defined in blkcg core. Drop them and let conf setting functions directly manipulate throtl_grp->bps[] and ->iops[]. This makes blkio_group_conf empty. Drop it. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move blkio_group_conf->weight to cfqTejun Heo3-50/+45
blkio_group_conf->weight is owned by cfq and has no reason to be defined in blkcg core. Replace it with cfq_group->dev_weight and let conf setting functions directly set it. If dev_weight is zero, the cfqg doesn't have device specific weight configured. Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename blkio_cgroup->weight to blkio_cgroup->cfq_weight. We eventually want per-policy storage in blkio_cgroup but just mark the ownership of the field for now. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move blkio_group_stats_cpu and friends to blk-throttle.cTejun Heo3-125/+114
blkio_group_stats_cpu is used only by blk-throtl and has no reason to be defined in blkcg core. * Move blkio_group_stats_cpu to blk-throttle.c and rename it to tg_stats_cpu. * blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu. prfill functions updated accordingly. * All related macros / functions are renamed so that they have tg_ prefix and the unnecessary @pol arguments are dropped. * Per-cpu stats allocation code is also moved from blk-cgroup.c to blk-throttle.c and gets simplified to only deal with BLKIO_POLICY_THROTL. percpu stat free is performed by the exit method throtl_exit_blkio_group(). * throtl_reset_group_stats() implemented for blkio_reset_group_stats_fn method so that tg->stats_cpu can be reset. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move blkio_group_stats to cfq-iosched.cTejun Heo3-278/+193
blkio_group_stats contains only fields used by cfq and has no reason to be defined in blkcg core. * Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats. * blkg_policy_data->stats is replaced with cfq_group->stats. blkg_prfill_[rw]stat() are updated to use offset against pd->pdata instead. * All related macros / functions are renamed so that they have cfqg_ prefix and the unnecessary @pol arguments are dropped. * All stat functions now take cfq_group * instead of blkio_group *. * lockdep assertion on queue lock dropped. Elevator runs under queue lock by default. There isn't much to be gained by adding lockdep assertions at stat function level. * cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method so that cfqg->stats can be reset. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: add blkio_policy_ops operations for exit and stat resetTejun Heo2-4/+16
Add blkio_policy_ops->blkio_exit_group_fn() and ->blkio_reset_group_stats_fn(). These will be used to further modularize blkcg policy implementation. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: cfq doesn't need per-cpu dispatch statsTejun Heo4-95/+48
blkio_group_stats_cpu is used to count dispatch stats using per-cpu counters. This is used by both blk-throtl and cfq-iosched but the sharing is rather silly. * cfq-iosched doesn't need per-cpu dispatch stats. cfq always updates those stats while holding queue_lock. * blk-throtl needs per-cpu dispatch stats but only service_bytes and serviced. It doesn't make use of sectors. This patch makes cfq add and use global stats for service_bytes, serviced and sectors, removes per-cpu sectors counter and moves per-cpu stat printing code to blk-throttle.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move statistics update code to policiesTejun Heo4-397/+259
As with conf/stats file handling code, there's no reason for stat update code to live in blkcg core with policies calling into update them. The current organization is both inflexible and complex. This patch moves stat update code to specific policies. All blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP stats are collapsed into their cfq_blkiocg_update_*_stats() counterparts. blkiocg_update_dispatch_stats() is used by both policies and duplicated as throtl_update_dispatch_stats() and cfq_blkiocg_update_dispatch_stats(). This will be cleaned up later. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01cfq: collapse cfq.h into cfq-iosched.cTejun Heo2-119/+113
block/cfq.h contains some functions which interact with blkcg; however, this is only part of it and cfq-iosched.c already has quite some #ifdef CONFIG_CFQ_GROUP_IOSCHED. With conf/stat handling being moved to specific policies, having these relay functions isolated in cfq.h doesn't make much sense. Collapse cfq.h into cfq-iosched.c for now. Let's split blkcg support properly later if necessary. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move conf/stat file handling code to policiesTejun Heo4-420/+333
blkcg conf/stat handling is convoluted in that details which belong to specific policy implementations are all out in blkcg core and then policies hook into core layer to access and manipulate confs and stats. This sadly achieves both inflexibility (confs/stats can't be modified without messing with blkcg core) and complexity (all the call-ins and call-backs). The previous patches restructured conf and stat handling code such that they can be separated out. This patch relocates the file handling part. All conf/stat file handling code which belongs to BLKIO_POLICY_PROP is moved to cfq-iosched.c and all BKLIO_POLICY_THROTL code to blk-throtl.c. The move is verbatim except for blkio_update_group_{weight|bps|iops}() callbacks which relays conf changes to policies. The configuration settings are handled in policies themselves so the relaying isn't necessary. Conf setting functions are modified to directly call per-policy update functions and the relaying mechanism is dropped. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: implement blkio_policy_type->cftypesTejun Heo2-0/+7
Add blkiop->cftypes which is added and removed together with the policy. This will be used to move conf/stat handling to the policies. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: export conf/stat helpers to prepare for reorganizationTejun Heo2-27/+52
conf/stat handling is about to be moved to policy implementation from blkcg core. Export conf/stat helpers from blkcg core so that blk-throttle and cfq-iosched can use them. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: simplify blkg_conf_prep()Tejun Heo1-54/+10
blkg_conf_prep() implements "MAJ:MIN VAL" parsing manually, which is unnecessary. Just use sscanf("%u:%u %llu"). This might not reject some malformed input (extra input at the end) but we don't care. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: restructure blkio_group configruation settingTejun Heo2-140/+147
As part of userland interface restructuring, this patch updates per-blkio_group configuration setting. Instead of funneling everything through a master function which has hard-coded cases for each config file it may handle, the common part is factored into blkg_conf_prep() and blkg_conf_finish() and different configuration setters are implemented using the helpers. While this doesn't result in immediate LOC reduction, this enables further cleanups and more modular implementation. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: restructure configuration printingTejun Heo2-104/+55
Similarly to the previous stat restructuring, this patch restructures conf printing code such that, * Conf printing uses the same helpers as stat. * Printing function doesn't require hardcoded switching on the config being printed. Note that this isn't complete yet for throttle confs. The next patch will convert setting for these confs and will complete the transition. * Printing uses read_seq_string callback (other methods will be phased out). Note that blkio_group_conf.iops[2] is changed to u64 so that they can be manipulated with the same functions. This is transitional and will go away later. After this patch, per-device configurations - weight, bps and iops - use __blkg_prfill_u64() for printing which uses white space as delimiter instead of tab. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: drop blkiocg_file_write_u64()Tejun Heo1-28/+7
blkiocg_file_write_u64() has single switch case. Drop blkiocg_file_write_u64(), rename blkio_weight_write() to blkcg_set_weight() and use it directly for .write_u64 callback. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: restructure statistics printingTejun Heo2-374/+243
blkcg stats handling is a mess. None of the stats has much to do with blkcg core but they are all implemented in blkcg core. Code sharing is achieved by mixing common code with hard-coded cases for each stat counter. This patch restructures statistics printing such that * Common logic exists as helper functions and specific print functions use the helpers to implement specific cases. * Printing functions serving multiple counters don't require hardcoded switching on specific counters. * Printing uses read_seq_string callback (other methods will be phased out). This change enables further cleanups and relocating stats code to the policy implementation it belongs to. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: introduce blkg_stat and blkg_rwstatTejun Heo2-207/+293
blkcg uses u64_stats_sync to avoid reading wrong u64 statistic values on 32bit archs and some stat counters have subtypes to distinguish read/writes and sync/async IOs. The stat code paths are confusing and involve a lot of going back and forth between blkcg core and specific policy implementations, and synchronization and subtype handling are open coded in blkcg core. This patch introduces struct blkg_stat and blkg_rwstat which, with accompanying operations, encapsulate stat updating and accessing with proper synchronization. blkg_stat is simple u64 counter with 64bit read-access protection. blkg_rwstat is the one with rw and [a]sync subcounters and takes @rw flags to distinguish IO subtypes (%REQ_WRITE and %REQ_SYNC) and replaces stat_sub_type indexed arrays. All counters in blkio_group_stats and blkio_group_stats_cpu are replaced with either blkg_stat or blkg_rwstat along with all users. This does add one u64_stats_sync per counter and increase stats_sync operations but they're empty/noops on 64bit archs and blkcg doesn't have too many counters, especially with DEBUG_BLK_CGROUP off. While the currently resulting code isn't necessarily simpler at the moment, this will enable further clean up of blkcg stats code. - BLKIO_STAT_{READ|WRITE|SYNC|ASYNC|TOTAL} renamed to BLKG_RWSTAT_{READ|WRITE|SYNC|ASYNC|TOTAL}. - blkg_stat_add() replaces blkio_add_stat() and blkio_check_and_dec_stat(). Note that BUG_ON() on underflow in the latter function no longer exists. It's *way* better to have underflowed stat counters than oopsing. - blkio_group_stats->dequeue is now a proper u64 stat counter instead of ulong. - reset_stats() updated to clear each stat counters individually and BLKG_STATS_DEBUG_CLEAR_{START|SIZE} are removed. - Some functions reconstruct rw flags from direction and sync booleans. This will be removed by future patches. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcountersTejun Heo1-3/+6
BLKIO_STAT_CPU_SECTORS doesn't need read/write/sync/async subcounters and is counted by blkio_group_stats_cpu->sectors; however, it still holds a member in blkio_group_stats_cpu->stat_arr_cpu. Rearrange stat_type_cpu and define BLKIO_STAT_CPU_ARR_NR and use it for stat_arr_cpu[] size so that only SERVICE_BYTES and SERVICED have subcounters. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: remove unused @pol and @plid parametersTejun Heo4-16/+9
@pol to blkg_to_pdata() and @plid to blkg_lookup_create() are no longer necessary. Drop them. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01cgroup: make css->refcnt clearing on cgroup removal optionalTejun Heo3-9/+80
Currently, cgroup removal tries to drain all css references. If there are active css references, the removal logic waits and retries ->pre_detroy() until either all refs drop to zero or removal is cancelled. This semantics is unusual and adds non-trivial complexity to cgroup core and IMHO is fundamentally misguided in that it couples internal implementation details (references to internal data structure) with externally visible operation (rmdir). To userland, this is a behavior peculiarity which is unnecessary and difficult to expect (css refs is otherwise invisible from userland), and, to policy implementations, this is an unnecessary restriction (e.g. blkcg wants to hold css refs for caching purposes but can't as that becomes visible as rmdir hang). Unfortunately, memcg currently depends on ->pre_destroy() retrials and cgroup removal vetoing and can't be immmediately switched to the new behavior. This patch introduces the new behavior of not waiting for css refs to drain and maintains the old behavior for subsystems which have __DEPRECATED_clear_css_refs set. Once, memcg is updated, we can drop the code paths for the old behavior as proposed in the following patch. Note that the following patch is incorrect in that dput work item is in cgroup and may lose some of dputs when multiples css's are released back-to-back, and __css_put() triggers check_for_release() when refcnt reaches 0 instead of 1; however, it shows what part can be removed. http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251 Note that, in not-too-distant future, cgroup core will start emitting warning messages for subsys which require the old behavior, so please get moving. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2012-04-01cgroup: use negative bias on css->refcnt to block css_tryget()Tejun Heo2-56/+75
When a cgroup is about to be removed, cgroup_clear_css_refs() is called to check and ensure that there are no active css references. This is currently achieved by dropping the refcnt to zero iff it has only the base ref. If all css refs could be dropped to zero, ref clearing is successful and CSS_REMOVED is set on all css. If not, the base ref is restored. While css ref is zero w/o CSS_REMOVED set, any css_tryget() attempt on it busy loops so that they are atomic w.r.t. the whole css ref clearing. This does work but dropping and re-instating the base ref is somewhat hairy and makes it difficult to add more logic to the put path as there are two of them - the regular css_put() and the reversible base ref clearing. This patch updates css ref clearing such that blocking new css_tryget() and putting the base ref are separate operations. CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and css_tryget() busy loops while refcnt is negative. After all css refs are deactivated, if they were all one, ref clearing succeeded and CSS_REMOVED is set and the base ref is put using the regular css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts and the original postive values are restored. css_refcnt() accessor which always returns the unbiased positive reference counts is added and used to simplify refcnt usages. While at it, relocate and reformat comments in cgroup_has_css_refs(). This separates css->refcnt deactivation and putting the base ref, which enables the next patch to make ref clearing optional. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: implement cgroup_rm_cftypes()Tejun Heo2-10/+45
Implement cgroup_rm_cftypes() which removes an array of cftypes from a subsystem. It can be called whether the target subsys is attached or not. cgroup core will remove the specified file from all existing cgroups. This will be used to improve sub-subsys modularity and will be helpful for unified hierarchy. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: introduce struct cfentTejun Heo2-36/+78
This patch adds cfent (cgroup file entry) which is the association between a cgroup and a file. This is in-cgroup representation of files under a cgroup directory. This simplifies walking walking cgroup files and thus cgroup_clear_directory(), which is now implemented in two parts - cgroup_rm_file() and a loop around it. cgroup_rm_file() will be used to implement cftype removal and cfent is scheduled to serve cgroup specific per-file data (e.g. for sysfs-like "sever" semantics). v2: - cfe was freed from cgroup_rm_file() which led to use-after-free if the file had openers at the time of removal. Moved to cgroup_diput(). - cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs wasn't empty after removing all files. This triggered spuriously if some files were open during directory clearing. Removed. v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be spuriously triggered for root cgroups because they don't go through cgroup_clear_directory() on unmount. Don't trigger WARN for root cgroups. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Glauber Costa <glommer@parallels.com>
2012-04-01cgroup: relocate __d_cgrp() and __d_cft()Tejun Heo1-10/+10
Move the two macros upwards as they'll be used earlier in the file. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: remove cgroup_add_file[s]()Tejun Heo2-47/+20
No controller is using cgroup_add_files[s](). Unexport them, and convert cgroup_add_files() to handle NULL entry terminated array instead of taking count explicitly and continue creation on failure for internal use. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: convert memcg controller to the new cftype interfaceTejun Heo2-15/+13
Convert memcg to use the new cftype based interface. kmem support abuses ->populate() for mem_cgroup_sockets_init() so it can't be removed at the moment. tcp_memcontrol is updated so that tcp_files[] is registered via a __initcall. This change also allows removing the forward declaration of tcp_files[]. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Hugh Dickins <hughd@google.com> Cc: Greg Thelen <gthelen@google.com>
2012-04-01memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAPTejun Heo1-34/+31
Instead of conditioning creation of memsw files on do_swap_account, always create the files if compiled-in and fail read/write attempts with -EOPNOTSUPP if !do_swap_account. This is suggested by KAMEZAWA to simplify memcg file creation so that it can use cgroup->subsys_cftypes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: convert all non-memcg controllers to the new cftype interfaceTejun Heo8-75/+28
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio, net_cls and device controllers to use the new cftype based interface. Termination entry is added to cftype arrays and populate callbacks are replaced with cgroup_subsys->base_cftypes initializations. This is functionally identical transformation. There shouldn't be any visible behavior change. memcg is rather special and will be converted separately. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vivek Goyal <vgoyal@redhat.com>
2012-04-01cgroup: relocate cftype and cgroup_subsys definitions in controllersTejun Heo4-83/+65
blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol unnecessarily define cftype array and cgroup_subsys structures at the top of the file, which is unconventional and necessiates forward declaration of methods. This patch relocates those below the definitions of the methods and removes the forward declarations. Note that forward declaration of tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup(). This will be removed soon by another patch. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: merge cft_release_agent cftype array into the base files arrayTejun Heo1-12/+7
Now that cftype can express whether a file should only be on root, cft_release_agent can be merged into the base files cftypes array. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: implement cgroup_add_cftypes() and friendsTejun Heo2-3/+162
Currently, cgroup directories are populated by subsys->populate() callback explicitly creating files on each cgroup creation. This level of flexibility isn't needed or desirable. It provides largely unused flexibility which call for abuses while severely limiting what the core layer can do through the lack of structure and conventions. Per each cgroup file type, the only distinction that cgroup users is making is whether a cgroup is root or not, which can easily be expressed with flags. This patch introduces cgroup_add_cftypes(). These deal with cftypes instead of individual files - controllers indicate that certain types of files exist for certain subsystem. Newly added CFTYPE_*_ON_ROOT flags indicate whether a cftype should be excluded or created only on the root cgroup. cgroup_add_cftypes() can be called any time whether the target subsystem is currently attached or not. cgroup core will create files on the existing cgroups as necessary. Also, cgroup_subsys->base_cftypes is added to ease registration of the base files for the subsystem. If non-NULL on subsys init, the cftypes pointed to by ->base_cftypes are automatically registered on subsys init / load. Further patches will convert the existing users and remove the file based interface. Note that this interface allows dynamic addition of files to an active controller. This will be used for sub-controller modularity and unified hierarchy in the longer term. This patch implements the new mechanism but doesn't apply it to any user. v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with cgroup_subsys->base_cftypes, which works better for cgroup_subsys which is loaded as module. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: build list of all cgroups under a given cgroupfs_rootTejun Heo2-0/+12
Build a list of all cgroups anchored at cgroupfs_root->allcg_list and going through cgroup->allcg_node. The list is protected by cgroup_mutex and will be used to improve cgroup file handling. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir()Tejun Heo1-4/+2
cgroup_populate_dir() currently clears all files and then repopulate the directory; however, the clearing part is only useful when it's called from cgroup_remount(). Relocate the invocation to cgroup_remount(). This is to prepare for further cgroup file handling updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2012-04-01cgroup: deprecate remount option changesTejun Heo2-0/+14
This patch marks the following features for deprecation. * Rebinding subsys by remount: Never reached useful state - only works on empty hierarchies. * release_agent update by remount: release_agent itself will be replaced with conventional fsnotify notification. v2: Lennart pointed out that "name=" is necessary for mounts w/o any controller attached. Drop "name=" deprecation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Lennart Poettering <mzxreary@0pointer.de>
2012-03-31Linux 3.4-rc1Linus Torvalds1-2/+2
2012-03-31vfs: fix out-of-date dentry_unhash() commentJ. Bruce Fields1-1/+1
64252c75a2196a0cf1e0d3777143ecfe0e3ae650 "vfs: remove dget() from dentry_unhash()" changed the implementation but not the comment. Cc: Sage Weil <sage@newdream.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31vfs: split __lookup_hashMiklos Szeredi1-64/+44
Split __lookup_hash into two component functions: lookup_dcache - tries cached lookup, returns whether real lookup is needed lookup_real - calls i_op->lookup This eliminates code duplication between d_alloc_and_lookup() and d_inode_lookup(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - take __lookup_hash()-calling case out of line.Al Viro1-15/+16
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31untangling do_lookup() - switch to calling __lookup_hash()Al Viro1-67/+46
now we have __lookup_hash() open-coded if !dentry case; just call the damn thing instead... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>