Diffstat
-rw-r--r-- | Documentation/cgroups/blkio-controller.txt | 30
-rw-r--r-- | Documentation/cgroups/cgroups.txt          | 24
-rw-r--r-- | Documentation/cgroups/cpusets.txt          | 17
-rw-r--r-- | Documentation/cgroups/memory.txt           | 20
4 files changed, 44 insertions, 47 deletions
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 4ed7b5ceeed2..465351d4cf85 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -140,7 +140,7 @@ Proportional weight policy files
 	- Specifies per cgroup weight. This is default weight of the group
 	  on all the devices until and unless overridden by per device rule.
 	  (See blkio.weight_device).
-	  Currently allowed range of weights is from 100 to 1000.
+	  Currently allowed range of weights is from 10 to 1000.
 
 - blkio.weight_device
 	- One can specify per cgroup per device rules using this interface.
@@ -343,34 +343,6 @@ Common files among various policies
 
 CFQ sysfs tunable
 =================
-/sys/block/<disk>/queue/iosched/group_isolation
------------------------------------------------
-
-If group_isolation=1, it provides stronger isolation between groups at the
-expense of throughput. By default group_isolation is 0. In general that
-means that if group_isolation=0, expect fairness for sequential workload
-only. Set group_isolation=1 to see fairness for random IO workload also.
-
-Generally CFQ will put random seeky workload in sync-noidle category. CFQ
-will disable idling on these queues and it does a collective idling on group
-of such queues. Generally these are slow moving queues and if there is a
-sync-noidle service tree in each group, that group gets exclusive access to
-disk for certain period. That means it will bring the throughput down if
-group does not have enough IO to drive deeper queue depths and utilize disk
-capacity to the fullest in the slice allocated to it. But the flip side is
-that even a random reader should get better latencies and overall throughput
-if there are lots of sequential readers/sync-idle workload running in the
-system.
-
-If group_isolation=0, then CFQ automatically moves all the random seeky queues
-in the root group. That means there will be no service differentiation for
-that kind of workload. This leads to better throughput as we do collective
-idling on root sync-noidle tree.
-
-By default one should run with group_isolation=0. If that is not sufficient
-and one wants stronger isolation between groups, then set group_isolation=1
-but this will come at cost of reduced throughput.
-
 /sys/block/<disk>/queue/iosched/slice_idle
 ------------------------------------------
 On a faster hardware CFQ can be slow, especially with sequential workload.
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 44b8b7af8019..aedf1bd02fdd 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -110,22 +110,22 @@ university server with various users - students, professors, system
 tasks etc. The resource planning for this server could be along the
 following lines:
 
-       CPU :           Top cpuset
+       CPU :          "Top cpuset"
                        /       \
                CPUSet1         CPUSet2
-                 |              |
-              (Profs)         (Students)
+                  |               |
+               (Professors)    (Students)
 
                In addition (system tasks) are attached to topcpuset (so
                that they can run anywhere) with a limit of 20%
 
-       Memory : Professors (50%), students (30%), system (20%)
+       Memory : Professors (50%), Students (30%), system (20%)
 
-       Disk : Prof (50%), students (30%), system (20%)
+       Disk : Professors (50%), Students (30%), system (20%)
 
        Network : WWW browsing (20%), Network File System (60%), others (20%)
                                / \
-                       Prof (15%) students (5%)
+               Professors (15%) students (5%)
 
 Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go
 into NFS network class.
@@ -349,6 +349,10 @@ To mount a cgroup hierarchy with all available subsystems, type:
 The "xxx" is not interpreted by the cgroup code, but will appear in
 /proc/mounts so may be any useful identifying string that you like.
 
+Note: Some subsystems do not work without some user input first. For instance,
+if cpusets are enabled the user will have to populate the cpus and mems files
+for each new cgroup created before that group can be used.
+
 To mount a cgroup hierarchy with just the cpuset and memory subsystems, type:
 
 # mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
@@ -426,6 +430,14 @@ You can attach the current shell task by echoing 0:
 
 # echo 0 > tasks
 
+Note: Since every task is always a member of exactly one cgroup in each
+mounted hierarchy, to remove a task from its current cgroup you must
+move it into a new cgroup (possibly the root cgroup) by writing to the
+new cgroup's tasks file.
+
+Note: If the ns cgroup is active, moving a process to another cgroup can
+fail.
+
 2.3 Mounting hierarchies by name
 --------------------------------
 
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 5d0d5692a365..98a30829af7a 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -693,7 +693,7 @@ There are ways to query or modify cpusets:
  - via the C library libcgroup.
    (http://sourceforge.net/projects/libcg/)
  - via the python application cset.
-   (http://developer.novell.com/wiki/index.php/Cpuset)
+   (http://code.google.com/p/cpuset/)
 
 The sched_setaffinity calls can also be done at the shell prompt using
 SGI's runon or Robert Love's taskset. The mbind and set_mempolicy
@@ -725,13 +725,14 @@ Now you want to do something with this cpuset.
 
 In this directory you can find several files:
 # ls
-cpuset.cpu_exclusive       cpuset.memory_spread_slab
-cpuset.cpus                cpuset.mems
-cpuset.mem_exclusive       cpuset.sched_load_balance
-cpuset.mem_hardwall        cpuset.sched_relax_domain_level
-cpuset.memory_migrate      notify_on_release
-cpuset.memory_pressure     tasks
-cpuset.memory_spread_page
+cgroup.clone_children  cpuset.memory_pressure
+cgroup.event_control   cpuset.memory_spread_page
+cgroup.procs           cpuset.memory_spread_slab
+cpuset.cpu_exclusive   cpuset.mems
+cpuset.cpus            cpuset.sched_load_balance
+cpuset.mem_exclusive   cpuset.sched_relax_domain_level
+cpuset.mem_hardwall    notify_on_release
+cpuset.memory_migrate  tasks
 
 Reading them will give you information about the state of this cpuset:
 the CPUs and Memory Nodes it can use, the processes that are using
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7781857dc940..7c163477fcd8 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -52,8 +52,10 @@ Brief summary of control files.
  tasks                          # attach a task(thread) and show list of threads
  cgroup.procs                   # show list of processes
  cgroup.event_control           # an interface for event_fd()
- memory.usage_in_bytes          # show current memory(RSS+Cache) usage.
- memory.memsw.usage_in_bytes    # show current memory+Swap usage
+ memory.usage_in_bytes          # show current res_counter usage for memory
+                                (See 5.5 for details)
+ memory.memsw.usage_in_bytes    # show current res_counter usage for memory+Swap
+                                (See 5.5 for details)
  memory.limit_in_bytes          # set/show limit of memory usage
  memory.memsw.limit_in_bytes    # set/show limit of memory+Swap usage
  memory.failcnt                 # show the number of memory usage hits limits
@@ -453,6 +455,15 @@ memory under it will be reclaimed.
 You can reset failcnt by writing 0 to failcnt file.
 # echo 0 > .../memory.failcnt
 
+5.5 usage_in_bytes
+
+For efficiency, as other kernel components, memory cgroup uses some optimization
+to avoid unnecessary cacheline false sharing. usage_in_bytes is affected by the
+method and doesn't show 'exact' value of memory(and swap) usage, it's an fuzz
+value for efficient access. (Of course, when necessary, it's synchronized.)
+If you want to know more exact memory usage, you should use RSS+CACHE(+SWAP)
+value in memory.stat(see 5.2).
+
 6. Hierarchy support
 
 The memory controller supports a deep hierarchy and hierarchical accounting.
@@ -485,8 +496,9 @@ The feature can be disabled by
 
 # echo 0 > memory.use_hierarchy
 
-NOTE1: Enabling/disabling will fail if the cgroup already has other
-       cgroups created below it.
+NOTE1: Enabling/disabling will fail if either the cgroup already has other
+       cgroups created below it, or if the parent cgroup has use_hierarchy
+       enabled.
 
 NOTE2: When panic_on_oom is set to "2", the whole system will panic in
        case of an OOM event in any cgroup.