Diffstat (limited to 'Documentation/cgroups'):
 Documentation/cgroups/blkio-controller.txt | 30
 Documentation/cgroups/cgroups.txt          | 24
 Documentation/cgroups/cpusets.txt          | 17
 Documentation/cgroups/memory.txt           | 20
 4 files changed, 44 insertions(+), 47 deletions(-)
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 4ed7b5ceeed2..465351d4cf85 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -140,7 +140,7 @@ Proportional weight policy files
- Specifies per cgroup weight. This is the default weight of the group
on all the devices, until and unless overridden by a per-device rule.
(See blkio.weight_device).
- Currently allowed range of weights is from 100 to 1000.
+ Currently allowed range of weights is from 10 to 1000.
- blkio.weight_device
- One can specify per-cgroup, per-device rules using this interface.
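As a quick sketch, assuming the blkio controller is mounted and a child
group "test1" has been created (the mount point, group name and the 8:16
major:minor numbers are only examples), the two files might be used like
this:

	# echo 500 > /cgroup/test1/blkio.weight
	# echo 8:16 300 > /cgroup/test1/blkio.weight_device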
@@ -343,34 +343,6 @@ Common files among various policies
CFQ sysfs tunable
=================
-/sys/block/<disk>/queue/iosched/group_isolation
------------------------------------------------
-
-If group_isolation=1, it provides stronger isolation between groups at the
-expense of throughput. By default group_isolation is 0. In general that
-means that if group_isolation=0, expect fairness for sequential workload
-only. Set group_isolation=1 to see fairness for random IO workload also.
-
-Generally CFQ will put random seeky workload in sync-noidle category. CFQ
-will disable idling on these queues and it does a collective idling on group
-of such queues. Generally these are slow moving queues and if there is a
-sync-noidle service tree in each group, that group gets exclusive access to
-disk for certain period. That means it will bring the throughput down if
-group does not have enough IO to drive deeper queue depths and utilize disk
-capacity to the fullest in the slice allocated to it. But the flip side is
-that even a random reader should get better latencies and overall throughput
-if there are lots of sequential readers/sync-idle workload running in the
-system.
-
-If group_isolation=0, then CFQ automatically moves all the random seeky queues
-in the root group. That means there will be no service differentiation for
-that kind of workload. This leads to better throughput as we do collective
-idling on root sync-noidle tree.
-
-By default one should run with group_isolation=0. If that is not sufficient
-and one wants stronger isolation between groups, then set group_isolation=1
-but this will come at cost of reduced throughput.
-
/sys/block/<disk>/queue/iosched/slice_idle
------------------------------------------
On faster hardware CFQ can be slow, especially with sequential workloads.
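As a minimal sketch, idling can be switched off on such hardware by writing
0 to this tunable (substitute the actual disk name for "sda"):

	# echo 0 > /sys/block/sda/queue/iosched/slice_idle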
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 44b8b7af8019..aedf1bd02fdd 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -110,22 +110,22 @@ university server with various users - students, professors, system
tasks etc. The resource planning for this server could be along the
following lines:
- CPU : Top cpuset
+ CPU : "Top cpuset"
/ \
CPUSet1 CPUSet2
- | |
- (Profs) (Students)
+ | |
+ (Professors) (Students)
In addition (system tasks) are attached to topcpuset (so
that they can run anywhere) with a limit of 20%
- Memory : Professors (50%), students (30%), system (20%)
+ Memory : Professors (50%), Students (30%), system (20%)
- Disk : Prof (50%), students (30%), system (20%)
+ Disk : Professors (50%), Students (30%), system (20%)
Network : WWW browsing (20%), Network File System (60%), others (20%)
/ \
- Prof (15%) students (5%)
+ Professors (15%) students (5%)
Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
into the NFS network class.
@@ -349,6 +349,10 @@ To mount a cgroup hierarchy with all available subsystems, type:
The "xxx" is not interpreted by the cgroup code, but will appear in
/proc/mounts so may be any useful identifying string that you like.
+Note: Some subsystems do not work without some user input first. For instance,
+if cpusets are enabled the user will have to populate the cpus and mems files
+for each new cgroup created before that group can be used.
+
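A minimal sketch of that step, assuming the hierarchy is mounted at
/dev/cgroup and a new group "g1" has just been created (the group name,
CPU list and memory node are only examples):

	# echo 0-1 > /dev/cgroup/g1/cpuset.cpus
	# echo 0 > /dev/cgroup/g1/cpuset.mems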
To mount a cgroup hierarchy with just the cpuset and memory
subsystems, type:
# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup
@@ -426,6 +430,14 @@ You can attach the current shell task by echoing 0:
# echo 0 > tasks
+Note: Since every task is always a member of exactly one cgroup in each
+mounted hierarchy, to remove a task from its current cgroup you must
+move it into a new cgroup (possibly the root cgroup) by writing to the
+new cgroup's tasks file.
+
+Note: If the ns cgroup is active, moving a process to another cgroup can
+fail.
+
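As an illustrative sketch, moving a task with PID 1234 into a group
"newgroup" in the same hierarchy (and thereby out of its current group)
might look like this (the path and PID are only examples):

	# echo 1234 > /dev/cgroup/newgroup/tasks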
2.3 Mounting hierarchies by name
--------------------------------
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 5d0d5692a365..98a30829af7a 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -693,7 +693,7 @@ There are ways to query or modify cpusets:
- via the C library libcgroup.
(http://sourceforge.net/projects/libcg/)
- via the python application cset.
- (http://developer.novell.com/wiki/index.php/Cpuset)
+ (http://code.google.com/p/cpuset/)
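For instance, with the libcgroup command-line tools installed, something
along these lines may work (the group name and CPU/node lists are
assumptions, and option syntax can vary between libcgroup versions):

	# cgcreate -g cpuset:/mygroup
	# cgset -r cpuset.cpus=0-3 mygroup
	# cgset -r cpuset.mems=0 mygroup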
The sched_setaffinity calls can also be done at the shell prompt using
SGI's runon or Robert Love's taskset. The mbind and set_mempolicy
@@ -725,13 +725,14 @@ Now you want to do something with this cpuset.
In this directory you can find several files:
# ls
-cpuset.cpu_exclusive cpuset.memory_spread_slab
-cpuset.cpus cpuset.mems
-cpuset.mem_exclusive cpuset.sched_load_balance
-cpuset.mem_hardwall cpuset.sched_relax_domain_level
-cpuset.memory_migrate notify_on_release
-cpuset.memory_pressure tasks
-cpuset.memory_spread_page
+cgroup.clone_children cpuset.memory_pressure
+cgroup.event_control cpuset.memory_spread_page
+cgroup.procs cpuset.memory_spread_slab
+cpuset.cpu_exclusive cpuset.mems
+cpuset.cpus cpuset.sched_load_balance
+cpuset.mem_exclusive cpuset.sched_relax_domain_level
+cpuset.mem_hardwall notify_on_release
+cpuset.memory_migrate tasks
Reading them will give you information about the state of this cpuset:
the CPUs and Memory Nodes it can use, the processes that are using
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 7781857dc940..7c163477fcd8 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -52,8 +52,10 @@ Brief summary of control files.
tasks # attach a task(thread) and show list of threads
cgroup.procs # show list of processes
cgroup.event_control # an interface for event_fd()
- memory.usage_in_bytes # show current memory(RSS+Cache) usage.
- memory.memsw.usage_in_bytes # show current memory+Swap usage
+ memory.usage_in_bytes # show current res_counter usage for memory
+ (See 5.5 for details)
+ memory.memsw.usage_in_bytes # show current res_counter usage for memory+Swap
+ (See 5.5 for details)
memory.limit_in_bytes # set/show limit of memory usage
memory.memsw.limit_in_bytes # set/show limit of memory+Swap usage
memory.failcnt # show the number of memory usage hits limits
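As a brief sketch of using these files, assuming the memory controller is
mounted and a group "0" exists under /cgroups (the path and the 4M limit
are only examples):

	# echo 4M > /cgroups/0/memory.limit_in_bytes
	# cat /cgroups/0/memory.usage_in_bytes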
@@ -453,6 +455,15 @@ memory under it will be reclaimed.
You can reset failcnt by writing 0 to the failcnt file.
# echo 0 > .../memory.failcnt
+5.5 usage_in_bytes
+
+For efficiency, like other kernel components, the memory cgroup uses some
+optimization to avoid unnecessary cacheline false sharing. usage_in_bytes is
+affected by this method and does not show the 'exact' value of memory (and
+swap) usage; it is a fuzz value intended for efficient access. (Of course,
+it is synchronized when necessary.) If you want a more exact memory usage,
+you should use the RSS+CACHE(+SWAP) value in memory.stat (see 5.2).
+
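For example, the fuzzed counter and the more precise per-group statistics
could be compared like this (the /cgroups/0 path is only an example):

	# cat /cgroups/0/memory.usage_in_bytes
	# grep -e "^cache " -e "^rss " /cgroups/0/memory.stat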
6. Hierarchy support
The memory controller supports a deep hierarchy and hierarchical accounting.
@@ -485,8 +496,9 @@ The feature can be disabled by
# echo 0 > memory.use_hierarchy
-NOTE1: Enabling/disabling will fail if the cgroup already has other
- cgroups created below it.
+NOTE1: Enabling/disabling will fail if either the cgroup already has other
+ cgroups created below it, or if the parent cgroup has use_hierarchy
+ enabled.
NOTE2: When panic_on_oom is set to "2", the whole system will panic in
case of an OOM event in any cgroup.