aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorDavid S. Miller <davem@davemloft.net>2010-05-18 23:01:55 -0700
committerDavid S. Miller <davem@davemloft.net>2010-05-18 23:01:55 -0700
commit2ec8c6bb5d8f3a62a79f463525054bae1e3d4487 (patch)
treefa7f8400ac685fb52e96f64997c7c682fc2aa021 /Documentation
parentqlcnic: adding co maintainer (diff)
parentMerge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (diff)
downloadlinux-dev-2ec8c6bb5d8f3a62a79f463525054bae1e3d4487.tar.xz
linux-dev-2ec8c6bb5d8f3a62a79f463525054bae1e3d4487.zip
Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts: include/linux/mod_devicetable.h scripts/mod/file2alias.c
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RCU/stallwarn.txt94
-rw-r--r--Documentation/RCU/torture.txt10
-rw-r--r--Documentation/RCU/trace.txt35
-rw-r--r--Documentation/intel_txt.txt16
-rw-r--r--Documentation/kernel-parameters.txt8
-rw-r--r--Documentation/kprobes.txt10
-rw-r--r--Documentation/rbtree.txt58
-rw-r--r--Documentation/scheduler/sched-design-CFS.txt54
-rw-r--r--Documentation/scheduler/sched-rt-group.txt20
-rw-r--r--Documentation/trace/events.txt3
-rw-r--r--Documentation/trace/ftrace.txt50
-rw-r--r--Documentation/trace/kprobetrace.txt4
12 files changed, 226 insertions, 136 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
index 1423d2570d78..44c6dcc93d6d 100644
--- a/Documentation/RCU/stallwarn.txt
+++ b/Documentation/RCU/stallwarn.txt
@@ -3,35 +3,79 @@ Using RCU's CPU Stall Detector
The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables
RCU's CPU stall detector, which detects conditions that unduly delay
RCU grace periods. The stall detector's idea of what constitutes
-"unduly delayed" is controlled by a pair of C preprocessor macros:
+"unduly delayed" is controlled by a set of C preprocessor macros:
RCU_SECONDS_TILL_STALL_CHECK
This macro defines the period of time that RCU will wait from
the beginning of a grace period until it issues an RCU CPU
- stall warning. It is normally ten seconds.
+ stall warning. This time period is normally ten seconds.
RCU_SECONDS_TILL_STALL_RECHECK
This macro defines the period of time that RCU will wait after
- issuing a stall warning until it issues another stall warning.
- It is normally set to thirty seconds.
+ issuing a stall warning until it issues another stall warning
+ for the same stall. This time period is normally set to thirty
+ seconds.
RCU_STALL_RAT_DELAY
- The CPU stall detector tries to make the offending CPU rat on itself,
- as this often gives better-quality stack traces. However, if
- the offending CPU does not detect its own stall in the number
- of jiffies specified by RCU_STALL_RAT_DELAY, then other CPUs will
- complain. This is normally set to two jiffies.
+ The CPU stall detector tries to make the offending CPU print its
+ own warnings, as this often gives better-quality stack traces.
+ However, if the offending CPU does not detect its own stall in
+ the number of jiffies specified by RCU_STALL_RAT_DELAY, then
+ some other CPU will complain. This delay is normally set to
+ two jiffies.
-The following problems can result in an RCU CPU stall warning:
+When a CPU detects that it is stalling, it will print a message similar
+to the following:
+
+INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies)
+
+This message indicates that CPU 5 detected that it was causing a stall,
+and that the stall was affecting RCU-sched. This message will normally be
+followed by a stack dump of the offending CPU. On TREE_RCU kernel builds,
+RCU and RCU-sched are implemented by the same underlying mechanism,
+while on TREE_PREEMPT_RCU kernel builds, RCU is instead implemented
+by rcu_preempt_state.
+
+On the other hand, if the offending CPU fails to print out a stall-warning
+message quickly enough, some other CPU will print a message similar to
+the following:
+
+INFO: rcu_bh_state detected stalls on CPUs/tasks: { 3 5 } (detected by 2, 2502 jiffies)
+
+This message indicates that CPU 2 detected that CPUs 3 and 5 were both
+causing stalls, and that the stall was affecting RCU-bh. This message
+will normally be followed by stack dumps for each CPU. Please note that
+TREE_PREEMPT_RCU builds can be stalled by tasks as well as by CPUs,
+and that the tasks will be indicated by PID, for example, "P3421".
+It is even possible for a rcu_preempt_state stall to be caused by both
+CPUs -and- tasks, in which case the offending CPUs and tasks will all
+be called out in the list.
+
+Finally, if the grace period ends just as the stall warning starts
+printing, there will be a spurious stall-warning message:
+
+INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffies)
+
+This is rare, but does happen from time to time in real life.
+
+So your kernel printed an RCU CPU stall warning. The next question is
+"What caused it?" The following problems can result in RCU CPU stall
+warnings:
o A CPU looping in an RCU read-side critical section.
-o A CPU looping with interrupts disabled.
+o A CPU looping with interrupts disabled. This condition can
+ result in RCU-sched and RCU-bh stalls.
-o A CPU looping with preemption disabled.
+o A CPU looping with preemption disabled. This condition can
+ result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
+ stalls.
+
+o A CPU looping with bottom halves disabled. This condition can
+ result in RCU-sched and RCU-bh stalls.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule().
@@ -39,20 +83,24 @@ o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
o A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred
- at least once in a former life. A CPU failed in a running system,
+ at least once in real life. A CPU failed in a running system,
becoming unresponsive, but not causing an immediate crash.
This resulted in a series of RCU CPU stall warnings, eventually
leading the realization that the CPU had failed.
-The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
-SRCU does not do so directly, but its calls to synchronize_sched() will
-result in RCU-sched detecting any CPU stalls that might be occurring.
-
-To diagnose the cause of the stall, inspect the stack traces. The offending
-function will usually be near the top of the stack. If you have a series
-of stall warnings from a single extended stall, comparing the stack traces
-can often help determine where the stall is occurring, which will usually
-be in the function nearest the top of the stack that stays the same from
-trace to trace.
+The RCU, RCU-sched, and RCU-bh implementations have CPU stall
+warning. SRCU does not have its own CPU stall warnings, but its
+calls to synchronize_sched() will result in RCU-sched detecting
+RCU-sched-related CPU stalls. Please note that RCU only detects
+CPU stalls when there is a grace period in progress. No grace period,
+no CPU stall warnings.
+
+To diagnose the cause of the stall, inspect the stack traces.
+The offending function will usually be near the top of the stack.
+If you have a series of stall warnings from a single extended stall,
+comparing the stack traces can often help determine where the stall
+is occurring, which will usually be in the function nearest the top of
+that portion of the stack which remains the same from trace to trace.
+If you can reliably trigger the stall, ftrace can be quite helpful.
RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE.
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 0e50bc2aa1e2..5d9016795fd8 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following:
sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0
sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0
sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0
- state: -1 / 0:0 3:0 4:0
-
-As before, the first four lines are similar to those for RCU.
-The last line shows the task-migration state. The first number is
--1 if synchronize_sched_expedited() is idle, -2 if in the process of
-posting wakeups to the migration kthreads, and N when waiting on CPU N.
-Each of the colon-separated fields following the "/" is a CPU:state pair.
-Valid states are "0" for idle, "1" for waiting for quiescent state,
-"2" for passed through quiescent state, and "3" when a race with a
-CPU-hotplug event forces use of the synchronize_sched() primitive.
USAGE
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt
index 8608fd85e921..efd8cc95c06b 100644
--- a/Documentation/RCU/trace.txt
+++ b/Documentation/RCU/trace.txt
@@ -256,23 +256,23 @@ o Each element of the form "1/1 0:127 ^0" represents one struct
The output of "cat rcu/rcu_pending" looks as follows:
rcu_sched:
- 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
- 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
- 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
- 3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
- 4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
- 5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
- 6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
- 7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
+ 0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
+ 1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
+ 2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
+ 3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
+ 4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
+ 5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
+ 6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
+ 7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
rcu_bh:
- 0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
- 1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
- 2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
- 3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
- 4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
- 5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
- 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
- 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
+ 0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
+ 1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
+ 2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
+ 3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
+ 4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
+ 5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
+ 6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
+ 7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542
As always, this is once again split into "rcu_sched" and "rcu_bh"
portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
@@ -284,6 +284,9 @@ o "np" is the number of times that __rcu_pending() has been invoked
o "qsp" is the number of times that the RCU was waiting for a
quiescent state from this CPU.
+o "rpq" is the number of times that the CPU had passed through
+ a quiescent state, but not yet reported it to RCU.
+
o "cbr" is the number of times that this CPU had RCU callbacks
that had passed through a grace period, and were thus ready
to be invoked.
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt
index f40a1f030019..87c8990dbbd9 100644
--- a/Documentation/intel_txt.txt
+++ b/Documentation/intel_txt.txt
@@ -161,13 +161,15 @@ o In order to put a system into any of the sleep states after a TXT
has been restored, it will restore the TPM PCRs and then
transfer control back to the kernel's S3 resume vector.
In order to preserve system integrity across S3, the kernel
- provides tboot with a set of memory ranges (kernel
- code/data/bss, S3 resume code, and AP trampoline) that tboot
- will calculate a MAC (message authentication code) over and then
- seal with the TPM. On resume and once the measured environment
- has been re-established, tboot will re-calculate the MAC and
- verify it against the sealed value. Tboot's policy determines
- what happens if the verification fails.
+ provides tboot with a set of memory ranges (RAM and RESERVED_KERN
+ in the e820 table, but not any memory that BIOS might alter over
+ the S3 transition) that tboot will calculate a MAC (message
+ authentication code) over and then seal with the TPM. On resume
+ and once the measured environment has been re-established, tboot
+ will re-calculate the MAC and verify it against the sealed value.
+ Tboot's policy determines what happens if the verification fails.
+ Note that the c/s 194 of tboot which has the new MAC code supports
+ this.
That's pretty much it for TXT support.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 839b21b0699a..567b7a8eb878 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -324,6 +324,8 @@ and is between 256 and 4096 characters. It is defined in the file
they are unmapped. Otherwise they are
flushed before they will be reused, which
is a lot of faster
+ off - do not initialize any AMD IOMMU found in
+ the system
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
@@ -784,8 +786,12 @@ and is between 256 and 4096 characters. It is defined in the file
as early as possible in order to facilitate early
boot debugging.
- ftrace_dump_on_oops
+ ftrace_dump_on_oops[=orig_cpu]
[FTRACE] will dump the trace buffers on oops.
+ If no parameter is passed, ftrace will dump
+ buffers of all CPUs, but if you pass orig_cpu, it will
+ dump only the buffer of the CPU that triggered the
+ oops.
ftrace_filter=[function-list]
[FTRACE] Limit the functions traced by the function
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 2f9115c0ae62..61c291cddf18 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -165,8 +165,8 @@ the user entry_handler invocation is also skipped.
1.4 How Does Jump Optimization Work?
-If you configured your kernel with CONFIG_OPTPROBES=y (currently
-this option is supported on x86/x86-64, non-preemptive kernel) and
+If your kernel is built with CONFIG_OPTPROBES=y (currently this flag
+is automatically set 'y' on x86/x86-64, non-preemptive kernel) and
the "debug.kprobes_optimization" kernel parameter is set to 1 (see
sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
instruction instead of a breakpoint instruction at each probepoint.
@@ -271,8 +271,6 @@ tweak the kernel's execution path, you need to suppress optimization,
using one of the following techniques:
- Specify an empty function for the kprobe's post_handler or break_handler.
or
-- Config CONFIG_OPTPROBES=n.
- or
- Execute 'sysctl -w debug.kprobes_optimization=n'
2. Architectures Supported
@@ -307,10 +305,6 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
so you can use "objdump -d -l vmlinux" to see the source-to-object
code mapping.
-If you want to reduce probing overhead, set "Kprobes jump optimization
-support" (CONFIG_OPTPROBES) to "y". You can find this option under the
-"Kprobes" line.
-
4. API Reference
The Kprobes API includes a "register" function and an "unregister"
diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index aae8355d3166..221f38be98f4 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -190,3 +190,61 @@ Example:
for (node = rb_first(&mytree); node; node = rb_next(node))
printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
+Support for Augmented rbtrees
+-----------------------------
+
+Augmented rbtree is an rbtree with "some" additional data stored in each node.
+This data can be used to augment some new functionality to rbtree.
+Augmented rbtree is an optional feature built on top of basic rbtree
+infrastructure. rbtree user who wants this feature will have an augment
+callback function in rb_root initialized.
+
+This callback function will be called from rbtree core routines whenever
+a node has a change in one or both of its children. It is the responsibility
+of the callback function to recalculate the additional data that is in the
+rb node using new children information. Note that if this new additional
+data affects the parent node's additional data, then callback function has
+to handle it and do the recursive updates.
+
+
+Interval tree is an example of augmented rb tree. Reference -
+"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
+More details about interval trees:
+
+Classical rbtree has a single key and it cannot be directly used to store
+interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
+lo:hi or to find whether there is an exact match for a new lo:hi.
+
+However, rbtree can be augmented to store such interval ranges in a structured
+way making it possible to do efficient lookup and exact match.
+
+This "extra information" stored in each node is the maximum hi
+(max_hi) value among all the nodes that are its descendents. This
+information can be maintained at each node just be looking at the node
+and its immediate children. And this will be used in O(log n) lookup
+for lowest match (lowest start address among all possible matches)
+with something like:
+
+find_lowest_match(lo, hi, node)
+{
+ lowest_match = NULL;
+ while (node) {
+ if (max_hi(node->left) > lo) {
+ // Lowest overlap if any must be on left side
+ node = node->left;
+ } else if (overlap(lo, hi, node)) {
+ lowest_match = node;
+ break;
+ } else if (lo > node->lo) {
+ // Lowest overlap if any must be on right side
+ node = node->right;
+ } else {
+ break;
+ }
+ }
+ return lowest_match;
+}
+
+Finding exact match will be to first find lowest match and then to follow
+successor nodes looking for exact match, until the start of a node is beyond
+the hi value we are looking for.
diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt
index 6f33593e59e2..8239ebbcddce 100644
--- a/Documentation/scheduler/sched-design-CFS.txt
+++ b/Documentation/scheduler/sched-design-CFS.txt
@@ -211,7 +211,7 @@ provide fair CPU time to each such task group. For example, it may be
desirable to first provide fair CPU time to each user on the system and then to
each task belonging to a user.
-CONFIG_GROUP_SCHED strives to achieve exactly that. It lets tasks to be
+CONFIG_CGROUP_SCHED strives to achieve exactly that. It lets tasks to be
grouped and divides CPU time fairly among such groups.
CONFIG_RT_GROUP_SCHED permits to group real-time (i.e., SCHED_FIFO and
@@ -220,38 +220,11 @@ SCHED_RR) tasks.
CONFIG_FAIR_GROUP_SCHED permits to group CFS (i.e., SCHED_NORMAL and
SCHED_BATCH) tasks.
-At present, there are two (mutually exclusive) mechanisms to group tasks for
-CPU bandwidth control purposes:
-
- - Based on user id (CONFIG_USER_SCHED)
-
- With this option, tasks are grouped according to their user id.
-
- - Based on "cgroup" pseudo filesystem (CONFIG_CGROUP_SCHED)
-
- This options needs CONFIG_CGROUPS to be defined, and lets the administrator
+ These options need CONFIG_CGROUPS to be defined, and let the administrator
create arbitrary groups of tasks, using the "cgroup" pseudo filesystem. See
Documentation/cgroups/cgroups.txt for more information about this filesystem.
-Only one of these options to group tasks can be chosen and not both.
-
-When CONFIG_USER_SCHED is defined, a directory is created in sysfs for each new
-user and a "cpu_share" file is added in that directory.
-
- # cd /sys/kernel/uids
- # cat 512/cpu_share # Display user 512's CPU share
- 1024
- # echo 2048 > 512/cpu_share # Modify user 512's CPU share
- # cat 512/cpu_share # Display user 512's CPU share
- 2048
- #
-
-CPU bandwidth between two users is divided in the ratio of their CPU shares.
-For example: if you would like user "root" to get twice the bandwidth of user
-"guest," then set the cpu_share for both the users such that "root"'s cpu_share
-is twice "guest"'s cpu_share.
-
-When CONFIG_CGROUP_SCHED is defined, a "cpu.shares" file is created for each
+When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each
group created using the pseudo filesystem. See example steps below to create
task groups and modify their CPU share using the "cgroups" pseudo filesystem.
@@ -273,24 +246,3 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem.
# #Launch gmplayer (or your favourite movie player)
# echo <movie_player_pid> > multimedia/tasks
-
-8. Implementation note: user namespaces
-
-User namespaces are intended to be hierarchical. But they are currently
-only partially implemented. Each of those has ramifications for CFS.
-
-First, since user namespaces are hierarchical, the /sys/kernel/uids
-presentation is inadequate. Eventually we will likely want to use sysfs
-tagging to provide private views of /sys/kernel/uids within each user
-namespace.
-
-Second, the hierarchical nature is intended to support completely
-unprivileged use of user namespaces. So if using user groups, then
-we want the users in a user namespace to be children of the user
-who created it.
-
-That is currently unimplemented. So instead, every user in a new
-user namespace will receive 1024 shares just like any user in the
-initial user namespace. Note that at the moment creation of a new
-user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and
-CAP_SETGID.
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 86eabe6c3419..605b0d40329d 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -126,23 +126,12 @@ priority!
2.3 Basis for grouping tasks
----------------------------
-There are two compile-time settings for allocating CPU bandwidth. These are
-configured using the "Basis for grouping tasks" multiple choice menu under
-General setup > Group CPU Scheduler:
-
-a. CONFIG_USER_SCHED (aka "Basis for grouping tasks" = "user id")
-
-This lets you use the virtual files under
-"/sys/kernel/uids/<uid>/cpu_rt_runtime_us" to control he CPU time reserved for
-each user .
-
-The other option is:
-
-.o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups")
+Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
+CPU bandwidth to task groups.
This uses the /cgroup virtual file system and
"/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each
-control group instead.
+control group.
For more information on working with control groups, you should read
Documentation/cgroups/cgroups.txt as well.
@@ -161,8 +150,7 @@ For now, this can be simplified to just the following (but see Future plans):
===============
There is work in progress to make the scheduling period for each group
-("/sys/kernel/uids/<uid>/cpu_rt_period_us" or
-"/cgroup/<cgroup>/cpu.rt_period_us" respectively) configurable as well.
+("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well.
The constraint on the period is that a subgroup must have a smaller or
equal period to its parent. But realistically its not very useful _yet_
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 02ac6ed38b2d..778ddf38b82c 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -90,7 +90,8 @@ In order to facilitate early boot debugging, use boot option:
trace_event=[event-list]
-The format of this boot option is the same as described in section 2.1.
+event-list is a comma separated list of events. See section 2.1 for event
+format.
3. Defining an event-enabled tracepoint
=======================================
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 03485bfbd797..557c1edeccaf 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -155,6 +155,9 @@ of ftrace. Here is a list of some of the key files:
to be traced. Echoing names of functions into this file
will limit the trace to only those functions.
+ This interface also allows for commands to be used. See the
+ "Filter commands" section for more details.
+
set_ftrace_notrace:
This has an effect opposite to that of
@@ -1337,12 +1340,14 @@ ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one
can either use the sysctl function or set it via the proc system
interface.
- sysctl kernel.ftrace_dump_on_oops=1
+ sysctl kernel.ftrace_dump_on_oops=n
or
- echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
+ echo n > /proc/sys/kernel/ftrace_dump_on_oops
+If n = 1, ftrace will dump buffers of all CPUs, if n = 2 ftrace will
+only dump the buffer of the CPU that triggered the oops.
Here's an example of such a dump after a null pointer
dereference in a kernel module:
@@ -1822,6 +1827,47 @@ this special filter via:
echo > set_graph_function
+Filter commands
+---------------
+
+A few commands are supported by the set_ftrace_filter interface.
+Trace commands have the following format:
+
+<function>:<command>:<parameter>
+
+The following commands are supported:
+
+- mod
+ This command enables function filtering per module. The
+ parameter defines the module. For example, if only the write*
+ functions in the ext3 module are desired, run:
+
+ echo 'write*:mod:ext3' > set_ftrace_filter
+
+ This command interacts with the filter in the same way as
+ filtering based on function names. Thus, adding more functions
+ in a different module is accomplished by appending (>>) to the
+ filter file. Remove specific module functions by prepending
+ '!':
+
+ echo '!writeback*:mod:ext3' >> set_ftrace_filter
+
+- traceon/traceoff
+ These commands turn tracing on and off when the specified
+ functions are hit. The parameter determines how many times the
+ tracing system is turned on and off. If unspecified, there is
+ no limit. For example, to disable tracing when a schedule bug
+ is hit the first 5 times, run:
+
+ echo '__schedule_bug:traceoff:5' > set_ftrace_filter
+
+ These commands are cumulative whether or not they are appended
+ to set_ftrace_filter. To remove a command, prepend it by '!'
+ and drop the parameter:
+
+ echo '!__schedule_bug:traceoff' > set_ftrace_filter
+
+
trace_pipe
----------
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index a9100b28eb84..ec94748ae65b 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -40,7 +40,9 @@ Synopsis of kprobe_events
$stack : Fetch stack address.
$retval : Fetch return value.(*)
+|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
- NAME=FETCHARG: Set NAME as the argument name of FETCHARG.
+ NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
+ FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
+ (u8/u16/u32/u64/s8/s16/s32/s64) are supported.
(*) only for return probe.
(**) this is useful for fetching a field of data structures.