aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/trace/events.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/trace/events.rst')
-rw-r--r--Documentation/trace/events.rst154
1 files changed, 105 insertions, 49 deletions
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index f792b1959a33..759907c20e75 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -24,27 +24,27 @@ tracing information should be printed.
---------------------------------
The events which are available for tracing can be found in the file
-/sys/kernel/debug/tracing/available_events.
+/sys/kernel/tracing/available_events.
To enable a particular event, such as 'sched_wakeup', simply echo it
-to /sys/kernel/debug/tracing/set_event. For example::
+to /sys/kernel/tracing/set_event. For example::
- # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
+ # echo sched_wakeup >> /sys/kernel/tracing/set_event
.. Note:: '>>' is necessary, otherwise it will firstly disable all the events.
To disable an event, echo the event name to the set_event file prefixed
with an exclamation point::
- # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
+ # echo '!sched_wakeup' >> /sys/kernel/tracing/set_event
To disable all events, echo an empty line to the set_event file::
- # echo > /sys/kernel/debug/tracing/set_event
+ # echo > /sys/kernel/tracing/set_event
To enable all events, echo ``*:*`` or ``*:`` to the set_event file::
- # echo *:* > /sys/kernel/debug/tracing/set_event
+ # echo *:* > /sys/kernel/tracing/set_event
The events are organized into subsystems, such as ext4, irq, sched,
etc., and a full event name looks like this: <subsystem>:<event>. The
@@ -53,29 +53,29 @@ file. All of the events in a subsystem can be specified via the syntax
``<subsystem>:*``; for example, to enable all irq events, you can use the
command::
- # echo 'irq:*' > /sys/kernel/debug/tracing/set_event
+ # echo 'irq:*' > /sys/kernel/tracing/set_event
2.2 Via the 'enable' toggle
---------------------------
-The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
+The events available are also listed in /sys/kernel/tracing/events/ hierarchy
of directories.
To enable event 'sched_wakeup'::
- # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
+ # echo 1 > /sys/kernel/tracing/events/sched/sched_wakeup/enable
To disable it::
- # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
+ # echo 0 > /sys/kernel/tracing/events/sched/sched_wakeup/enable
To enable all events in sched subsystem::
- # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
+ # echo 1 > /sys/kernel/tracing/events/sched/enable
To enable all events::
- # echo 1 > /sys/kernel/debug/tracing/events/enable
+ # echo 1 > /sys/kernel/tracing/events/enable
When reading one of these enable files, there are four results:
@@ -126,7 +126,7 @@ is the size of the data item, in bytes.
For example, here's the information displayed for the 'sched_wakeup'
event::
- # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
+ # cat /sys/kernel/tracing/events/sched/sched_wakeup/format
name: sched_wakeup
ID: 60
@@ -198,6 +198,41 @@ The glob (~) accepts a wild card character (\*,?) and character classes
prev_comm ~ "*sh*"
prev_comm ~ "ba*sh"
+If the field is a pointer that points into user space (for example
+"filename" from sys_enter_openat), then you have to append ".ustring" to the
+field name::
+
+ filename.ustring ~ "password"
+
+As the kernel will have to know how to retrieve the memory that the pointer
+is at from user space.
+
+You can convert any long type to a function address and search by function name::
+
+ call_site.function == security_prepare_creds
+
+The above will filter when the field "call_site" falls on the address within
+"security_prepare_creds". That is, it will compare the value of "call_site" and
+the filter will return true if it is greater than or equal to the start of
+the function "security_prepare_creds" and less than the end of that function.
+
+The ".function" postfix can only be attached to values of size long, and can only
+be compared with "==" or "!=".
+
+Cpumask fields or scalar fields that encode a CPU number can be filtered using
+a user-provided cpumask in cpulist format. The format is as follows::
+
+ CPUS{$cpulist}
+
+Operators available to cpumask filtering are:
+
+& (intersection), ==, !=
+
+For example, this will filter events that have their .target_cpu field present
+in the given cpumask::
+
+ target_cpu & CPUS{17-42}
+
5.2 Setting filters
-------------------
@@ -206,19 +241,19 @@ to the 'filter' file for the given event.
For example::
- # cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
+ # cd /sys/kernel/tracing/events/sched/sched_wakeup
# echo "common_preempt_count > 4" > filter
A slightly more involved example::
- # cd /sys/kernel/debug/tracing/events/signal/signal_generate
+ # cd /sys/kernel/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
If there is an error in the expression, you'll get an 'Invalid
argument' error when setting it, and the erroneous string along with
an error message can be seen by looking at the filter e.g.::
- # cd /sys/kernel/debug/tracing/events/signal/signal_generate
+ # cd /sys/kernel/tracing/events/signal/signal_generate
# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
-bash: echo: write error: Invalid argument
# cat filter
@@ -230,6 +265,16 @@ Currently the caret ('^') for an error always appears at the beginning of
the filter string; the error message should still be useful though
even without more accurate position info.
+5.2.1 Filter limitations
+------------------------
+
+If a filter is placed on a string pointer ``(char *)`` that does not point
+to a string on the ring buffer, but instead points to kernel or user space
+memory, then, for safety reasons, at most 1024 bytes of the content is
+copied onto a temporary buffer to do the compare. If the copy of the memory
+faults (the pointer points to memory that should not be accessed), then the
+string compare will be treated as not matching.
+
5.3 Clearing filters
--------------------
@@ -239,7 +284,7 @@ file.
To clear the filters for all events in a subsystem, write a '0' to the
subsystem's filter file.
-5.3 Subsystem filters
+5.4 Subsystem filters
---------------------
For convenience, filters for every event in a subsystem can be set or
@@ -258,7 +303,7 @@ above points:
Clear the filters on all events in the sched subsystem::
- # cd /sys/kernel/debug/tracing/events/sched
+ # cd /sys/kernel/tracing/events/sched
# echo 0 > filter
# cat sched_switch/filter
none
@@ -268,7 +313,7 @@ Clear the filters on all events in the sched subsystem::
Set a filter using only common fields for all events in the sched
subsystem (all events end up with the same filter)::
- # cd /sys/kernel/debug/tracing/events/sched
+ # cd /sys/kernel/tracing/events/sched
# echo common_pid == 0 > filter
# cat sched_switch/filter
common_pid == 0
@@ -279,14 +324,14 @@ Attempt to set a filter using a non-common field for all events in the
sched subsystem (all events but those that have a prev_pid field retain
their old filters)::
- # cd /sys/kernel/debug/tracing/events/sched
+ # cd /sys/kernel/tracing/events/sched
# echo prev_pid == 0 > filter
# cat sched_switch/filter
prev_pid == 0
# cat sched_wakeup/filter
common_pid == 0
-5.4 PID filtering
+5.5 PID filtering
-----------------
The set_event_pid file in the same directory as the top events directory
@@ -294,7 +339,7 @@ exists, will filter all events from tracing any task that does not have the
PID listed in the set_event_pid file.
::
- # cd /sys/kernel/debug/tracing
+ # cd /sys/kernel/tracing
# echo $$ > set_event_pid
# echo 1 > events/enable
@@ -390,14 +435,14 @@ The following commands are supported:
specifies that this enablement happens only once::
# echo 'enable_event:kmem:kmalloc:1' > \
- /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+ /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger
The following trigger causes kmalloc events to stop being traced
when a read system call exits. This disablement happens on every
read system call exit::
# echo 'disable_event:kmem:kmalloc' > \
- /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+ /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger
The format is::
@@ -407,10 +452,10 @@ The following commands are supported:
To remove the above commands::
# echo '!enable_event:kmem:kmalloc:1' > \
- /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+ /sys/kernel/tracing/events/syscalls/sys_enter_read/trigger
# echo '!disable_event:kmem:kmalloc' > \
- /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+ /sys/kernel/tracing/events/syscalls/sys_exit_read/trigger
Note that there can be any number of enable/disable_event triggers
per triggering event, but there can only be one trigger per
@@ -429,13 +474,13 @@ The following commands are supported:
kmalloc tracepoint is hit::
# echo 'stacktrace' > \
- /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ /sys/kernel/tracing/events/kmem/kmalloc/trigger
The following trigger dumps a stacktrace the first 5 times a kmalloc
request happens with a size >= 64K::
# echo 'stacktrace:5 if bytes_req >= 65536' > \
- /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ /sys/kernel/tracing/events/kmem/kmalloc/trigger
The format is::
@@ -444,16 +489,16 @@ The following commands are supported:
To remove the above commands::
# echo '!stacktrace' > \
- /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ /sys/kernel/tracing/events/kmem/kmalloc/trigger
# echo '!stacktrace:5 if bytes_req >= 65536' > \
- /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ /sys/kernel/tracing/events/kmem/kmalloc/trigger
The latter can also be removed more simply by the following (without
the filter)::
# echo '!stacktrace:5' > \
- /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ /sys/kernel/tracing/events/kmem/kmalloc/trigger
Note that there can be only one stacktrace trigger per triggering
event.
@@ -469,20 +514,20 @@ The following commands are supported:
capture those events when the trigger event occurred::
# echo 'snapshot if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
To only snapshot once::
# echo 'snapshot:1 if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
To remove the above commands::
# echo '!snapshot if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
# echo '!snapshot:1 if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
Note that there can be only one snapshot trigger per triggering
event.
@@ -500,20 +545,20 @@ The following commands are supported:
trigger event::
# echo 'traceoff:1 if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
To always disable tracing when nr_rq > 1::
# echo 'traceoff if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
To remove the above commands::
# echo '!traceoff:1 if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
# echo '!traceoff if nr_rq > 1' > \
- /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+ /sys/kernel/tracing/events/block/block_unplug/trigger
Note that there can be only one traceon or traceoff trigger per
triggering event.
@@ -589,8 +634,19 @@ name::
{ .type = "int", .name = "my_int_field" },
};
-See synth_field_size() for available types. If field_name contains [n]
-the field is considered to be an array.
+See synth_field_size() for available types.
+
+If field_name contains [n], the field is considered to be a static array.
+
+If field_names contains[] (no subscript), the field is considered to
+be a dynamic array, which will only take as much space in the event as
+is required to hold the array.
+
+Because space for an event is reserved before assigning field values
+to the event, using dynamic arrays implies that the piecewise
+in-kernel API described below can't be used with dynamic arrays. The
+other non-piecewise in-kernel APIs can, however, be used with dynamic
+arrays.
If the event is created from within a module, a pointer to the module
must be passed to synth_event_create(). This will ensure that the
@@ -787,13 +843,13 @@ To trace a synthetic using the piecewise method described above, the
synth_event_trace_start() function is used to 'open' the synthetic
event trace::
- struct synth_trace_state trace_state;
+ struct synth_event_trace_state trace_state;
ret = synth_event_trace_start(schedtest_event_file, &trace_state);
It's passed the trace_event_file representing the synthetic event
using the same methods as described above, along with a pointer to a
-struct synth_trace_state object, which will be zeroed before use and
+struct synth_event_trace_state object, which will be zeroed before use and
used to maintain state between this and following calls.
Once the event has been opened, which means space for it has been
@@ -805,7 +861,7 @@ lookup per field.
To assign the values one after the other without lookups,
synth_event_add_next_val() should be used. Each call is passed the
-same synth_trace_state object used in the synth_event_trace_start(),
+same synth_event_trace_state object used in the synth_event_trace_start(),
along with the value to set the next field in the event. After each
field is set, the 'cursor' points to the next field, which will be set
by the subsequent call, continuing until all the fields have been set
@@ -834,7 +890,7 @@ this method would be (without error-handling code)::
ret = synth_event_add_next_val(395, &trace_state);
To assign the values in any order, synth_event_add_val() should be
-used. Each call is passed the same synth_trace_state object used in
+used. Each call is passed the same synth_event_trace_state object used in
the synth_event_trace_start(), along with the field name of the field
to set and the value to set it to. The same sequence of calls as in
the above examples using this method would be (without error-handling
@@ -856,7 +912,7 @@ can be used but not both at the same time.
Finally, the event won't be actually traced until it's 'closed',
which is done using synth_event_trace_end(), which takes only the
-struct synth_trace_state object used in the previous calls::
+struct synth_event_trace_state object used in the previous calls::
ret = synth_event_trace_end(&trace_state);
@@ -873,7 +929,7 @@ functions can be used.
To create a kprobe event, an empty or partially empty kprobe event
should first be created using kprobe_event_gen_cmd_start(). The name
-of the event and the probe location should be specfied along with one
+of the event and the probe location should be specified along with one
or args each representing a probe field should be supplied to this
function. Before calling kprobe_event_gen_cmd_start(), the user
should create and initialize a dynevent_cmd object using
@@ -953,7 +1009,7 @@ The basic idea is simple and amounts to providing a general-purpose
layer that can be used to generate trace event commands. The
generated command strings can then be passed to the command-parsing
and event creation code that already exists in the trace event
-subystem for creating the corresponding trace events.
+subsystem for creating the corresponding trace events.
In a nutshell, the way it works is that the higher-level interface
code creates a struct dynevent_cmd object, then uses a couple
@@ -1026,7 +1082,7 @@ to add an operator between the pair (here none) and a separator to be
appended onto the end of the arg pair (here ';').
There's also a dynevent_str_add() function that can be used to simply
-add a string as-is, with no spaces, delimeters, or arg check.
+add a string as-is, with no spaces, delimiters, or arg check.
Any number of dynevent_*_add() calls can be made to build up the string
(until its length surpasses cmd->maxlen). When all the arguments have