aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/Documentation/perf-record.txt
diff options
context:
space:
mode:
Diffstat (limited to 'tools/perf/Documentation/perf-record.txt')
-rw-r--r--tools/perf/Documentation/perf-record.txt80
1 files changed, 76 insertions, 4 deletions
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 3cf7bac67239..e41ae950fdc3 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -30,8 +30,10 @@ OPTIONS
- a symbolic event name (use 'perf list' to list all events)
- - a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
- hexadecimal event descriptor.
+ - a raw PMU event in the form of rN where N is a hexadecimal value
+ that represents the raw register encoding with the layout of the
+ event control registers as described by entries in
+ /sys/bus/event_source/devices/cpu/format/*.
- a symbolic or raw PMU event followed by an optional colon
and a list of event modifiers, e.g., cpu-cycles:p. See the
@@ -273,6 +275,11 @@ OPTIONS
User can change the size by passing the size after comma like
"--call-graph dwarf,4096".
+ When "fp" recording is used, perf tries to save stack enties
+ up to the number specified in sysctl.kernel.perf_event_max_stack
+ by default. User can change the number by passing it after comma
+ like "--call-graph fp,32".
+
-q::
--quiet::
Don't print any message, useful for scripting.
@@ -311,6 +318,11 @@ OPTIONS
--sample-cpu::
Record the sample cpu.
+--sample-identifier::
+ Record the sample identifier i.e. PERF_SAMPLE_IDENTIFIER bit set in
+ the sample_type member of the struct perf_event_attr argument to the
+ perf_event_open system call.
+
-n::
--no-samples::
Don't sample.
@@ -385,6 +397,10 @@ following filters are defined:
- abort_tx: only when the target is a hardware transaction abort
- cond: conditional branches
- save_type: save branch type during sampling in case binary is not available later
+ For the platforms with Intel Arch LBR support (12th-Gen+ client or
+ 4th-Gen Xeon+ server), the save branch type is unconditionally enabled
+ when the taken branch stack sampling is enabled.
+ - priv: save privilege state during sampling in case binary is not available later
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
@@ -395,6 +411,7 @@ is enabled for all the sampling events. The sampled branch type is the same for
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.
+-W::
--weight::
Enable weightened sampling. An additional weight is recorded per sample and can be
displayed with the weight and local_weight sort keys. This currently works for TSX
@@ -418,8 +435,10 @@ if combined with -a or -C options.
-D::
--delay=::
After starting the program, wait msecs before measuring (-1: start with events
-disabled). This is useful to filter out the startup phase of the program, which
-is often very different.
+disabled), or enable events only for specified ranges of msecs (e.g.
+-D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
+for 10 msecs, then stop). Note, delaying enabling of events is useful to filter
+out the startup phase of the program, which is often very different.
-I::
--intr-regs::
@@ -711,6 +730,59 @@ measurements:
wait -n ${perf_pid}
exit $?
+--threads=<spec>::
+Write collected trace data into several data files using parallel threads.
+<spec> value can be user defined list of masks. Masks separated by colon
+define CPUs to be monitored by a thread and affinity mask of that thread
+is separated by slash:
+
+ <cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2>:...
+
+CPUs or affinity masks must not overlap with other corresponding masks.
+Invalid CPUs are ignored, but masks containing only invalid CPUs are not
+allowed.
+
+For example user specification like the following:
+
+ 0,2-4/2-4:1,5-7/5-7
+
+specifies parallel threads layout that consists of two threads,
+the first thread monitors CPUs 0 and 2-4 with the affinity mask 2-4,
+the second monitors CPUs 1 and 5-7 with the affinity mask 5-7.
+
+<spec> value can also be a string meaning predefined parallel threads
+layout:
+
+ cpu - create new data streaming thread for every monitored cpu
+ core - create new thread to monitor CPUs grouped by a core
+ package - create new thread to monitor CPUs grouped by a package
+ numa - create new threed to monitor CPUs grouped by a NUMA domain
+
+Predefined layouts can be used on systems with large number of CPUs in
+order not to spawn multiple per-cpu streaming threads but still avoid LOST
+events in data directory files. Option specified with no or empty value
+defaults to CPU layout. Masks defined or provided by the option value are
+filtered through the mask provided by -C option.
+
+--debuginfod[=URLs]::
+ Specify debuginfod URL to be used when cacheing perf.data binaries,
+ it follows the same syntax as the DEBUGINFOD_URLS variable, like:
+
+ http://192.168.122.174:8002
+
+ If the URLs is not specified, the value of DEBUGINFOD_URLS
+ system environment variable is used.
+
+--off-cpu::
+ Enable off-cpu profiling with BPF. The BPF program will collect
+ task scheduling information with (user) stacktrace and save them
+ as sample data of a software event named "offcpu-time". The
+ sample period will have the time the task slept in nanoseconds.
+
+ Note that BPF can collect stack traces using frame pointer ("fp")
+ only, as of now. So the applications built without the frame
+ pointer might see bogus addresses.
+
include::intel-hybrid.txt[]
SEE ALSO