diff options
Diffstat (limited to 'tools/perf/Documentation')
-rw-r--r-- | tools/perf/Documentation/Makefile | 4 | ||||
-rw-r--r-- | tools/perf/Documentation/itrace.txt | 12 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-bench.txt | 8 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-c2c.txt | 13 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-config.txt | 5 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-intel-pt.txt | 55 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-list.txt | 8 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-record.txt | 32 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-report.txt | 11 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-script.txt | 11 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-stat.txt | 35 | ||||
-rw-r--r-- | tools/perf/Documentation/perf-top.txt | 20 | ||||
-rw-r--r-- | tools/perf/Documentation/perf.data-file-format.txt | 16 | ||||
-rw-r--r-- | tools/perf/Documentation/security.txt | 237 |
14 files changed, 451 insertions, 16 deletions
diff --git a/tools/perf/Documentation/Makefile b/tools/perf/Documentation/Makefile index 31824d5269cc..6e54979c2124 100644 --- a/tools/perf/Documentation/Makefile +++ b/tools/perf/Documentation/Makefile @@ -48,7 +48,7 @@ man5dir=$(mandir)/man5 man7dir=$(mandir)/man7 ASCIIDOC=asciidoc -ASCIIDOC_EXTRA = --unsafe -f asciidoc.conf +ASCIIDOC_EXTRA += --unsafe -f asciidoc.conf ASCIIDOC_HTML = xhtml11 MANPAGE_XSL = manpage-normal.xsl XMLTO_EXTRA = @@ -59,7 +59,7 @@ HTML_REF = origin/html ifdef USE_ASCIIDOCTOR ASCIIDOC = asciidoctor -ASCIIDOC_EXTRA = -a compat-mode +ASCIIDOC_EXTRA += -a compat-mode ASCIIDOC_EXTRA += -I. -rasciidoctor-extensions ASCIIDOC_EXTRA += -a mansource="perf" -a manmanual="perf Manual" ASCIIDOC_HTML = xhtml5 diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt index 82ff7dad40c2..e817179c5027 100644 --- a/tools/perf/Documentation/itrace.txt +++ b/tools/perf/Documentation/itrace.txt @@ -1,5 +1,5 @@ i synthesize instructions events - b synthesize branches events + b synthesize branches events (branch misses for Arm SPE) c synthesize branches events (calls only) r synthesize branches events (returns only) x synthesize transactions events @@ -9,8 +9,14 @@ of aux-output (refer to perf record) e synthesize error events d create a debug log + f synthesize first level cache events + m synthesize last level cache events + t synthesize TLB events + a synthesize remote access events g synthesize a call chain (use with i or x) + G synthesize a call chain on existing event records l synthesize last branch entries (use with i or x) + L synthesize last branch entries on existing event records s skip initial number of events The default is all events i.e. the same as --itrace=ibxwpe, @@ -31,6 +37,10 @@ Also the number of last branch entries (default 64, max. 1024) for instructions or transactions events can be specified. + Similar to options g and l, size may also be specified for options G and L. + On x86, note that G and L work poorly when data has been recorded with + large PEBS. Refer linkperf:perf-intel-pt[1] man page for details. + It is also possible to skip events generated (instructions, branches, transactions, ptwrite, power) at the beginning. This is useful to ignore initialization code. diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt index 0921a3c67381..bad16512c48d 100644 --- a/tools/perf/Documentation/perf-bench.txt +++ b/tools/perf/Documentation/perf-bench.txt @@ -61,6 +61,9 @@ SUBSYSTEM 'epoll':: Eventpoll (epoll) stressing benchmarks. +'internals':: + Benchmark internal perf functionality. + 'all':: All benchmark subsystems. @@ -214,6 +217,11 @@ Suite for evaluating concurrent epoll_wait calls. *ctl*:: Suite for evaluating multiple epoll_ctl calls. +SUITES FOR 'internals' +~~~~~~~~~~~~~~~~~~~~~~ +*synthesize*:: +Suite for evaluating perf's event synthesis performance. + SEE ALSO -------- linkperf:perf[1] diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt index e6150f21267d..98efdab5fbd4 100644 --- a/tools/perf/Documentation/perf-c2c.txt +++ b/tools/perf/Documentation/perf-c2c.txt @@ -40,7 +40,7 @@ RECORD OPTIONS -------------- -e:: --event=:: - Select the PMU event. Use 'perf mem record -e list' + Select the PMU event. Use 'perf c2c record -e list' to list available events. -v:: @@ -111,6 +111,17 @@ REPORT OPTIONS --display:: Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default. +--stitch-lbr:: + Show callgraph with stitched LBRs, which may have more complete + callgraph. The perf.data file must have been obtained using + perf c2c record --call-graph lbr. + Disabled by default. In common cases with call stack overflows, + it can recreate better call stacks than the default lbr call stack + output. But this approach is not full proof. There can be cases + where it creates incorrect call stacks from incorrect matches. + The known limitations include exception handing such as + setjmp/longjmp will have calls/returns not match. + C2C RECORD ---------- The perf c2c record command setup options related to HITM cacheline analysis diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt index f16d8a71d3f5..c7d3df5798e2 100644 --- a/tools/perf/Documentation/perf-config.txt +++ b/tools/perf/Documentation/perf-config.txt @@ -667,6 +667,11 @@ convert.*:: Limit the size of ordered_events queue, so we could control allocation size of perf data files without proper finished round events. +stat.*:: + + stat.big-num:: + (boolean) Change the default for "--big-num". To make + "--no-big-num" the default, set "stat.big-num=false". intel-pt.*:: diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt index 456fdcbf26ac..f4cd49a7fcdb 100644 --- a/tools/perf/Documentation/perf-intel-pt.txt +++ b/tools/perf/Documentation/perf-intel-pt.txt @@ -69,22 +69,22 @@ And profiled with 'perf report' e.g. To also trace kernel space presents a problem, namely kernel self-modifying code. A fairly good kernel image is available in /proc/kcore but to get an accurate image a copy of /proc/kcore needs to be made under the same conditions -as the data capture. A script perf-with-kcore can do that, but beware that the -script makes use of 'sudo' to copy /proc/kcore. If you have perf installed -locally from the source tree you can do: +as the data capture. 'perf record' can make a copy of /proc/kcore if the option +--kcore is used, but access to /proc/kcore is restricted e.g. - ~/libexec/perf-core/perf-with-kcore record pt_ls -e intel_pt// -- ls + sudo perf record -o pt_ls --kcore -e intel_pt// -- ls -which will create a directory named 'pt_ls' and put the perf.data file and -copies of /proc/kcore, /proc/kallsyms and /proc/modules into it. Then to use -'perf report' becomes: +which will create a directory named 'pt_ls' and put the perf.data file (named +simply 'data') and copies of /proc/kcore, /proc/kallsyms and /proc/modules into +it. The other tools understand the directory format, so to use 'perf report' +becomes: - ~/libexec/perf-core/perf-with-kcore report pt_ls + sudo perf report -i pt_ls Because samples are synthesized after-the-fact, the sampling period can be selected for reporting. e.g. sample every microsecond - ~/libexec/perf-core/perf-with-kcore report pt_ls --itrace=i1usge + sudo perf report pt_ls --itrace=i1usge See the sections below for more information about the --itrace option. @@ -687,7 +687,7 @@ The v4.2 kernel introduced support for a context switch metadata event, PERF_RECORD_SWITCH, which allows unprivileged users to see when their processes are scheduled out and in, just not by whom, which is left for the PERF_RECORD_SWITCH_CPU_WIDE, that is only accessible in system wide context, -which in turn requires CAP_SYS_ADMIN. +which in turn requires CAP_PERFMON or CAP_SYS_ADMIN. Please see the 45ac1403f564 ("perf: Add PERF_RECORD_SWITCH to indicate context switches") commit, that introduces these metadata events for further info. @@ -821,7 +821,9 @@ The letters are: e synthesize tracing error events d create a debug log g synthesize a call chain (use with i or x) + G synthesize a call chain on existing event records l synthesize last branch entries (use with i or x) + L synthesize last branch entries on existing event records s skip initial number of events "Instructions" events look like they were recorded by "perf record -e @@ -912,6 +914,39 @@ transactions events can be specified. e.g. Note that last branch entries are cleared for each sample, so there is no overlap from one sample to the next. +The G and L options are designed in particular for sample mode, and work much +like g and l but add call chain and branch stack to the other selected events +instead of synthesized events. For example, to record branch-misses events for +'ls' and then add a call chain derived from the Intel PT trace: + + perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' -- ls + perf report --itrace=Ge + +Although in fact G is a default for perf report, so that is the same as just: + + perf report + +One caveat with the G and L options is that they work poorly with "Large PEBS". +Large PEBS means PEBS records will be accumulated by hardware and the written +into the event buffer in one go. That reduces interrupts, but can give very +late timestamps. Because the Intel PT trace is synchronized by timestamps, +the PEBS events do not match the trace. Currently, Large PEBS is used only in +certain circumstances: + - hardware supports it + - PEBS is used + - event period is specified, instead of frequency + - the sample type is limited to the following flags: + PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | + PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | + PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | + PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | + PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER | + PERF_SAMPLE_PERIOD (and sometimes) | PERF_SAMPLE_TIME +Because Intel PT sample mode uses a different sample type to the list above, +Large PEBS is not used with Intel PT sample mode. To avoid Large PEBS in other +cases, avoid specifying the event period i.e. avoid the 'perf record' -c option, +--count option, or 'period' config term. + To disable trace decoding entirely, use the option --no-itrace. It is also possible to skip events generated (instructions, branches, transactions) diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt index 6345db33c533..376a50b3452d 100644 --- a/tools/perf/Documentation/perf-list.txt +++ b/tools/perf/Documentation/perf-list.txt @@ -115,6 +115,11 @@ raw encoding of 0x1A8 can be used: perf stat -e r1a8 -a sleep 1 perf record -e r1a8 ... +It's also possible to use pmu syntax: + + perf record -e r1a8 -a sleep 1 + perf record -e cpu/r1a8/ ... + You should refer to the processor specific documentation for getting these details. Some of them are referenced in the SEE ALSO section below. @@ -258,6 +263,9 @@ Normally all events in an event group sample, but with :S only the first event (the leader) samples, and it only reads the values of the other events in the group. +However, in the case AUX area events (e.g. Intel PT or CoreSight), the AUX +area event must be the leader, so then the second event samples, not the first. + OPTIONS ------- diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index b3f3b3f1c161..fa8a5fcd27ab 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -458,7 +458,9 @@ This option sets the time out limit. The default value is 500 ms. --switch-events:: Record context switch events i.e. events of type PERF_RECORD_SWITCH or -PERF_RECORD_SWITCH_CPU_WIDE. +PERF_RECORD_SWITCH_CPU_WIDE. In some cases (e.g. Intel PT or CoreSight) +switch events will be enabled automatically, which can be suppressed by +by the option --no-switch-events. --clang-path=PATH:: Path to clang binary to use for compiling BPF scriptlets. @@ -556,6 +558,19 @@ overhead. You can still switch them on with: --switch-output --no-no-buildid --no-no-buildid-cache +--switch-output-event:: +Events that will cause the switch of the perf.data file, auto-selecting +--switch-output=signal, the results are similar as internally the side band +thread will also send a SIGUSR2 to the main one. + +Uses the same syntax as --event, it will just not be recorded, serving only to +switch the perf.data file as soon as the --switch-output event is processed by +a separate sideband thread. + +This sideband thread is also used to other purposes, like processing the +PERF_RECORD_BPF_EVENT records as they happen, asking the kernel for extra BPF +information, etc. + --switch-max-files=N:: When rotating perf.data with --switch-output, only keep N files. @@ -596,6 +611,21 @@ Make a copy of /proc/kcore and place it into a directory with the perf data file Limit the sample data max size, <size> is expected to be a number with appended unit character - B/K/M/G +--num-thread-synthesize:: + The number of threads to run when synthesizing events for existing processes. + By default, the number of threads equals 1. + +ifdef::HAVE_LIBPFM[] +--pfm-events events:: +Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net) +including support for event filters. For example '--pfm-events +inst_retired:any_p:u:c=1:i'. More than one event can be passed to the +option using the comma separator. Hardware events and generic hardware +events cannot be mixed together. The latter must be used with the -e +option. The -e option and this one can be mixed and matched. Events +can be grouped using the {} notation. +endif::HAVE_LIBPFM[] + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1] diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index f569b9ea4002..d068103690cc 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -488,6 +488,17 @@ include::itrace.txt[] This option extends the perf report to show reference callgraphs, which collected by reference event, in no callgraph event. +--stitch-lbr:: + Show callgraph with stitched LBRs, which may have more complete + callgraph. The perf.data file must have been obtained using + perf record --call-graph lbr. + Disabled by default. In common cases with call stack overflows, + it can recreate better call stacks than the default lbr call stack + output. But this approach is not full proof. There can be cases + where it creates incorrect call stacks from incorrect matches. + The known limitations include exception handing such as + setjmp/longjmp will have calls/returns not match. + --socket-filter:: Only report the samples on the processor socket that match with this filter diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 963487e82edc..372dfd110e6d 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -440,6 +440,17 @@ include::itrace.txt[] --show-on-off-events:: Show the --switch-on/off events too. +--stitch-lbr:: + Show callgraph with stitched LBRs, which may have more complete + callgraph. The perf.data file must have been obtained using + perf record --call-graph lbr. + Disabled by default. In common cases with call stack overflows, + it can recreate better call stacks than the default lbr call stack + output. But this approach is not full proof. There can be cases + where it creates incorrect call stacks from incorrect matches. + The known limitations include exception handing such as + setjmp/longjmp will have calls/returns not match. + SEE ALSO -------- linkperf:perf-record[1], linkperf:perf-script-perl[1], diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 4d56586b2fb9..b029ee728a0b 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -71,6 +71,16 @@ report:: --tid=<tid>:: stat events on existing thread id (comma separated list) +ifdef::HAVE_LIBPFM[] +--pfm-events events:: +Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net) +including support for event filters. For example '--pfm-events +inst_retired:any_p:u:c=1:i'. More than one event can be passed to the +option using the comma separator. Hardware events and generic hardware +events cannot be mixed together. The latter must be used with the -e +option. The -e option and this one can be mixed and matched. Events +can be grouped using the {} notation. +endif::HAVE_LIBPFM[] -a:: --all-cpus:: @@ -93,7 +103,9 @@ report:: -B:: --big-num:: - print large numbers with thousands' separators according to locale + print large numbers with thousands' separators according to locale. + Enabled by default. Use "--no-big-num" to disable. + Default setting can be changed with "perf config stat.big-num=false". -C:: --cpu=:: @@ -176,6 +188,8 @@ Print count deltas every N milliseconds (minimum: 1ms) The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. Use with caution. example: 'perf stat -I 1000 -e cycles -a sleep 5' +If the metric exists, it is calculated by the counts generated in this interval and the metric is printed after #. + --interval-count times:: Print count deltas for fixed number of times. This option should be used together with "-I" option. @@ -232,6 +246,25 @@ filter out the startup phase of the program, which is often very different. Print statistics of transactional execution if supported. +--metric-no-group:: +By default, events to compute a metric are placed in weak groups. The +group tries to enforce scheduling all or none of the events. The +--metric-no-group option places events outside of groups and may +increase the chance of the event being scheduled - leading to more +accuracy. However, as events may not be scheduled together accuracy +for metrics like instructions per cycle can be lower - as both metrics +may no longer be being measured at the same time. + +--metric-no-merge:: +By default metric events in different weak groups can be shared if one +group contains all the events needed by another. In such cases one +group will be eliminated reducing event multiplexing and making it so +that certain groups of metrics sum to 100%. A downside to sharing a +group is that the group may require multiplexing and so accuracy for a +small group that need not have multiplexing is lowered. This option +forbids the event merging logic from sharing events between groups and +may be used to increase accuracy in this case. + STAT RECORD ----------- Stores stat data into perf data file. diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt index 487737a725e9..ee2024691d46 100644 --- a/tools/perf/Documentation/perf-top.txt +++ b/tools/perf/Documentation/perf-top.txt @@ -319,6 +319,26 @@ Default is to monitor all CPUS. go straight to the histogram browser, just like 'perf top' with no events explicitely specified does. +--stitch-lbr:: + Show callgraph with stitched LBRs, which may have more complete + callgraph. The option must be used with --call-graph lbr recording. + Disabled by default. In common cases with call stack overflows, + it can recreate better call stacks than the default lbr call stack + output. But this approach is not full proof. There can be cases + where it creates incorrect call stacks from incorrect matches. + The known limitations include exception handing such as + setjmp/longjmp will have calls/returns not match. + +ifdef::HAVE_LIBPFM[] +--pfm-events events:: +Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net) +including support for event filters. For example '--pfm-events +inst_retired:any_p:u:c=1:i'. More than one event can be passed to the +option using the comma separator. Hardware events and generic hardware +events cannot be mixed together. The latter must be used with the -e +option. The -e option and this one can be mixed and matched. Events +can be grouped using the {} notation. +endif::HAVE_LIBPFM[] INTERACTIVE PROMPTING KEYS -------------------------- diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index b0152e1095c5..b6472e463284 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -373,6 +373,22 @@ struct { Indicates that trace contains records of PERF_RECORD_COMPRESSED type that have perf_events records in compressed form. + HEADER_CPU_PMU_CAPS = 28, + + A list of cpu PMU capabilities. The format of data is as below. + +struct { + u32 nr_cpu_pmu_caps; + { + char name[]; + char value[]; + } [nr_cpu_pmu_caps] +}; + + +Example: + cpu pmu capabilities: branches=32, max_precise=3, pmu_name=icelake + other bits are reserved and should ignored for now HEADER_FEAT_BITS = 256, diff --git a/tools/perf/Documentation/security.txt b/tools/perf/Documentation/security.txt new file mode 100644 index 000000000000..4fe3b8b1958f --- /dev/null +++ b/tools/perf/Documentation/security.txt @@ -0,0 +1,237 @@ +Overview +======== + +For general security related questions of perf_event_open() syscall usage, +performance monitoring and observability operations by Perf see here: +https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html + +Enabling LSM based mandatory access control (MAC) to perf_event_open() syscall +============================================================================== + +LSM hooks for mandatory access control for perf_event_open() syscall can be +used starting from Linux v5.3. Below are the steps to extend Fedora (v31) with +Targeted policy with perf_event_open() access control capabilities: + +1. Download selinux-policy SRPM package (e.g. selinux-policy-3.14.4-48.fc31.src.rpm on FC31) + and install it so rpmbuild directory would exist in the current working directory: + + # rpm -Uhv selinux-policy-3.14.4-48.fc31.src.rpm + +2. Get into rpmbuild/SPECS directory and unpack the source code: + + # rpmbuild -bp selinux-policy.spec + +3. Place patch below at rpmbuild/BUILD/selinux-policy-b86eaaf4dbcf2d51dd4432df7185c0eaf3cbcc02 + directory and apply it: + + # patch -p1 < selinux-policy-perf-events-perfmon.patch + patching file policy/flask/access_vectors + patching file policy/flask/security_classes + # cat selinux-policy-perf-events-perfmon.patch +diff -Nura a/policy/flask/access_vectors b/policy/flask/access_vectors +--- a/policy/flask/access_vectors 2020-02-04 18:19:53.000000000 +0300 ++++ b/policy/flask/access_vectors 2020-02-28 23:37:25.000000000 +0300 +@@ -174,6 +174,7 @@ + wake_alarm + block_suspend + audit_read ++ perfmon + } + + # +@@ -1099,3 +1100,15 @@ + + class xdp_socket + inherits socket ++ ++class perf_event ++{ ++ open ++ cpu ++ kernel ++ tracepoint ++ read ++ write ++} ++ ++ +diff -Nura a/policy/flask/security_classes b/policy/flask/security_classes +--- a/policy/flask/security_classes 2020-02-04 18:19:53.000000000 +0300 ++++ b/policy/flask/security_classes 2020-02-28 21:35:17.000000000 +0300 +@@ -200,4 +200,6 @@ + + class xdp_socket + ++class perf_event ++ + # FLASK + +4. Get into rpmbuild/SPECS directory and build policy packages from patched sources: + + # rpmbuild --noclean --noprep -ba selinux-policy.spec + + so you have this: + + # ls -alh rpmbuild/RPMS/noarch/ + total 33M + drwxr-xr-x. 2 root root 4.0K Mar 20 12:16 . + drwxr-xr-x. 3 root root 4.0K Mar 20 12:16 .. + -rw-r--r--. 1 root root 112K Mar 20 12:16 selinux-policy-3.14.4-48.fc31.noarch.rpm + -rw-r--r--. 1 root root 1.2M Mar 20 12:17 selinux-policy-devel-3.14.4-48.fc31.noarch.rpm + -rw-r--r--. 1 root root 2.3M Mar 20 12:17 selinux-policy-doc-3.14.4-48.fc31.noarch.rpm + -rw-r--r--. 1 root root 12M Mar 20 12:17 selinux-policy-minimum-3.14.4-48.fc31.noarch.rpm + -rw-r--r--. 1 root root 4.5M Mar 20 12:16 selinux-policy-mls-3.14.4-48.fc31.noarch.rpm + -rw-r--r--. 1 root root 111K Mar 20 12:16 selinux-policy-sandbox-3.14.4-48.fc31.noarch.rpm + -rw-r--r--. 1 root root 14M Mar 20 12:17 selinux-policy-targeted-3.14.4-48.fc31.noarch.rpm + +5. Install SELinux packages from Fedora repo, if not already done so, and + update with the patched rpms above: + + # rpm -Uhv rpmbuild/RPMS/noarch/selinux-policy-* + +6. Enable SELinux Permissive mode for Targeted policy, if not already done so: + + # cat /etc/selinux/config + + # This file controls the state of SELinux on the system. + # SELINUX= can take one of these three values: + # enforcing - SELinux security policy is enforced. + # permissive - SELinux prints warnings instead of enforcing. + # disabled - No SELinux policy is loaded. + SELINUX=permissive + # SELINUXTYPE= can take one of these three values: + # targeted - Targeted processes are protected, + # minimum - Modification of targeted policy. Only selected processes are protected. + # mls - Multi Level Security protection. + SELINUXTYPE=targeted + +7. Enable filesystem SELinux labeling at the next reboot: + + # touch /.autorelabel + +8. Reboot machine and it will label filesystems and load Targeted policy into the kernel; + +9. Login and check that dmesg output doesn't mention that perf_event class is unknown to SELinux subsystem; + +10. Check that SELinux is enabled and in Permissive mode + + # getenforce + Permissive + +11. Turn SELinux into Enforcing mode: + + # setenforce 1 + # getenforce + Enforcing + +Opening access to perf_event_open() syscall on Fedora with SELinux +================================================================== + +Access to performance monitoring and observability operations by Perf +can be limited for superuser or CAP_PERFMON or CAP_SYS_ADMIN privileged +processes. MAC policy settings (e.g. SELinux) can be loaded into the kernel +and prevent unauthorized access to perf_event_open() syscall. In such case +Perf tool provides a message similar to the one below: + + # perf stat + Error: + Access to performance monitoring and observability operations is limited. + Enforced MAC policy settings (SELinux) can limit access to performance + monitoring and observability operations. Inspect system audit records for + more perf_event access control information and adjusting the policy. + Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open + access to performance monitoring and observability operations for users + without CAP_PERFMON or CAP_SYS_ADMIN Linux capability. + perf_event_paranoid setting is -1: + -1: Allow use of (almost) all events by all users + Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK + >= 0: Disallow raw and ftrace function tracepoint access + >= 1: Disallow CPU event access + >= 2: Disallow kernel profiling + To make the adjusted perf_event_paranoid setting permanent preserve it + in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>) + +To make sure that access is limited by MAC policy settings inspect system +audit records using journalctl command or /var/log/audit/audit.log so the +output would contain AVC denied records related to perf_event: + + # journalctl --reverse --no-pager | grep perf_event + + python3[1318099]: SELinux is preventing perf from open access on the perf_event labeled unconfined_t. + If you believe that perf should be allowed open access on perf_event labeled unconfined_t by default. + setroubleshoot[1318099]: SELinux is preventing perf from open access on the perf_event labeled unconfined_t. For complete SELinux messages run: sealert -l 4595ce5b-e58f-462c-9d86-3bc2074935de + audit[1318098]: AVC avc: denied { open } for pid=1318098 comm="perf" scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=perf_event permissive=0 + +In order to open access to perf_event_open() syscall MAC policy settings can +require to be extended. On SELinux system this can be done by loading a special +policy module extending base policy settings. Perf related policy module can +be generated using the system audit records about blocking perf_event access. +Run the command below to generate my-perf.te policy extension file with +perf_event related rules: + + # ausearch -c 'perf' --raw | audit2allow -M my-perf && cat my-perf.te + + module my-perf 1.0; + + require { + type unconfined_t; + class perf_event { cpu kernel open read tracepoint write }; + } + + #============= unconfined_t ============== + allow unconfined_t self:perf_event { cpu kernel open read tracepoint write }; + +Now compile, pack and load my-perf.pp extension module into the kernel: + + # checkmodule -M -m -o my-perf.mod my-perf.te + # semodule_package -o my-perf.pp -m my-perf.mod + # semodule -X 300 -i my-perf.pp + +After all those taken steps above access to perf_event_open() syscall should +now be allowed by the policy settings. Check access running Perf like this: + + # perf stat + ^C + Performance counter stats for 'system wide': + + 36,387.41 msec cpu-clock # 7.999 CPUs utilized + 2,629 context-switches # 0.072 K/sec + 57 cpu-migrations # 0.002 K/sec + 1 page-faults # 0.000 K/sec + 263,721,559 cycles # 0.007 GHz + 175,746,713 instructions # 0.67 insn per cycle + 19,628,798 branches # 0.539 M/sec + 1,259,201 branch-misses # 6.42% of all branches + + 4.549061439 seconds time elapsed + +The generated perf-event.pp related policy extension module can be removed +from the kernel using this command: + + # semodule -X 300 -r my-perf + +Alternatively the module can be temporarily disabled and enabled back using +these two commands: + + # semodule -d my-perf + # semodule -e my-perf + +If something went wrong +======================= + +To turn SELinux into Permissive mode: + # setenforce 0 + +To fully disable SELinux during kernel boot [3] set kernel command line parameter selinux=0 + +To remove SELinux labeling from local filesystems: + # find / -mount -print0 | xargs -0 setfattr -h -x security.selinux + +To fully turn SELinux off a machine set SELINUX=disabled at /etc/selinux/config file and reboot; + +Links +===== + +[1] https://download-ib01.fedoraproject.org/pub/fedora/linux/updates/31/Everything/SRPMS/Packages/s/selinux-policy-3.14.4-49.fc31.src.rpm +[2] https://docs.fedoraproject.org/en-US/Fedora/11/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-Working_with_SELinux-Enabling_and_Disabling_SELinux.html +[3] https://danwalsh.livejournal.com/10972.html |