aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide/perf-security.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide/perf-security.rst')
-rw-r--r--Documentation/admin-guide/perf-security.rst253
1 files changed, 193 insertions, 60 deletions
diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst
index f73ebfe9bfe2..72effa7c23b9 100644
--- a/Documentation/admin-guide/perf-security.rst
+++ b/Documentation/admin-guide/perf-security.rst
@@ -6,83 +6,211 @@ Perf Events and tool security
Overview
--------
-Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_ can
-impose a considerable risk of leaking sensitive data accessed by monitored
-processes. The data leakage is possible both in scenarios of direct usage of
-perf_events system call API [2]_ and over data files generated by Perf tool user
-mode utility (Perf) [3]_ , [4]_ . The risk depends on the nature of data that
-perf_events performance monitoring units (PMU) [2]_ collect and expose for
-performance analysis. Having that said perf_events/Perf performance monitoring
-is the subject for security access control management [5]_ .
+Usage of Performance Counters for Linux (perf_events) [1]_ , [2]_ , [3]_
+can impose a considerable risk of leaking sensitive data accessed by
+monitored processes. The data leakage is possible both in scenarios of
+direct usage of perf_events system call API [2]_ and over data files
+generated by Perf tool user mode utility (Perf) [3]_ , [4]_ . The risk
+depends on the nature of data that perf_events performance monitoring
+units (PMU) [2]_ and Perf collect and expose for performance analysis.
+Collected system and performance data may be split into several
+categories:
+
+1. System hardware and software configuration data, for example: a CPU
+ model and its cache configuration, an amount of available memory and
+ its topology, used kernel and Perf versions, performance monitoring
+ setup including experiment time, events configuration, Perf command
+ line parameters, etc.
+
+2. User and kernel module paths and their load addresses with sizes,
+ process and thread names with their PIDs and TIDs, timestamps for
+ captured hardware and software events.
+
+3. Content of kernel software counters (e.g., for context switches, page
+ faults, CPU migrations), architectural hardware performance counters
+ (PMC) [8]_ and machine specific registers (MSR) [9]_ that provide
+ execution metrics for various monitored parts of the system (e.g.,
+ memory controller (IMC), interconnect (QPI/UPI) or peripheral (PCIe)
+ uncore counters) without direct attribution to any execution context
+ state.
+
+4. Content of architectural execution context registers (e.g., RIP, RSP,
+ RBP on x86_64), process user and kernel space memory addresses and
+ data, content of various architectural MSRs that capture data from
+ this category.
+
+Data that belong to the fourth category can potentially contain
+sensitive process data. If PMUs in some monitoring modes capture values
+of execution context registers or data from process memory then access
+to such monitoring capabilities requires to be ordered and secured
+properly. So, perf_events/Perf performance monitoring is the subject for
+security access control management [5]_ .
perf_events/Perf access control
-------------------------------
-To perform security checks, the Linux implementation splits processes into two
-categories [6]_ : a) privileged processes (whose effective user ID is 0, referred
-to as superuser or root), and b) unprivileged processes (whose effective UID is
-nonzero). Privileged processes bypass all kernel security permission checks so
-perf_events performance monitoring is fully available to privileged processes
-without access, scope and resource restrictions.
-
-Unprivileged processes are subject to a full security permission check based on
-the process's credentials [5]_ (usually: effective UID, effective GID, and
-supplementary group list).
-
-Linux divides the privileges traditionally associated with superuser into
-distinct units, known as capabilities [6]_ , which can be independently enabled
-and disabled on per-thread basis for processes and files of unprivileged users.
-
-Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated as
-privileged processes with respect to perf_events performance monitoring and
-bypass *scope* permissions checks in the kernel.
-
-Unprivileged processes using perf_events system call API is also subject for
-PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose outcome
-determines whether monitoring is permitted. So unprivileged processes provided
-with CAP_SYS_PTRACE capability are effectively permitted to pass the check.
-
-Other capabilities being granted to unprivileged processes can effectively
-enable capturing of additional data required for later performance analysis of
-monitored processes or a system. For example, CAP_SYSLOG capability permits
-reading kernel space memory addresses from /proc/kallsyms file.
+To perform security checks, the Linux implementation splits processes
+into two categories [6]_ : a) privileged processes (whose effective user
+ID is 0, referred to as superuser or root), and b) unprivileged
+processes (whose effective UID is nonzero). Privileged processes bypass
+all kernel security permission checks so perf_events performance
+monitoring is fully available to privileged processes without access,
+scope and resource restrictions.
+
+Unprivileged processes are subject to a full security permission check
+based on the process's credentials [5]_ (usually: effective UID,
+effective GID, and supplementary group list).
+
+Linux divides the privileges traditionally associated with superuser
+into distinct units, known as capabilities [6]_ , which can be
+independently enabled and disabled on per-thread basis for processes and
+files of unprivileged users.
+
+Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
+as privileged processes with respect to perf_events performance
+monitoring and bypass *scope* permissions checks in the kernel.
+
+Unprivileged processes using perf_events system call API is also subject
+for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
+outcome determines whether monitoring is permitted. So unprivileged
+processes provided with CAP_SYS_PTRACE capability are effectively
+permitted to pass the check.
+
+Other capabilities being granted to unprivileged processes can
+effectively enable capturing of additional data required for later
+performance analysis of monitored processes or a system. For example,
+CAP_SYSLOG capability permits reading kernel space memory addresses from
+/proc/kallsyms file.
+
+perf_events/Perf privileged users
+---------------------------------
+
+Mechanisms of capabilities, privileged capability-dumb files [6]_ and
+file system ACLs [10]_ can be used to create a dedicated group of
+perf_events/Perf privileged users who are permitted to execute
+performance monitoring without scope limits. The following steps can be
+taken to create such a group of privileged Perf users.
+
+1. Create perf_users group of privileged Perf users, assign perf_users
+ group to Perf tool executable and limit access to the executable for
+ other users in the system who are not in the perf_users group:
+
+::
+
+ # groupadd perf_users
+ # ls -alhF
+ -rwxr-xr-x 2 root root 11M Oct 19 15:12 perf
+ # chgrp perf_users perf
+ # ls -alhF
+ -rwxr-xr-x 2 root perf_users 11M Oct 19 15:12 perf
+ # chmod o-rwx perf
+ # ls -alhF
+ -rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf
+
+2. Assign the required capabilities to the Perf tool executable file and
+ enable members of perf_users group with performance monitoring
+ privileges [6]_ :
+
+::
+
+ # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
+ # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
+ perf: OK
+ # getcap perf
+ perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
+
+As a result, members of perf_users group are capable of conducting
+performance monitoring by using functionality of the configured Perf
+tool executable that, when executes, passes perf_events subsystem scope
+checks.
+
+This specific access control management is only available to superuser
+or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
+capabilities.
perf_events/Perf unprivileged users
-----------------------------------
-perf_events/Perf *scope* and *access* control for unprivileged processes is
-governed by perf_event_paranoid [2]_ setting:
+perf_events/Perf *scope* and *access* control for unprivileged processes
+is governed by perf_event_paranoid [2]_ setting:
-1:
- Impose no *scope* and *access* restrictions on using perf_events performance
- monitoring. Per-user per-cpu perf_event_mlock_kb [2]_ locking limit is
- ignored when allocating memory buffers for storing performance data.
- This is the least secure mode since allowed monitored *scope* is
- maximized and no perf_events specific limits are imposed on *resources*
- allocated for performance monitoring.
+ Impose no *scope* and *access* restrictions on using perf_events
+ performance monitoring. Per-user per-cpu perf_event_mlock_kb [2]_
+ locking limit is ignored when allocating memory buffers for storing
+ performance data. This is the least secure mode since allowed
+ monitored *scope* is maximized and no perf_events specific limits
+ are imposed on *resources* allocated for performance monitoring.
>=0:
*scope* includes per-process and system wide performance monitoring
- but excludes raw tracepoints and ftrace function tracepoints monitoring.
- CPU and system events happened when executing either in user or
- in kernel space can be monitored and captured for later analysis.
- Per-user per-cpu perf_event_mlock_kb locking limit is imposed but
- ignored for unprivileged processes with CAP_IPC_LOCK [6]_ capability.
+ but excludes raw tracepoints and ftrace function tracepoints
+ monitoring. CPU and system events happened when executing either in
+ user or in kernel space can be monitored and captured for later
+ analysis. Per-user per-cpu perf_event_mlock_kb locking limit is
+ imposed but ignored for unprivileged processes with CAP_IPC_LOCK
+ [6]_ capability.
>=1:
- *scope* includes per-process performance monitoring only and excludes
- system wide performance monitoring. CPU and system events happened when
- executing either in user or in kernel space can be monitored and
- captured for later analysis. Per-user per-cpu perf_event_mlock_kb
- locking limit is imposed but ignored for unprivileged processes with
- CAP_IPC_LOCK capability.
+ *scope* includes per-process performance monitoring only and
+ excludes system wide performance monitoring. CPU and system events
+ happened when executing either in user or in kernel space can be
+ monitored and captured for later analysis. Per-user per-cpu
+ perf_event_mlock_kb locking limit is imposed but ignored for
+ unprivileged processes with CAP_IPC_LOCK capability.
>=2:
- *scope* includes per-process performance monitoring only. CPU and system
- events happened when executing in user space only can be monitored and
- captured for later analysis. Per-user per-cpu perf_event_mlock_kb
- locking limit is imposed but ignored for unprivileged processes with
- CAP_IPC_LOCK capability.
+ *scope* includes per-process performance monitoring only. CPU and
+ system events happened when executing in user space only can be
+ monitored and captured for later analysis. Per-user per-cpu
+ perf_event_mlock_kb locking limit is imposed but ignored for
+ unprivileged processes with CAP_IPC_LOCK capability.
+
+perf_events/Perf resource control
+---------------------------------
+
+Open file descriptors
++++++++++++++++++++++
+
+The perf_events system call API [2]_ allocates file descriptors for
+every configured PMU event. Open file descriptors are a per-process
+accountable resource governed by the RLIMIT_NOFILE [11]_ limit
+(ulimit -n), which is usually derived from the login shell process. When
+configuring Perf collection for a long list of events on a large server
+system, this limit can be easily hit preventing required monitoring
+configuration. RLIMIT_NOFILE limit can be increased on per-user basis
+modifying content of the limits.conf file [12]_ . Ordinarily, a Perf
+sampling session (perf record) requires an amount of open perf_event
+file descriptors that is not less than the number of monitored events
+multiplied by the number of monitored CPUs.
+
+Memory allocation
++++++++++++++++++
+
+The amount of memory available to user processes for capturing
+performance monitoring data is governed by the perf_event_mlock_kb [2]_
+setting. This perf_event specific resource setting defines overall
+per-cpu limits of memory allowed for mapping by the user processes to
+execute performance monitoring. The setting essentially extends the
+RLIMIT_MEMLOCK [11]_ limit, but only for memory regions mapped
+specifically for capturing monitored performance events and related data.
+
+For example, if a machine has eight cores and perf_event_mlock_kb limit
+is set to 516 KiB, then a user process is provided with 516 KiB * 8 =
+4128 KiB of memory above the RLIMIT_MEMLOCK limit (ulimit -l) for
+perf_event mmap buffers. In particular, this means that, if the user
+wants to start two or more performance monitoring processes, the user is
+required to manually distribute the available 4128 KiB between the
+monitoring processes, for example, using the --mmap-pages Perf record
+mode option. Otherwise, the first started performance monitoring process
+allocates all available 4128 KiB and the other processes will fail to
+proceed due to the lack of memory.
+
+RLIMIT_MEMLOCK and perf_event_mlock_kb resource constraints are ignored
+for processes with the CAP_IPC_LOCK capability. Thus, perf_events/Perf
+privileged users can be provided with memory above the constraints for
+perf_events/Perf performance monitoring purpose by providing the Perf
+executable with CAP_IPC_LOCK capability.
Bibliography
------------
@@ -94,4 +222,9 @@ Bibliography
.. [5] `<https://www.kernel.org/doc/html/latest/security/credentials.html>`_
.. [6] `<http://man7.org/linux/man-pages/man7/capabilities.7.html>`_
.. [7] `<http://man7.org/linux/man-pages/man2/ptrace.2.html>`_
+.. [8] `<https://en.wikipedia.org/wiki/Hardware_performance_counter>`_
+.. [9] `<https://en.wikipedia.org/wiki/Model-specific_register>`_
+.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
+.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
+.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_