From 33aa46f252c703e42c81a76696cd0c240f2281e4 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 25 Mar 2020 15:03:35 +0100 Subject: cpufreq: intel_pstate: Use passive mode by default without HWP After recent changes allowing scale-invariant utilization to be used on x86, the schedutil governor on top of intel_pstate in the passive mode should be on par with (or better than) the active mode "powersave" algorithm of intel_pstate on systems in which hardware-managed P-states (HWP) are not used, so it should not be necessary to use the internal scaling algorithm in those cases. Accordingly, modify intel_pstate to start in the passive mode by default if the processor at hand does not support HWP of if the driver is requested to avoid using HWP through the kernel command line. Among other things, that will allow utilization clamps and the support for RT/DL tasks in the schedutil governor to be utilized on systems in which intel_pstate is used. Signed-off-by: Rafael J. Wysocki --- Documentation/admin-guide/pm/intel_pstate.rst | 32 ++++++++++++++++----------- 1 file changed, 19 insertions(+), 13 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst index ad392f3aee06..39d80bc29ccd 100644 --- a/Documentation/admin-guide/pm/intel_pstate.rst +++ b/Documentation/admin-guide/pm/intel_pstate.rst @@ -62,9 +62,10 @@ on the capabilities of the processor. Active Mode ----------- -This is the default operation mode of ``intel_pstate``. If it works in this -mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` -policies contains the string "intel_pstate". +This is the default operation mode of ``intel_pstate`` for processors with +hardware-managed P-states (HWP) support. If it works in this mode, the +``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies +contains the string "intel_pstate". In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and provides its own scaling algorithms for P-state selection. Those algorithms @@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused. Active Mode Without HWP ~~~~~~~~~~~~~~~~~~~~~~~ -This is the default operation mode for processors that do not support the HWP -feature. It also is used by default with the ``intel_pstate=no_hwp`` argument -in the kernel command line. However, in this mode ``intel_pstate`` may refuse -to work with the given processor if it does not recognize it. [Note that -``intel_pstate`` will never refuse to work with any processor with the HWP -feature enabled.] +This operation mode is optional for processors that do not support the HWP +feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in +the command line. The active mode is used in those cases if the +``intel_pstate=active`` argument is passed to the kernel in the command line. +In this mode ``intel_pstate`` may refuse to work with processors that are not +recognized by it. [Note that ``intel_pstate`` will never refuse to work with +any processor with the HWP feature enabled.] In this mode ``intel_pstate`` registers utilization update callbacks with the CPU scheduler in order to run a P-state selection algorithm, either @@ -188,10 +190,14 @@ is not set. Passive Mode ------------ -This mode is used if the ``intel_pstate=passive`` argument is passed to the -kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too). -Like in the active mode without HWP support, in this mode ``intel_pstate`` may -refuse to work with the given processor if it does not recognize it. +This is the default operation mode of ``intel_pstate`` for processors without +hardware-managed P-states (HWP) support. It is always used if the +``intel_pstate=passive`` argument is passed to the kernel in the command line +regardless of whether or not the given processor supports HWP. [Note that the +``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used +without ``intel_pstate=active``.] Like in the active mode without HWP support, +in this mode ``intel_pstate`` may refuse to work with processors that are not +recognized by it. If the driver works in this mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq". -- cgit v1.2.3-59-g8ed1b From 7395683a2498c7000120cdee8e4fb0c632e5561b Mon Sep 17 00:00:00 2001 From: Hanjun Guo Date: Tue, 19 May 2020 14:25:24 +0800 Subject: Documentation: cpuidle: update the document Update the document after the remove of cpuidle_sysfs_switch. Signed-off-by: Hanjun Guo Reviewed-by: Doug Smythies Signed-off-by: Rafael J. Wysocki --- Documentation/ABI/testing/sysfs-devices-system-cpu | 24 ++++++++-------------- Documentation/admin-guide/pm/cpuidle.rst | 20 ++++++++---------- Documentation/driver-api/pm/cpuidle.rst | 5 ++--- 3 files changed, 20 insertions(+), 29 deletions(-) (limited to 'Documentation/admin-guide') diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 2e0e3b45d02a..6b5dafab950c 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -106,10 +106,10 @@ Description: CPU topology files that describe a logical CPU's relationship See Documentation/admin-guide/cputopology.rst for more information. -What: /sys/devices/system/cpu/cpuidle/current_driver - /sys/devices/system/cpu/cpuidle/current_governer_ro - /sys/devices/system/cpu/cpuidle/available_governors +What: /sys/devices/system/cpu/cpuidle/available_governors + /sys/devices/system/cpu/cpuidle/current_driver /sys/devices/system/cpu/cpuidle/current_governor + /sys/devices/system/cpu/cpuidle/current_governer_ro Date: September 2007 Contact: Linux kernel mailing list Description: Discover cpuidle policy and mechanism @@ -119,24 +119,18 @@ Description: Discover cpuidle policy and mechanism consumption during idle. Idle policy (governor) is differentiated from idle mechanism - (driver) - - current_driver: (RO) displays current idle mechanism - - current_governor_ro: (RO) displays current idle policy - - With the cpuidle_sysfs_switch boot option enabled (meant for - developer testing), the following three attributes are visible - instead: - - current_driver: same as described above + (driver). available_governors: (RO) displays a space separated list of - available governors + available governors. + + current_driver: (RO) displays current idle mechanism. current_governor: (RW) displays current idle policy. Users can switch the governor at runtime by writing to this file. + current_governor_ro: (RO) displays current idle policy. + See Documentation/admin-guide/pm/cpuidle.rst and Documentation/driver-api/pm/cpuidle.rst for more information. diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst index 5605cc6f9560..a96a423e3779 100644 --- a/Documentation/admin-guide/pm/cpuidle.rst +++ b/Documentation/admin-guide/pm/cpuidle.rst @@ -159,17 +159,15 @@ governor uses that information depends on what algorithm is implemented by it and that is the primary reason for having more than one governor in the ``CPUIdle`` subsystem. -There are three ``CPUIdle`` governors available, ``menu``, `TEO `_ -and ``ladder``. Which of them is used by default depends on the configuration -of the kernel and in particular on whether or not the scheduler tick can be -`stopped by the idle loop `_. It is possible to change the -governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has -been passed to the kernel, but that is not safe in general, so it should not be -done on production systems (that may change in the future, though). The name of -the ``CPUIdle`` governor currently used by the kernel can be read from the -:file:`current_governor_ro` (or :file:`current_governor` if -``cpuidle_sysfs_switch`` is present in the kernel command line) file under -:file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``. +There are four ``CPUIdle`` governors available, ``menu``, `TEO `_, +``ladder`` and ``haltpoll``. Which of them is used by default depends on the +configuration of the kernel and in particular on whether or not the scheduler +tick can be `stopped by the idle loop `_. Available +governors can be read from the :file:`available_governors`, and the governor +can be changed at runtime. The name of the ``CPUIdle`` governor currently +used by the kernel can be read from the :file:`current_governor_ro` or +:file:`current_governor` file under :file:`/sys/devices/system/cpu/cpuidle/` +in ``sysfs``. Which ``CPUIdle`` driver is used, on the other hand, usually depends on the platform the kernel is running on, but there are platforms with more than one diff --git a/Documentation/driver-api/pm/cpuidle.rst b/Documentation/driver-api/pm/cpuidle.rst index 006cf6db40c6..3588bf078566 100644 --- a/Documentation/driver-api/pm/cpuidle.rst +++ b/Documentation/driver-api/pm/cpuidle.rst @@ -68,9 +68,8 @@ only one in the list (that is, the list was empty before) or the value of its governor currently in use, or the name of the new governor was passed to the kernel as the value of the ``cpuidle.governor=`` command line parameter, the new governor will be used from that point on (there can be only one ``CPUIdle`` -governor in use at a time). Also, if ``cpuidle_sysfs_switch`` is passed to the -kernel in the command line, user space can choose the ``CPUIdle`` governor to -use at run time via ``sysfs``. +governor in use at a time). Also, user space can choose the ``CPUIdle`` +governor to use at run time via ``sysfs``. Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not practical to put them into loadable kernel modules. -- cgit v1.2.3-59-g8ed1b From 213081dadd308d5d41a5110c2dc87fd8ba42de5e Mon Sep 17 00:00:00 2001 From: Srinivas Pandruvada Date: Mon, 11 May 2020 16:57:31 -0700 Subject: Documentation: admin-guide: pm: Document intel-speed-select Added documentation to configure servers to use Intel(R) Speed Select Technology using intel-speed-select tool. Signed-off-by: Srinivas Pandruvada Acked-by: Andriy Shevchenko Signed-off-by: Rafael J. Wysocki --- .../admin-guide/pm/intel-speed-select.rst | 917 +++++++++++++++++++++ Documentation/admin-guide/pm/working-state.rst | 1 + 2 files changed, 918 insertions(+) create mode 100644 Documentation/admin-guide/pm/intel-speed-select.rst (limited to 'Documentation/admin-guide') diff --git a/Documentation/admin-guide/pm/intel-speed-select.rst b/Documentation/admin-guide/pm/intel-speed-select.rst new file mode 100644 index 000000000000..b2ca601c21c6 --- /dev/null +++ b/Documentation/admin-guide/pm/intel-speed-select.rst @@ -0,0 +1,917 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================================ +Intel(R) Speed Select Technology User Guide +============================================================ + +The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new +collection of features that give more granular control over CPU performance. +With Intel(R) SST, one server can be configured for power and performance for a +variety of diverse workload requirements. + +Refer to the links below for an overview of the technology: + +- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html +- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf + +These capabilities are further enhanced in some of the newer generations of +server platforms where these features can be enumerated and controlled +dynamically without pre-configuring via BIOS setup options. This dynamic +configuration is done via mailbox commands to the hardware. One way to enumerate +and configure these features is by using the Intel Speed Select utility. + +This document explains how to use the Intel Speed Select tool to enumerate and +control Intel(R) SST features. This document gives example commands and explains +how these commands change the power and performance profile of the system under +test. Using this tool as an example, customers can replicate the messaging +implemented in the tool in their production software. + +intel-speed-select configuration tool +====================================== + +Most Linux distribution packages may include the "intel-speed-select" tool. If not, +it can be built by downloading the Linux kernel tree from kernel.org. Once +downloaded, the tool can be built without building the full kernel. + +From the kernel tree, run the following commands:: + +# cd tools/power/x86/intel-speed-select/ +# make +# make install + +Getting Help +------------ + +To get help with the tool, execute the command below:: + +# intel-speed-select --help + +The top-level help describes arguments and features. Notice that there is a +multi-level help structure in the tool. For example, to get help for the feature "perf-profile":: + +# intel-speed-select perf-profile --help + +To get help on a command, another level of help is provided. For example for the command info "info":: + +# intel-speed-select perf-profile info --help + +Summary of platform capability +------------------------------ +To check the current platform and driver capaibilities, execute:: + +#intel-speed-select --info + +For example on a test system:: + + # intel-speed-select --info + Intel(R) Speed Select Technology + Executing on CPU model: X + Platform: API version : 1 + Platform: Driver version : 1 + Platform: mbox supported : 1 + Platform: mmio supported : 1 + Intel(R) SST-PP (feature perf-profile) is supported + TDP level change control is unlocked, max level: 4 + Intel(R) SST-TF (feature turbo-freq) is supported + Intel(R) SST-BF (feature base-freq) is not supported + Intel(R) SST-CP (feature core-power) is supported + +Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) +------------------------------------------------------------------------ + +This feature allows configuration of a server dynamically based on workload +performance requirements. This helps users during deployment as they do not have +to choose a specific server configuration statically. This Intel(R) Speed Select +Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism +that allows multiple optimized performance profiles per system. Each profile +defines a set of CPUs that need to be online and rest offline to sustain a +guaranteed base frequency. Once the user issues a command to use a specific +performance profile and meet CPU online/offline requirement, the user can expect +a change in the base frequency dynamically. This feature is called +"perf-profile" when using the Intel Speed Select tool. + +Number or performance levels +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There can be multiple performance profiles on a system. To get the number of +profiles, execute the command below:: + + # intel-speed-select perf-profile get-config-levels + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + get-config-levels:4 + package-1 + die-0 + cpu-14 + get-config-levels:4 + +On this system under test, there are 4 performance profiles in addition to the +base performance profile (which is performance level 0). + +Lock/Unlock status +~~~~~~~~~~~~~~~~~~ + +Even if there are multiple performance profiles, it is possible that that they +are locked. If they are locked, users cannot issue a command to change the +performance state. It is possible that there is a BIOS setup to unlock or check +with your system vendor. + +To check if the system is locked, execute the following command:: + + # intel-speed-select perf-profile get-lock-status + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + get-lock-status:0 + package-1 + die-0 + cpu-14 + get-lock-status:0 + +In this case, lock status is 0, which means that the system is unlocked. + +Properties of a performance level +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To get properties of a specific performance level (For example for the level 0, below), execute the command below:: + + # intel-speed-select perf-profile info -l 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + perf-profile-level-0 + cpu-count:28 + enable-cpu-mask:000003ff,f0003fff + enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41 + thermal-design-power-ratio:26 + base-frequency(MHz):2600 + speed-select-turbo-freq:disabled + speed-select-base-freq:disabled + ... + ... + +Here -l option is used to specify a performance level. + +If the option -l is omitted, then this command will print information about all +the performance levels. The above command is printing properties of the +performance level 0. + +For this performance profile, the list of CPUs displayed by the +"enable-cpu-mask/enable-cpu-list" at the max can be "online." When that +condition is met, then base frequency of 2600 MHz can be maintained. To +understand more, execute "intel-speed-select perf-profile info" for performance +level 4:: + + # intel-speed-select perf-profile info -l 4 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + perf-profile-level-4 + cpu-count:28 + enable-cpu-mask:000000fa,f0000faf + enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39 + thermal-design-power-ratio:28 + base-frequency(MHz):2800 + speed-select-turbo-freq:disabled + speed-select-base-freq:unsupported + ... + ... + +There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if +the user only keeps these CPUs online and the rest "offline," then the base +frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0. + +Get current performance level +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To get the current performance level, execute:: + + # intel-speed-select perf-profile get-config-current-level + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + get-config-current_level:0 + +First verify that the base_frequency displayed by the cpufreq sysfs is correct:: + + # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency + 2600000 + +This matches the base-frequency (MHz) field value displayed from the +"perf-profile info" command for performance level 0(cpufreq frequency is in +KHz). + +To check if the average frequency is equal to the base frequency for a 100% busy +workload, disable turbo:: + +# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo + +Then runs a busy workload on all CPUs, for example:: + +#stress -c 64 + +To verify the base frequency, run turbostat:: + + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 + + Package Core CPU Bzy_MHz + - - 2600 + 0 0 0 2600 + 0 1 1 2600 + 0 2 2 2600 + 0 3 3 2600 + 0 4 4 2600 + . . . . + + +Changing performance level +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To the change the performance level to 4, execute:: + + # intel-speed-select -d perf-profile set-config-level -l 4 -o + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + perf-profile + set_tdp_level:success + +In the command above, "-o" is optional. If it is specified, then it will also +offline CPUs which are not present in the enable_cpu_mask for this performance +level. + +Now if the base_frequency is checked:: + + #cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency + 2800000 + +Which shows that the base frequency now increased from 2600 MHz at performance +level 0 to 2800 MHz at performance level 4. As a result, any workload, which can +use fewer CPUs, can see a boost of 200 MHz compared to performance level 0. + +Check presence of other Intel(R) SST features +--------------------------------------------- + +Each of the performance profiles also specifies weather there is support of +other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency +(Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency (Intel +SST-TF)). + +For example, from the output of "perf-profile info" above, for level 0 and level +4: + +For level 0:: + speed-select-turbo-freq:disabled + speed-select-base-freq:disabled + +For level 4:: + speed-select-turbo-freq:disabled + speed-select-base-freq:unsupported + +Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4 +changed from "disabled" to "unsupported" compared to performance level 0. + +This means that at performance level 4, the "speed-select-base-freq" feature is +not supported. However, at performance level 0, this feature is "supported", but +currently "disabled", meaning the user has not activated this feature. Whereas +"speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance +levels, but currently not activated by the user. + +The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation +technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP). +The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF +is supported on a platform. + +Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) +--------------------------------------------------------------- + +Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that +allows users to define per core priority. This defines a mechanism to distribute +power among cores when there is a power constrained scenario. This defines a +class of service (CLOS) configuration. + +The user can configure up to 4 class of service configurations. Each CLOS group +configuration allows definitions of parameters, which affects how the frequency +can be limited and power is distributed. Each CPU core can be tied to a class of +service and hence an associated priority. The granularity is at core level not +at per CPU level. + +Enable CLOS based prioritization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To use CLOS based prioritization feature, firmware must be informed to enable +and use a priority type. There is a default per platform priority type, which +can be changed with optional command line parameter. + +To enable and check the options, execute:: + + # intel-speed-select core-power enable --help + Intel(R) Speed Select Technology + Executing on CPU model: X + Enable core-power for a package/die + Clos Enable: Specify priority type with [--priority|-p] + 0: Proportional, 1: Ordered + +There are two types of priority types: + +- Ordered + +Priority for ordered throttling is defined based on the index of the assigned +CLOS group. Where CLOS0 gets highest priority (throttled last). + +Priority order is: +CLOS0 > CLOS1 > CLOS2 > CLOS3. + +- Proportional + +When proportional priority is used, there is an additional parameter called +frequency_weight, which can be specified per CLOS group. The goal of +proportional priority is to provide each core with the requested min., then +distribute all remaining (excess/deficit) budgets in proportion to a defined +weight. This proportional priority can be configured using "core-power config" +command. + +To enable with the platform default priority type, execute:: + + # intel-speed-select core-power enable + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + core-power + enable:success + package-1 + die-0 + cpu-6 + core-power + enable:success + +The scope of this enable is per package or die scoped when a package contains +multiple dies. To check if CLOS is enabled and get priority type, "core-power +info" command can be used. For example to check the status of core-power feature +on CPU 0, execute:: + + # intel-speed-select -c 0 core-power info + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + core-power + support-status:supported + enable-status:enabled + clos-enable-status:enabled + priority-type:proportional + package-1 + die-0 + cpu-24 + core-power + support-status:supported + enable-status:enabled + clos-enable-status:enabled + priority-type:proportional + +Configuring CLOS groups +~~~~~~~~~~~~~~~~~~~~~~~ + +Each CLOS group has its own attributes including min, max, freq_weight and +desired. These parameters can be configured with "core-power config" command. +Defaults will be used if user skips setting a parameter except clos id, which is +mandatory. To check core-power config options, execute:: + + # intel-speed-select core-power config --help + Intel(R) Speed Select Technology + Executing on CPU model: X + Set core-power configuration for one of the four clos ids + Specify targeted clos id with [--clos|-c] + Specify clos Proportional Priority [--weight|-w] + Specify clos min in MHz with [--min|-n] + Specify clos max in MHz with [--max|-m] + +For example:: + + # intel-speed-select core-power config -c 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + clos epp is not specified, default: 0 + clos frequency weight is not specified, default: 0 + clos min is not specified, default: 0 MHz + clos max is not specified, default: 25500 MHz + clos desired is not specified, default: 0 + package-0 + die-0 + cpu-0 + core-power + config:success + package-1 + die-0 + cpu-6 + core-power + config:success + +The user has the option to change defaults. For example, the user can change the +"min" and set the base frequency to always get guaranteed base frequency. + +Get the current CLOS configuration +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To check the current configuration, "core-power get-config" can be used. For +example, to get the configuration of CLOS 0:: + + # intel-speed-select core-power get-config -c 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + core-power + clos:0 + epp:0 + clos-proportional-priority:0 + clos-min:0 MHz + clos-max:Max Turbo frequency + clos-desired:0 MHz + package-1 + die-0 + cpu-24 + core-power + clos:0 + epp:0 + clos-proportional-priority:0 + clos-min:0 MHz + clos-max:Max Turbo frequency + clos-desired:0 MHz + +Associating a CPU with a CLOS group +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To associate a CPU to a CLOS group "core-power assoc" command can be used:: + + # intel-speed-select core-power assoc --help + Intel(R) Speed Select Technology + Executing on CPU model: X + Associate a clos id to a CPU + Specify targeted clos id with [--clos|-c] + + +For example to associate CPU 10 to CLOS group 3, execute:: + + # intel-speed-select -c 10 core-power assoc -c 3 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-10 + core-power + assoc:success + +Once a CPU is associated, its sibling CPUs are also associated to a CLOS group. +Once associated, avoid changing Linux "cpufreq" subsystem scaling frequency +limits. + +To check the existing association for a CPU, "core-power get-assoc" command can +be used. For example, to get association of CPU 10, execute:: + + # intel-speed-select -c 10 core-power get-assoc + Intel(R) Speed Select Technology + Executing on CPU model: X + package-1 + die-0 + cpu-10 + get-assoc + clos:3 + +This shows that CPU 10 is part of a CLOS group 3. + + +Disable CLOS based prioritization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To disable, execute:: + +# intel-speed-select core-power disable + +Some features like Intel(R) SST-TF can only be enabled when CLOS based prioritization +is enabled. For this reason, disabling while Intel(R) SST-TF is enabled can cause +Intel(R) SST-TF to fail. This will cause the "disable" command to display an error +if Intel(R) SST-TF is already enabled. In turn, to disable, the Intel(R) SST-TF +feature must be disabled first. + +Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) +------------------------------------------------------------------- + +The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets +the user control base frequency. If some critical workload threads demand +constant high guaranteed performance, then this feature can be used to execute +the thread at higher base frequency on specific sets of CPUs (high priority +CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs. +This feature does not require offline of the low priority CPUs. + +The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology - +Performance Profile (Intel(R) SST-PP) performance level configuration. It is +possible that only certain performance levels support Intel(R) SST-BF. It is also +possible that only base performance level (level = 0) has support of Intel +SST-BF. Consequently, first select the desired performance level to enable this +feature. + +In the system under test here, Intel(R) SST-BF is supported at the base +performance level 0, but currently disabled. For example for the level 0:: + + # intel-speed-select -c 0 perf-profile info -l 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + perf-profile-level-0 + ... + + speed-select-base-freq:disabled + ... + +Before enabling Intel(R) SST-BF and measuring its impact on a workload +performance, execute some workload and measure performance and get a baseline +performance to compare against. + +Here the user wants more guaranteed performance. For this reason, it is likely +that turbo is disabled. To disable turbo, execute:: + +#echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo + +Based on the output of the "intel-speed-select perf-profile info -l 0" base +frequency of guaranteed frequency 2600 MHz. + + +Measure baseline performance for comparison +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To compare, pick a multi-threaded workload where each thread can be scheduled on +separate CPUs. "Hackbench pipe" test is a good example on how to improve +performance using Intel(R) SST-BF. + +Below, the workload is measuring average scheduler wakeup latency, so a lower +number means better performance:: + + # taskset -c 3,4 perf bench -r 100 sched pipe + # Running 'sched/pipe' benchmark: + # Executed 1000000 pipe operations between two processes + Total time: 6.102 [sec] + 6.102445 usecs/op + 163868 ops/sec + +While running the above test, if we take turbostat output, it will show us that +2 of the CPUs are busy and reaching max. frequency (which would be the base +frequency as the turbo is disabled). The turbostat output:: + + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 + Package Core CPU Bzy_MHz + 0 0 0 1000 + 0 1 1 1005 + 0 2 2 1000 + 0 3 3 2600 + 0 4 4 2600 + 0 5 5 1000 + 0 6 6 1000 + 0 7 7 1005 + 0 8 8 1005 + 0 9 9 1000 + 0 10 10 1000 + 0 11 11 995 + 0 12 12 1000 + 0 13 13 1000 + +From the above turbostat output, both CPU 3 and 4 are very busy and reaching +full guaranteed frequency of 2600 MHz. + +Intel(R) SST-BF Capabilities +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To get capabilities of Intel(R) SST-BF for the current performance level 0, +execute:: + + # intel-speed-select base-freq info -l 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + speed-select-base-freq + high-priority-base-frequency(MHz):3000 + high-priority-cpu-mask:00000216,00002160 + high-priority-cpu-list:5,6,8,13,33,34,36,41 + low-priority-base-frequency(MHz):2400 + tjunction-temperature(C):125 + thermal-design-power(W):205 + +The above capabilities show that there are some CPUs on this system that can +offer base frequency of 3000 MHz compared to the standard base frequency at this +performance levels. Nevertheless, these CPUs are fixed, and they are presented +via high-priority-cpu-list/high-priority-cpu-mask. But if this Intel(R) SST-BF +feature is selected, the low priorities CPUs (which are not in +high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this +clipping of low priority CPUs is acceptable, then the user can enable Intel +SST-BF feature particularly for the above "sched pipe" workload since only two +CPUs are used, they can be scheduled on high priority CPUs and can get boost of +400 MHz. + +Enable Intel(R) SST-BF +~~~~~~~~~~~~~~~~~~~~~~ + +To enable Intel(R) SST-BF feature, execute:: + + # intel-speed-select base-freq enable -a + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + base-freq + enable:success + package-1 + die-0 + cpu-14 + base-freq + enable:success + +In this case, -a option is optional. This not only enables Intel(R) SST-BF, but it +also adjusts the priority of cores using Intel(R) Speed Select Technology Core +Power (Intel(R) SST-CP) features. This option sets the minimum performance of each +Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) class to +maximum performance so that the hardware will give maximum performance possible +for each CPU. + +If -a option is not used, then the following steps are required before enabling +Intel(R) SST-BF: + +- Discover Intel(R) SST-BF and note low and high priority base frequency +- Note the high prioity CPU list +- Enable CLOS using core-power feature set +- Configure CLOS parameters. Use CLOS.min to set to minimum performance +- Subscribe desired CPUs to CLOS groups + +With this configuration, if the same workload is executed by pinning the +workload to high priority CPUs (CPU 5 and 6 in this case):: + + #taskset -c 5,6 perf bench -r 100 sched pipe + # Running 'sched/pipe' benchmark: + # Executed 1000000 pipe operations between two processes + Total time: 5.627 [sec] + 5.627922 usecs/op + 177685 ops/sec + +This way, by enabling Intel(R) SST-BF, the performance of this benchmark is +improved (latency reduced) by 7.79%. From the turbostat output, it can be +observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz. +The turbostat output:: + + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 + Package Core CPU Bzy_MHz + 0 0 0 2151 + 0 1 1 2166 + 0 2 2 2175 + 0 3 3 2175 + 0 4 4 2175 + 0 5 5 3000 + 0 6 6 3000 + 0 7 7 2180 + 0 8 8 2662 + 0 9 9 2176 + 0 10 10 2175 + 0 11 11 2176 + 0 12 12 2176 + 0 13 13 2661 + +Disable Intel(R) SST-BF +~~~~~~~~~~~~~~~~~~~~~~~ + +To disable the Intel(R) SST-BF feature, execute:: + +# intel-speed-select base-freq disable -a + + +Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF) +-------------------------------------------------------------------- + +This feature enables the ability to set different "All core turbo ratio limits" +to cores based on the priority. By using this feature, some cores can be +configured to get higher turbo frequency by designating them as high priority at +the cost of lower or no turbo frequency on the low priority cores. + +For this reason, this feature is only useful when system is busy utilizing all +CPUs, but the user wants some configurable option to get high performance on +some CPUs. + +The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF) +depends on the Intel(R) Speed Select Technology - Performance Profile (Intel +SST-PP) performance level configuration. It is possible that only a certain +performance level supports Intel(R) SST-TF. It is also possible that only the base +performance level (level = 0) has the support of Intel(R) SST-TF. Hence, first +select the desired performance level to enable this feature. + +In the system under test here, Intel(R) SST-TF is supported at the base +performance level 0, but currently disabled:: + + # intel-speed-select -c 0 perf-profile info -l 0 + Intel(R) Speed Select Technology + package-0 + die-0 + cpu-0 + perf-profile-level-0 + ... + ... + speed-select-turbo-freq:disabled + ... + ... + + +To check if performance can be improved using Intel(R) SST-TF feature, get the turbo +frequency properties with Intel(R) SST-TF enabled and compare to the base turbo +capability of this system. + +Get Base turbo capability +~~~~~~~~~~~~~~~~~~~~~~~~~ + +To get the base turbo capability of performance level 0, execute:: + + # intel-speed-select perf-profile info -l 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + perf-profile-level-0 + ... + ... + turbo-ratio-limits-sse + bucket-0 + core-count:2 + max-turbo-frequency(MHz):3200 + bucket-1 + core-count:4 + max-turbo-frequency(MHz):3100 + bucket-2 + core-count:6 + max-turbo-frequency(MHz):3100 + bucket-3 + core-count:8 + max-turbo-frequency(MHz):3100 + bucket-4 + core-count:10 + max-turbo-frequency(MHz):3100 + bucket-5 + core-count:12 + max-turbo-frequency(MHz):3100 + bucket-6 + core-count:14 + max-turbo-frequency(MHz):3100 + bucket-7 + core-count:16 + max-turbo-frequency(MHz):3100 + +Based on the data above, when all the CPUS are busy, the max. frequency of 3100 +MHz can be achieved. If there is some busy workload on cpu 0 - 11 (e.g. stress) +and on CPU 12 and 13, execute "hackbench pipe" workload:: + + # taskset -c 12,13 perf bench -r 100 sched pipe + # Running 'sched/pipe' benchmark: + # Executed 1000000 pipe operations between two processes + Total time: 5.705 [sec] + 5.705488 usecs/op + 175269 ops/sec + +The turbostat output:: + + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 + Package Core CPU Bzy_MHz + 0 0 0 3000 + 0 1 1 3000 + 0 2 2 3000 + 0 3 3 3000 + 0 4 4 3000 + 0 5 5 3100 + 0 6 6 3100 + 0 7 7 3000 + 0 8 8 3100 + 0 9 9 3000 + 0 10 10 3000 + 0 11 11 3000 + 0 12 12 3100 + 0 13 13 3100 + +Based on turbostat output, the performance is limited by frequency cap of 3100 +MHz. To check if the hackbench performance can be improved for CPU 12 and CPU +13, first check the capability of the Intel(R) SST-TF feature for this performance +level. + +Get Intel(R) SST-TF Capability +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To get the capability, the "turbo-freq info" command can be used:: + + # intel-speed-select turbo-freq info -l 0 + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-0 + speed-select-turbo-freq + bucket-0 + high-priority-cores-count:2 + high-priority-max-frequency(MHz):3200 + high-priority-max-avx2-frequency(MHz):3200 + high-priority-max-avx512-frequency(MHz):3100 + bucket-1 + high-priority-cores-count:4 + high-priority-max-frequency(MHz):3100 + high-priority-max-avx2-frequency(MHz):3000 + high-priority-max-avx512-frequency(MHz):2900 + bucket-2 + high-priority-cores-count:6 + high-priority-max-frequency(MHz):3100 + high-priority-max-avx2-frequency(MHz):3000 + high-priority-max-avx512-frequency(MHz):2900 + speed-select-turbo-freq-clip-frequencies + low-priority-max-frequency(MHz):2600 + low-priority-max-avx2-frequency(MHz):2400 + low-priority-max-avx512-frequency(MHz):2100 + +Based on the output above, there is an Intel(R) SST-TF bucket for which there are +two high priority cores. If only two high priority cores are set, then max. +turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz +more than the base turbo capability for all cores. + +In turn, for the hackbench workload, two CPUs can be set as high priority and +rest as low priority. One side effect is that once enabled, the low priority +cores will be clipped to a lower frequency of 2600 MHz. + +Enable Intel(R) SST-TF +~~~~~~~~~~~~~~~~~~~~~~ + +To enable Intel(R) SST-TF, execute:: + + # intel-speed-select -c 12,13 turbo-freq enable -a + Intel(R) Speed Select Technology + Executing on CPU model: X + package-0 + die-0 + cpu-12 + turbo-freq + enable:success + package-0 + die-0 + cpu-13 + turbo-freq + enable:success + package--1 + die-0 + cpu-63 + turbo-freq --auto + enable:success + +In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF +feature and also sets the CPUs to high and and low priority using Intel Speed +Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed +with "-c" arguments are marked as high priority, including its siblings. + +If -a option is not used, then the following steps are required before enabling +Intel(R) SST-TF: + +- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency + +- Enable CLOS using core-power feature set - Configure CLOS parameters + +- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency + +If the same hackbench workload is executed, schedule hackbench threads on high +priority CPUs:: + + #taskset -c 12,13 perf bench -r 100 sched pipe + # Running 'sched/pipe' benchmark: + # Executed 1000000 pipe operations between two processes + Total time: 5.510 [sec] + 5.510165 usecs/op + 180826 ops/sec + +This improved performance by around 3.3% improvement on a busy system. Here the +turbostat output will show that the CPU 12 and CPU 13 are getting 100 MHz boost. +The turbostat output:: + + #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1 + Package Core CPU Bzy_MHz + ... + 0 12 12 3200 + 0 13 13 3200 diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst index 0a38cdf39df1..f40994c422dc 100644 --- a/Documentation/admin-guide/pm/working-state.rst +++ b/Documentation/admin-guide/pm/working-state.rst @@ -13,3 +13,4 @@ Working-State Power Management intel_pstate cpufreq_drivers intel_epb + intel-speed-select -- cgit v1.2.3-59-g8ed1b