aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation
diff options
context:
space:
mode:
authorRafael J. Wysocki <rafael.j.wysocki@intel.com>2025-05-26 20:19:40 +0200
committerRafael J. Wysocki <rafael.j.wysocki@intel.com>2025-05-26 20:19:40 +0200
commitf34dc2834347301d4652464867f096048df546ff (patch)
treeec0b58c4f296f6e2aedadb2f16ab73a62a849a80 /Documentation
parentMerge branch 'pm-em' (diff)
parentcpufreq: CPPC: Add support for autonomous selection (diff)
downloadwireguard-linux-f34dc2834347301d4652464867f096048df546ff.tar.xz
wireguard-linux-f34dc2834347301d4652464867f096048df546ff.zip
Merge branch 'pm-cpufreq'
Merge cpufreq updates for 6.16-rc1: - Refactor cpufreq_online(), add and use cpufreq policy locking guards, use __free() in policy reference counting, and clean up core cpufreq code on top of that (Rafael Wysocki). - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh Kumar). - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay Ugwekar). - Add offline, online and suspend callbacks to the amd-pstate driver, rename and use the existing amd_pstate_epp callbacks in it (Dhananjay Ugwekar). - Add support for the "Requested CPU Min frequency" BIOS option to the amd-pstate driver (Dhananjay Ugwekar). - Reset amd-pstate driver mode after running selftests (Swapnil Sapkal). - Add helper for governor checks to the schedutil cpufreq governor and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki). - Populate the cpu_capacity sysfs entries from the intel_pstate driver after registering asym capacity support (Ricardo Neri). - Add support for enabling Energy-aware scheduling (EAS) to the intel_pstate driver when operating in the passive mode on a hybrid platform (Rafael Wysocki). - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan Chancellor). - Drop redundant cpus_read_lock() from store_local_boost() in the cpufreq core (Seyediman Seyedarab). - Replace sscanf() with kstrtouint() in the cpufreq code and use a symbol instead of a raw number in it (Bowen Yu). - Add support for autonomous CPU performance state selection to the CPPC cpufreq driver (Lifeng Zheng). * pm-cpufreq: (31 commits) cpufreq: CPPC: Add support for autonomous selection cpufreq: Update sscanf() to kstrtouint() cpufreq: Replace magic number cpufreq: drop redundant cpus_read_lock() from store_local_boost() cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver() cpufreq: intel_pstate: Document hybrid processor support cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache cpufreq: intel_pstate: EAS support for hybrid platforms cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas() cpufreq: intel_pstate: Populate the cpu_capacity sysfs entries arch_topology: Relocate cpu_scale to topology.[h|c] cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq cpufreq/sched: schedutil: Add helper for governor checks amd-pstate-ut: Reset amd-pstate driver mode after running selftests cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option cpufreq/amd-pstate: Add offline, online and suspend callbacks for amd_pstate_driver cpufreq: Force sync policy boost with global boost on sysfs update cpufreq: Preserve policy's boost state after resume cpufreq: Introduce policy_set_boost() cpufreq: Don't unnecessarily call set_boost() ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-devices-system-cpu54
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst104
2 files changed, 156 insertions, 2 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 6a1acabb29d8..03f43fb667a3 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -268,6 +268,60 @@ Description: Discover CPUs in the same CPU frequency coordination domain
This file is only present if the acpi-cpufreq or the cppc-cpufreq
drivers are in use.
+What: /sys/devices/system/cpu/cpuX/cpufreq/auto_select
+Date: May 2025
+Contact: linux-pm@vger.kernel.org
+Description: Autonomous selection enable
+
+ Read/write interface to control autonomous selection enable
+ Read returns autonomous selection status:
+ 0: autonomous selection is disabled
+ 1: autonomous selection is enabled
+
+ Write 'y' or '1' or 'on' to enable autonomous selection.
+ Write 'n' or '0' or 'off' to disable autonomous selection.
+
+ This file is only present if the cppc-cpufreq driver is in use.
+
+What: /sys/devices/system/cpu/cpuX/cpufreq/auto_act_window
+Date: May 2025
+Contact: linux-pm@vger.kernel.org
+Description: Autonomous activity window
+
+ This file indicates a moving utilization sensitivity window to
+ the platform's autonomous selection policy.
+
+ Read/write an integer represents autonomous activity window (in
+ microseconds) from/to this file. The max value to write is
+ 1270000000 but the max significand is 127. This means that if 128
+ is written to this file, 127 will be stored. If the value is
+ greater than 130, only the first two digits will be saved as
+ significand.
+
+ Writing a zero value to this file enable the platform to
+ determine an appropriate Activity Window depending on the workload.
+
+ Writing to this file only has meaning when Autonomous Selection is
+ enabled.
+
+ This file is only present if the cppc-cpufreq driver is in use.
+
+What: /sys/devices/system/cpu/cpuX/cpufreq/energy_performance_preference_val
+Date: May 2025
+Contact: linux-pm@vger.kernel.org
+Description: Energy performance preference
+
+ Read/write an 8-bit integer from/to this file. This file
+ represents a range of values from 0 (performance preference) to
+ 0xFF (energy efficiency preference) that influences the rate of
+ performance increase/decrease and the result of the hardware's
+ energy efficiency and performance optimization policies.
+
+ Writing to this file only has meaning when Autonomous Selection is
+ enabled.
+
+ This file is only present if the cppc-cpufreq driver is in use.
+
What: /sys/devices/system/cpu/cpu*/cache/index3/cache_disable_{0,1}
Date: August 2008
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index 78fc83ed2a7e..26e702c7016e 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]
+Support for Hybrid Processors
+=============================
+
+Some processors supported by ``intel_pstate`` contain two or more types of CPU
+cores differing by the maximum turbo P-state, performance vs power characteristics,
+cache sizes, and possibly other properties. They are commonly referred to as
+hybrid processors. To support them, ``intel_pstate`` requires HWP to be enabled
+and it assumes the HWP performance units to be the same for all CPUs in the
+system, so a given HWP performance level always represents approximately the
+same physical performance regardless of the core (CPU) type.
+
+Hybrid Processors with SMT
+--------------------------
+
+On systems where SMT (Simultaneous Multithreading), also referred to as
+HyperThreading (HT) in the context of Intel processors, is enabled on at least
+one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely,
+the priority of a given CPU reflects its highest HWP performance level which
+causes the CPU scheduler to generally prefer more performant CPUs, so the less
+performant CPUs are used when the other ones are fully loaded. However, SMT
+siblings (that is, logical CPUs sharing one physical core) are treated in a
+special way such that if one of them is in use, the effective priority of the
+other ones is lowered below the priorities of the CPUs located in the other
+physical cores.
+
+This approach maximizes performance in the majority of cases, but unfortunately
+it also leads to excessive energy usage in some important scenarios, like video
+playback, which is not generally desirable. While there is no other viable
+choice with SMT enabled because the effective capacity and utilization of SMT
+siblings are hard to determine, hybrid processors without SMT can be handled in
+more energy-efficient ways.
+
+.. _CAS:
+
+Capacity-Aware Scheduling Support
+---------------------------------
+
+The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by
+``intel_pstate`` by default on hybrid processors without SMT. CAS generally
+causes the scheduler to put tasks on a CPU so long as there is a sufficient
+amount of spare capacity on it, and if the utilization of a given task is too
+high for it, the task will need to go somewhere else.
+
+Since CAS takes CPU capacities into account, it does not require CPU
+prioritization and it allows tasks to be distributed more symmetrically among
+the more performant and less performant CPUs. Once placed on a CPU with enough
+capacity to accommodate it, a task may just continue to run there regardless of
+whether or not the other CPUs are fully loaded, so on average CAS reduces the
+utilization of the more performant CPUs which causes the energy usage to be more
+balanced because the more performant CPUs are generally less energy-efficient
+than the less performant ones.
+
+In order to use CAS, the scheduler needs to know the capacity of each CPU in
+the system and it needs to be able to compute scale-invariant utilization of
+CPUs, so ``intel_pstate`` provides it with the requisite information.
+
+First of all, the capacity of each CPU is represented by the ratio of its highest
+HWP performance level, multiplied by 1024, to the highest HWP performance level
+of the most performant CPU in the system, which works because the HWP performance
+units are the same for all CPUs. Second, the frequency-invariance computations,
+carried out by the scheduler to always express CPU utilization in the same units
+regardless of the frequency it is currently running at, are adjusted to take the
+CPU capacity into account. All of this happens when ``intel_pstate`` has
+registered itself with the ``CPUFreq`` core and it has figured out that it is
+running on a hybrid processor without SMT.
+
+Energy-Aware Scheduling Support
+-------------------------------
+
+If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
+``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
+`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
+Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
+``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
+to operate in the `passive mode <Passive Mode_>`_.
+
+The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
+based on abstract cost values and it does not include any real power numbers)
+and it is relatively simple to avoid unnecessary computations in the scheduler.
+There is a performance domain in it for every CPU in the system and the cost
+values for these performance domains have been chosen so that running a task on
+a less performant (small) CPU appears to be always cheaper than running that
+task on a more performant (big) CPU. However, for two CPUs of the same type,
+the cost difference depends on their current utilization, and the CPU whose
+current utilization is higher generally appears to be a more expensive
+destination for a given task. This helps to balance the load among CPUs of the
+same type.
+
+Since EAS works on top of CAS, high-utilization tasks are always migrated to
+CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization
+tasks tend to be placed on the CPUs that look less expensive to the scheduler.
+Effectively, this causes the less performant and less loaded CPUs to be
+preferred as long as they have enough spare capacity to run the given task
+which generally leads to reduced energy usage.
+
+The Energy Model created by ``intel_pstate`` can be inspected by looking at
+the ``energy_model`` directory in ``debugfs`` (typlically mounted on
+``/sys/kernel/debug/``).
+
+
User Space Interface in ``sysfs``
=================================
@@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
Limits`_ for details).
``no_cas``
- Do not enable capacity-aware scheduling (CAS) which is enabled by
- default on hybrid systems.
+ Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
+ default on hybrid systems without SMT.
Diagnostics and Tuning
======================