path: root/drivers/cpufreq (follow)
Age  Commit message  Author  Files  Lines
2025-11-25Merge tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pmRafael J. Wysocki7-21/+194
Pull CPUFreq updates for 6.19 from Viresh Kumar: "- tegra186: Add OPP / bandwidth support for Tegra186 (Aaron Kling). - Minor improvements to various cpufreq drivers (Christian Marangi, Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)." * tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM cpufreq: CPPC: Don't warn if FIE init fails to read counters cpufreq: nforce2: fix reference count leak in nforce2 cpufreq: tegra186: add OPP support and set bandwidth cpufreq: dt-platdev: Add JH7110S SOC to the allowlist cpufreq: s5pv210: fix refcount leak
2025-11-21cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_listChristian Marangi1-1/+1
If CONFIG_OF is not enabled, of_match_node() is defined as NULL and qcom_cpufreq_ipq806x_match_list won't be used, causing a compilation warning. Flag qcom_cpufreq_ipq806x_match_list as __maybe_unused to fix the compilation warning. While at it, also flag it as __initconst as it's used only in probe context and can be freed after probe. This follows the pattern of the usual of_device_id variables. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511202119.6zvvFMup-lkp@intel.com/ Fixes: 58f5d39d5ed8 ("cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> [ Viresh: Drop __initconst ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
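The warning pattern described above can be reproduced outside the kernel. The following is a minimal userspace sketch, not the driver code: `maybe_unused`, `HAVE_OF_MATCHING`, `match_entry` and the compatible strings are all hypothetical stand-ins for `__maybe_unused`, `CONFIG_OF` and the real match list.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's __maybe_unused (GCC/Clang attribute). */
#define maybe_unused __attribute__((unused))

struct match_entry {
    const char *compatible;
};

/*
 * Referenced only when HAVE_OF_MATCHING is defined (a hypothetical switch
 * playing the role of CONFIG_OF). The annotation keeps the build free of
 * unused-variable warnings when the lookup below is compiled out.
 */
static const struct match_entry match_list[] maybe_unused = {
    { "vendor,soc-a" },
    { "vendor,soc-b" },
    { NULL },
};

/* Returns 1 when the compatible string is in the table, 0 otherwise. */
static int machine_matches(const char *compatible)
{
#ifdef HAVE_OF_MATCHING
    for (const struct match_entry *m = match_list; m->compatible; m++)
        if (strcmp(m->compatible, compatible) == 0)
            return 1;
#else
    (void)compatible; /* matching compiled out: the table is never touched */
#endif
    return 0;
}
```

Built without `-DHAVE_OF_MATCHING`, the table is unreferenced and the attribute is what silences the compiler.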
2025-11-20cpufreq: ACPI: Replace udelay() with usleep_range()Kaushlendra Kumar1-1/+1
Replace udelay() with usleep_range() in check_freqs() to allow CPU scheduling during frequency polling. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> [ rjw: Changelog edits ] Link: https://patch.msgid.link/20251119031109.134583-1-kaushlendra.kumar@intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-18cpufreq: intel_pstate: Eliminate some code duplicationRafael J. Wysocki1-14/+14
To eliminate some code duplication from the intel_pstate driver, move the core_get_val() function body to a new function called get_perf_ctl_val() and make both core_get_val() and atom_get_val() invoke it to carry out the same computation. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/2829273.mvXUDI8C0e@rafael.j.wysocki
2025-11-12cpufreq: intel_pstate: Use mutex guard for driver lockingRafael J. Wysocki1-66/+33
Use guard(mutex)(&intel_pstate_driver_lock), or the scoped variant of it, wherever intel_pstate_driver_lock needs to be held. This allows some local variables and goto statements to be dropped as they are not necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://patch.msgid.link/2807232.mvXUDI8C0e@rafael.j.wysocki
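For readers unfamiliar with scope-based locking, here is a rough userspace sketch of the idea behind guard(mutex). It assumes the GCC/Clang `cleanup` attribute; `toy_lock`, `TOY_GUARD` and `update_state()` are hypothetical names, and the toy lock is a plain flag rather than a real mutex.

```c
/* Toy lock standing in for a kernel mutex (hypothetical, single-threaded). */
struct toy_lock {
    int held;
};

static struct toy_lock driver_lock;
static int driver_state;

/* Cleanup handler: the compiler calls it when the guard leaves scope. */
static void toy_guard_release(struct toy_lock **l)
{
    (*l)->held = 0;
}

/* Takes the lock and returns it so the guard variable can remember it. */
static struct toy_lock *toy_guard_acquire(struct toy_lock *l)
{
    l->held = 1;
    return l;
}

/* Rough analogue of guard(mutex)(&lock); relies on a compiler extension. */
#define TOY_GUARD(lock) \
    struct toy_lock *toy_guard_ \
        __attribute__((cleanup(toy_guard_release), unused)) = \
        toy_guard_acquire(lock)

/* Returns 0 on success, -1 if the requested state is invalid. */
static int update_state(int new_state)
{
    TOY_GUARD(&driver_lock);

    if (new_state < 0)
        return -1; /* early return; the guard still releases the lock */

    driver_state = new_state;
    return 0;
}
```

Because the release is tied to scope exit, the early return needs neither a `goto unlock` label nor a local `ret` variable, which is the simplification the changelog refers to.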
2025-11-12Merge tag 'amd-pstate-v6.19-2025-11-10' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linuxRafael J. Wysocki1-20/+15
Pull amd-pstate content for 6.19 (11/10/25) from Mario Limonciello: "* optimizations for parameter array handling * fix for mode changes with offline CPUs" * tag 'amd-pstate-v6.19-2025-11-10' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUs cpufreq/amd-pstate: Add static asserts for EPP indices cpufreq/amd-pstate: Fix some whitespace issues cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status() cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULL cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_string cpufreq/amd-pstate: Use sysfs_match_string() for epp
2025-11-12Merge back cpufreq material for 6.19Rafael J. Wysocki2-53/+58
2025-11-12cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writesSrinivas Pandruvada1-5/+4
Commit ac4e04d9e378 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode") introduced a check for feature X86_FEATURE_IDA to verify turbo mode support. Although this is the correct way to check for turbo mode support, it causes issues on some platforms that disable turbo during OS boot, but enable it later [1]. Before adding this feature check, users were able to get turbo mode frequencies by writing 0 to /sys/devices/system/cpu/intel_pstate/no_turbo post-boot. To restore the old behavior on the affected systems while still addressing the unchecked MSR issue on some Skylake-X systems, check X86_FEATURE_IDA only immediately before updates of MSR_IA32_PERF_CTL that may involve setting the Turbo Engage Bit (bit 32). Fixes: ac4e04d9e378 ("cpufreq: intel_pstate: Unchecked MSR aceess in legacy mode") Reported-by: Aaron Rainbolt <arainbolt@kfocus.org> Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2122531 [1] Tested-by: Aaron Rainbolt <arainbolt@kfocus.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject adjustment, changelog edits ] Link: https://patch.msgid.link/20251111010840.141490-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
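As an illustration of checking the feature only at the point where bit 32 would be set, here is a small sketch. It is not the driver's code: the function name and parameters are hypothetical, and placing the P-state in bits 15:8 of the value is an assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 32 of the PERF_CTL value, as named in the changelog above. */
#define TOY_PERF_CTL_TURBO_BIT (UINT64_C(1) << 32)

/*
 * Build a PERF_CTL-style value. The feature flag is consulted only when
 * the caller wants bit 32 set, so it does not gate anything else.
 */
static uint64_t toy_perf_ctl_value(unsigned int pstate, bool want_turbo_bit,
                                   bool cpu_has_ida)
{
    uint64_t val = (uint64_t)pstate << 8; /* assumed field position */

    if (want_turbo_bit && cpu_has_ida)
        val |= TOY_PERF_CTL_TURBO_BIT;

    return val;
}
```

With this shape, a CPU that reports the feature late still gets ordinary P-state requests, and only the bit 32 update depends on the feature being visible at that moment.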
2025-11-10cpufreq/amd-pstate: Call cppc_set_auto_sel() only for online CPUsGautham R. Shenoy1-1/+1
amd_pstate_change_mode_without_dvr_change() calls cppc_set_auto_sel() for all the present CPUs. However, this callpath eventually calls cppc_set_reg_val() which accesses the per-cpu cpc_desc_ptr object. This object is initialized only for online CPUs via acpi_soft_cpu_online() --> __acpi_processor_start() --> acpi_cppc_processor_probe(). Hence, restrict calling cppc_set_auto_sel() to only the online CPUs. Fixes: 3ca7bc818d8c ("cpufreq: amd-pstate: Add guided mode control support via sysfs") Suggested-by: Mario Limonciello (AMD) (kernel.org) <superm1@kernel.org> Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
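The hazard can be pictured with a toy model in which per-CPU descriptors exist only for CPUs that have been brought online. Everything below is a hypothetical sketch (names, the four-CPU limit and the bitmask representation are assumptions), not the ACPI CPPC code.

```c
#include <stddef.h>

#define TOY_NR_CPUS 4

struct toy_cpc_desc {
    int auto_sel;
};

static struct toy_cpc_desc toy_descs[TOY_NR_CPUS];
/* Per-CPU descriptor pointers; an entry stays NULL until its CPU is online. */
static struct toy_cpc_desc *toy_desc_ptr[TOY_NR_CPUS];
static unsigned int toy_online_mask;

/* Models the online path that initializes the per-CPU descriptor. */
static void toy_cpu_online(unsigned int cpu)
{
    toy_desc_ptr[cpu] = &toy_descs[cpu];
    toy_online_mask |= 1u << cpu;
}

/* Applies the setting to online CPUs only; returns how many were updated. */
static int toy_set_auto_sel_online(int enable)
{
    int updated = 0;

    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        if (!(toy_online_mask & (1u << cpu)))
            continue; /* present but offline: descriptor is still NULL */
        toy_desc_ptr[cpu]->auto_sel = enable;
        updated++;
    }
    return updated;
}
```

Iterating over every present CPU instead would dereference the NULL entries of the offline ones, which is the failure mode the fix avoids.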
2025-11-10cpufreq/amd-pstate: Add static asserts for EPP indicesMario Limonciello (AMD)1-0/+3
In case a new index is introduced add a static assert to make sure that strings and values are updated. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10cpufreq/amd-pstate: Fix some whitespace issuesMario Limonciello (AMD)1-2/+2
Add whitespace around the equals and remove leading space. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10cpufreq/amd-pstate: Adjust return values in amd_pstate_update_status()Mario Limonciello (AMD)1-3/+2
get_mode_idx_from_str() already checks the upper boundary for a string sent. Drop the extra check in amd_pstate_update_status() and pass the return code if there is a failure. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10cpufreq/amd-pstate: Make amd_pstate_get_mode_string() never return NULLMario Limonciello (AMD)1-2/+2
amd_pstate_get_mode_string() is only used by amd-pstate-ut. Set the failure path to use AMD_PSTATE_UNDEFINED ("undefined") to avoid showing "(null)" as a string when running test suite. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-11-10cpufreq/amd-pstate: Drop NULL value from amd_pstate_mode_stringMario Limonciello (AMD)1-1/+1
None of the users actually look for the NULL value. To avoid risk of regression introducing a new value but forgetting to add a string add a static assert to test AMD_PSTATE_MAX matches the array size. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
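A compact sketch of the two ideas in this group of patches (a size check at build time and a lookup that never returns NULL) might look as follows. The enum, strings and function name are illustrative assumptions, not the amd-pstate definitions.

```c
#include <stddef.h>

enum toy_mode {
    TOY_MODE_UNDEFINED,
    TOY_MODE_DISABLE,
    TOY_MODE_PASSIVE,
    TOY_MODE_ACTIVE,
    TOY_MODE_GUIDED,
    TOY_MODE_MAX,
};

static const char * const toy_mode_string[] = {
    [TOY_MODE_UNDEFINED] = "undefined",
    [TOY_MODE_DISABLE]   = "disable",
    [TOY_MODE_PASSIVE]   = "passive",
    [TOY_MODE_ACTIVE]    = "active",
    [TOY_MODE_GUIDED]    = "guided",
};

/* Build fails if a value is appended to the enum without a string. */
_Static_assert(sizeof(toy_mode_string) / sizeof(toy_mode_string[0]) ==
               TOY_MODE_MAX, "toy_mode_string must cover every toy_mode");

/* Out-of-range input maps to "undefined" instead of NULL. */
static const char *toy_get_mode_string(enum toy_mode mode)
{
    if ((unsigned int)mode >= TOY_MODE_MAX)
        return toy_mode_string[TOY_MODE_UNDEFINED];
    return toy_mode_string[mode];
}
```

Note that with designated initializers the size check only catches a value appended just before the `_MAX` marker; a value inserted in the middle without a string would leave a NULL hole that the assertion cannot see.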
2025-11-10cpufreq/amd-pstate: Use sysfs_match_string() for eppMario Limonciello (AMD)1-11/+4
Rather than scanning the buffer and manually matching the string use the sysfs macros. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
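To show what table-driven matching of a sysfs write looks like, here is a userspace approximation. It is a sketch under assumptions: `toy_match_string()` is a hypothetical helper that mimics the behaviour of ignoring one trailing newline, and the string table is illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

static const char * const toy_epp_strings[] = {
    "default",
    "performance",
    "balance_performance",
    "balance_power",
    "power",
};

/* Compare an entry with user input, ignoring one trailing '\n' on the input. */
static int toy_streq_nl(const char *entry, const char *input)
{
    size_t n = strlen(input);

    if (n && input[n - 1] == '\n')
        n--;
    return strlen(entry) == n && strncmp(entry, input, n) == 0;
}

/* Returns the index of the matching entry, or -EINVAL if none matches. */
static int toy_match_string(const char * const *table, size_t count,
                            const char *input)
{
    for (size_t i = 0; i < count; i++)
        if (toy_streq_nl(table[i], input))
            return (int)i;
    return -EINVAL;
}
```

A store handler built this way receives either a valid array index or a negative error code, so it needs no separate buffer scanning or bounds check.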
2025-11-10cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue usersMarco Crivellari1-1/+2
Currently if a user enqueues a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This continues the effort to refactor workqueue APIs, which began with the introduction of new workqueues and a new alloc_workqueue flag in: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> [ Viresh: Fixed Subject ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-11-10cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEMChristian Marangi1-2/+33
On some IPQ806x SoCs, SMEM might not be initialized by SBL. This is the case for some Google devices (the OnHub family) that can't make use of SMEM to detect the SoC ID (and socinfo can't be used either, as it depends on SMEM presence). To handle this specific case, check whether SMEM is uninitialized (by checking if qcom_smem_get_soc_id() returns -ENODEV) and fall back to OF machine compatible checking to identify the SoC variant. Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-30cpufreq: intel_pstate: Add Diamond Rapids OOB mode supportKuppuswamy Sathyanarayanan1-0/+1
Prevent intel_pstate from loading when Out-of-Band (OOB) P-states mode is enabled. The OOB identification mechanism for Diamond Rapids servers is the same as for prior generation CPUs such as Granite Rapids. Add the Diamond Rapids CPU model to intel_pstate_cpu_oob_ids[] to ensure correct OOB handling. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/20251022215425.3566218-1-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-28cpufreq: CPPC: Don't warn if FIE init fails to read countersJie Zhan1-9/+8
During the CPPC FIE initialization, reading perf counters on offline cpus should be expected to fail. Don't warn on this case. Also, change the error log level to debug since FIE is optional. Co-developed-by: Bowen Yu <yubowen8@huawei.com> Signed-off-by: Bowen Yu <yubowen8@huawei.com> # Changing loglevel to debug Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> [ Viresh: Added back the dropped comment. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-28cpufreq: nforce2: fix reference count leak in nforce2Miaoqian Lin1-0/+3
There are two reference count leaks in this driver: 1. In nforce2_fsb_read(): pci_get_subsys() increases the reference count of the PCI device, but pci_dev_put() is never called to release it, thus leaking the reference. 2. In nforce2_detect_chipset(): pci_get_subsys() gets a reference to the nforce2_dev which is stored in a global variable, but the reference is never released when the module is unloaded. Fix both by: - Adding pci_dev_put(nforce2_sub5) in nforce2_fsb_read() after reading the configuration. - Adding pci_dev_put(nforce2_dev) in nforce2_exit() to release the global device reference. Found via static analysis. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
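The get/put pairing at the heart of this fix can be sketched with a toy reference counter. The types and helpers below are hypothetical and only mirror the shape of pci_get_subsys() and pci_dev_put(); they are not the PCI API.

```c
#include <stddef.h>

struct toy_dev {
    int refcount;
    unsigned int config;
};

/* Takes a reference, like a successful device lookup would. */
static struct toy_dev *toy_dev_get(struct toy_dev *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

/* Drops a reference; accepts NULL for convenience. */
static void toy_dev_put(struct toy_dev *dev)
{
    if (dev)
        dev->refcount--;
}

/* Reads a value through a temporary reference and releases it again. */
static int toy_read_config(struct toy_dev *bus_dev, unsigned int *out)
{
    struct toy_dev *dev = toy_dev_get(bus_dev);

    if (!dev)
        return -1;

    *out = dev->config;
    toy_dev_put(dev); /* the step a leaky version omits */
    return 0;
}
```

The second leak in the changelog has the same cure at a different time: a reference kept in a global for the module's lifetime is dropped in the exit function.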
2025-10-23cpufreq: tegra186: add OPP support and set bandwidthAaron Kling1-7/+143
Add support for using the OPP table from DT in the Tegra186 cpufreq driver. Tegra SoCs receive the frequency lookup table (LUT) from BPMP-FW. Cross-check the OPPs present in DT against the LUT from BPMP-FW and enable only those DT OPPs which are also present in the LUT. The OPP table in DT has a CPU frequency to bandwidth mapping where the bandwidth value is per MC channel. DRAM bandwidth depends on the number of MC channels, which can vary as per the boot configuration. This per-channel bandwidth from the OPP table will later be converted by the MC driver to the final bandwidth value by multiplying it with the number of channels before being handled in the EMC driver. If the OPP table is not present in DT, use the LUT from BPMP-FW directly as the CPU frequency table and do not do DRAM frequency scaling, which is the same as the current behavior. Signed-off-by: Aaron Kling <webgeek1234@gmail.com> [ Viresh: Fix _free() definitions ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
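The cross-check and the per-channel bandwidth conversion described above can be sketched in a few lines. This is an illustration only: the function names, the use of kHz arrays and the sample values are assumptions, and the real driver works with the OPP library rather than plain arrays.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mark a DT OPP as enabled only if the firmware LUT lists the same
 * frequency. Returns the number of enabled OPPs.
 */
static size_t toy_enable_common_opps(const unsigned long *dt_khz, size_t n_dt,
                                     const unsigned long *lut_khz, size_t n_lut,
                                     bool *enabled)
{
    size_t count = 0;

    for (size_t i = 0; i < n_dt; i++) {
        enabled[i] = false;
        for (size_t j = 0; j < n_lut; j++) {
            if (dt_khz[i] == lut_khz[j]) {
                enabled[i] = true;
                count++;
                break;
            }
        }
    }
    return count;
}

/* Per-channel bandwidth scaled by the number of memory controller channels. */
static unsigned long toy_total_bandwidth(unsigned long per_channel_bw,
                                         unsigned int channels)
{
    return per_channel_bw * channels;
}
```

Keeping the DT value per channel means the same table can describe boards that boot with different channel counts; only the final multiplication changes.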
2025-10-23cpufreq: dt-platdev: Add JH7110S SOC to the allowlistHal Feng1-0/+1
Add the compatible strings for supporting the generic cpufreq driver on the StarFive JH7110S SoC. Signed-off-by: Hal Feng <hal.feng@starfivetech.com> Reviewed-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-10-23cpufreq: s5pv210: fix refcount leakShuhao Fu1-2/+4
In function `s5pv210_cpu_init`, a possible refcount inconsistency has been identified, causing a resource leak. Why it is a bug: 1. For every clk_get, there should be a matching clk_put on every subsequent error handling path. 2. After calling `clk_get(dmc1_clk)`, variable `dmc1_clk` will not be freed even if an error happens. How it is fixed: For every failure path, an extra goto label is added to ensure `dmc1_clk` is freed regardless. Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
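The goto-based unwinding that the fix relies on can be shown with a toy pair of clocks. All names here are hypothetical, and the failure switches exist only to exercise the error paths in this sketch.

```c
#include <stddef.h>

struct toy_clk {
    const char *name;
};

static struct toy_clk toy_dmc0 = { "dmc0" };
static struct toy_clk toy_dmc1 = { "dmc1" };
static int toy_live_refs; /* number of references currently held */

/* Returns the clock with a reference taken, or NULL if 'fail' is set. */
static struct toy_clk *toy_clk_get(struct toy_clk *clk, int fail)
{
    if (fail)
        return NULL;
    toy_live_refs++;
    return clk;
}

static void toy_clk_put(struct toy_clk *clk)
{
    (void)clk;
    toy_live_refs--;
}

/* Each later failure jumps to a label that releases what was acquired. */
static int toy_cpu_init(int fail_second_get, int fail_late)
{
    struct toy_clk *a, *b;

    a = toy_clk_get(&toy_dmc0, 0);
    if (!a)
        return -1;

    b = toy_clk_get(&toy_dmc1, fail_second_get);
    if (!b)
        goto out_put_a;

    if (fail_late)
        goto out_put_b; /* without this label, 'b' would leak */

    return 0; /* success: both references stay held */

out_put_b:
    toy_clk_put(b);
out_put_a:
    toy_clk_put(a);
    return -1;
}
```

The labels are ordered in reverse acquisition order, so jumping to any of them releases exactly the resources obtained so far.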
2025-10-20cpufreq: Replace deprecated strcpy() in cpufreq_unregister_governor()Thorsten Blum1-1/+1
strcpy() is deprecated; assign the NUL terminator directly instead. Link: https://github.com/KSPP/linux/issues/88 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> [ rjw: Subject tweaks ] Link: https://patch.msgid.link/20251017153354.82009-2-thorsten.blum@linux.dev Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-20cpufreq: intel_pstate: Improve printing of debug messagesRafael J. Wysocki1-12/+13
Some debug messages generated by intel_pstate on a given hybrid system are only printed for some CPUs which is confusing, so modify the driver to print them for all CPUs. Also change those messages to avoid printing local variable names in them. Moreover, some debug messages printed by intel_pstate are quite hard to understand without looking at the code printing them, so make them somewhat clearer while at it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/8609836.T7Z3S40VBb@rafael.j.wysocki
2025-10-20cpufreq: intel_pstate: hybrid: Adjust energy model rulesRafael J. Wysocki1-21/+14
Instead of using HWP-to-frequency scaling factors for computing cost coefficients in the energy model used on hybrid systems, which is fragile, rely on CPU type information that is easily accessible now and the information on whether or not L3 cache is present for this purpose. This also allows the cost coefficients for P-cores to be adjusted so that they start to be populated somewhat earlier (that is, before E-cores are loaded up to their full capacity). In addition to the above, replace an inaccurate comment regarding the reason why the freq value is added to the cost in hybrid_get_cost(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Link: https://patch.msgid.link/5932894.DvuYhMxLoT@rafael.j.wysocki
2025-10-20cpufreq: intel_pstate: Add and use hybrid_has_l3()Rafael J. Wysocki1-12/+18
Introduce a function for checking whether or not a given CPU has L3 cache, called hybrid_has_l3(), and use it in hybrid_get_cost() for computing cost coefficients associated with a given perf domain. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/13884343.uLZWGnKmhe@rafael.j.wysocki
2025-10-20cpufreq: intel_pstate: Add and use hybrid_get_cpu_type()Rafael J. Wysocki1-6/+7
Introduce a function for identifying the type of a given CPU in a hybrid system, called hybrid_get_cpu_type(), and use it for hybrid scaling factor determination in hwp_get_cpu_scaling(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/1954386.tdWV9SEqCh@rafael.j.wysocki
2025-10-20cpufreq: preserve freq_table_sorted across suspend/hibernateZihuan Zhang1-3/+6
During S3/S4 suspend and resume, cpufreq policies are not freed or recreated; the freq_table and policy structure remain intact. However, set_freq_table_sorted() currently resets policy->freq_table_sorted to UNSORTED unconditionally, which is unnecessary since the table order does not change across suspend/resume. This patch adds a check to skip validation if policy->freq_table_sorted is already ASCENDING or DESCENDING. This avoids unnecessary traversal of the frequency table on S3/S4 resume or repeated online events, reducing overhead while preserving correctness. Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn> Link: https://patch.msgid.link/20251011072420.11495-1-zhangzihuan@kylinos.cn Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
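A sketch of the early-exit check, under the assumption that the sortedness of a table never changes once determined, is given below. The structure, enum and scan counter are hypothetical and exist only to make the skipped traversal observable.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_sorted {
    TOY_UNSORTED,
    TOY_ASCENDING,
    TOY_DESCENDING,
};

struct toy_policy {
    const unsigned int *freqs;
    size_t count;
    enum toy_sorted sorted;
};

static int toy_scans; /* how many times the table was actually traversed */

/* Classify the table once; later calls return early if already classified. */
static void toy_set_sorted(struct toy_policy *p)
{
    bool asc = true, desc = true;

    if (p->sorted != TOY_UNSORTED)
        return; /* order cannot have changed, skip the traversal */

    toy_scans++;
    for (size_t i = 1; i < p->count; i++) {
        if (p->freqs[i] < p->freqs[i - 1])
            asc = false;
        if (p->freqs[i] > p->freqs[i - 1])
            desc = false;
    }
    p->sorted = asc ? TOY_ASCENDING : desc ? TOY_DESCENDING : TOY_UNSORTED;
}
```

A table that is genuinely unsorted is still rescanned on every call in this sketch, since `TOY_UNSORTED` doubles as the "not yet classified" state.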
2025-10-15cpufreq/amd-pstate: Fix a regression leading to EPP 0 after hibernateMario Limonciello (AMD)1-1/+5
After resuming from S4, all CPUs except the boot CPU have the wrong EPP hint programmed. This is because when the CPUs were offlined the EPP value was reset to 0. This is a similar problem as fixed by commit ba3319e590571 ("cpufreq/amd-pstate: Fix a regression leading to EPP 0 after resume") and the solution is also similar. When offlining rather than reset the values to zero, reset them to match those chosen by the policy. When the CPUs are onlined again these values will be restored. Closes: https://community.frame.work/t/increased-power-usage-after-resuming-from-suspend-on-ryzen-7040-kernel-6-15-regression/74531/20?u=mario_limonciello Fixes: 608a76b65288 ("cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option") Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-10-01Merge tag 'cpufreq-arm-updates-6.18-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pmRafael J. Wysocki3-18/+41
Merge CPUFreq fixes for 6.18 from Viresh Kumar: "- Update frequency for all tegra CPUs (Aaron Kling). - Fix device leak in mediatek driver (Johan Hovold). - Rust cpufreq helper cleanup (Thorsten Blum)." * tag 'cpufreq-arm-updates-6.18-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: tegra186: Initialize all cores to max frequencies cpufreq: tegra186: Set target frequency for all cpus in policy rust: cpufreq: streamline find_supply_names cpufreq: mediatek: fix device leak on probe failure
2025-10-01ACPI: CPPC: Do not use CPUFREQ_ETERNAL as an error valueRafael J. Wysocki2-6/+6
Instead of using CPUFREQ_ETERNAL for signaling an error condition in cppc_get_transition_latency(), change the return value type of that function to int and make it return a proper negative error code on failures. No intentional functional impact. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Qais Yousef <qyousef@layalina.io>
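The difference between an in-band sentinel and a negative error code can be sketched as follows. The function names, the choice of -ENODATA and the 1000 us default are assumptions made for this example, not values taken from the kernel.

```c
#include <errno.h>

#define TOY_DEFAULT_DELAY_US 1000u /* hypothetical default for this sketch */

/*
 * Returns the latency in nanoseconds, or a negative error code when the
 * firmware value is unavailable. No in-band sentinel is needed.
 */
static int toy_get_transition_latency_ns(int firmware_ok,
                                         unsigned int latency_ns)
{
    if (!firmware_ok)
        return -ENODATA;
    return (int)latency_ns;
}

/* The caller falls back to a default instead of converting a sentinel. */
static unsigned int toy_transition_delay_us(int latency_ns_or_err)
{
    if (latency_ns_or_err < 0)
        return TOY_DEFAULT_DELAY_US;
    return (unsigned int)latency_ns_or_err / 1000u;
}
```

With a signed return type the caller can tell "no value" apart from "very large value" without comparing against a magic constant.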
2025-10-01cpufreq: CPPC: Avoid using CPUFREQ_ETERNAL as transition delayRafael J. Wysocki1-2/+12
If cppc_get_transition_latency() returns CPUFREQ_ETERNAL to indicate a failure to retrieve the transition latency value from the platform firmware, the CPPC cpufreq driver will use that value (converted to microseconds) as the policy transition delay, but it is way too large for any practical use. Address this by making the driver use the cpufreq's default transition latency value (in microseconds) as the transition delay if CPUFREQ_ETERNAL is returned by cppc_get_transition_latency(). Fixes: d4f3388afd48 ("cpufreq / CPPC: Set platform specific transition_delay_us") Cc: 5.19+ <stable@vger.kernel.org> # 5.19 Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Qais Yousef <qyousef@layalina.io>
2025-10-01cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latencyRafael J. Wysocki7-7/+7
Commit a755d0e2d41b ("cpufreq: Honour transition_latency over transition_delay_us") caused platforms where cpuinfo.transition_latency is CPUFREQ_ETERNAL to get a very large transition latency whereas previously it had been capped at 10 ms (and later at 2 ms). This led to a user-observable regression between 6.6 and 6.12 as described by Shawn: "The dbs sampling_rate was 10000 us on 6.6 and suddently becomes 6442450 us (4294967295 / 1000 * 1.5) on 6.12 for these platforms because the default transition delay was dropped [...]. It slows down dbs governor's reacting to CPU loading change dramatically. Also, as transition_delay_us is used by schedutil governor as rate_limit_us, it shows a negative impact on device idle power consumption, because the device gets slightly less time in the lowest OPP." Evidently, the expectation of the drivers using CPUFREQ_ETERNAL as cpuinfo.transition_latency was that it would be capped by the core, but they may as well return a default transition latency value instead of CPUFREQ_ETERNAL and the core need not do anything with it. Accordingly, introduce CPUFREQ_DEFAULT_TRANSITION_LATENCY_NS and make all of the drivers in question use it instead of CPUFREQ_ETERNAL. Also update the related Rust binding. Fixes: a755d0e2d41b ("cpufreq: Honour transition_latency over transition_delay_us") Closes: https://lore.kernel.org/linux-pm/20250922125929.453444-1-shawnguo2@yeah.net/ Reported-by: Shawn Guo <shawnguo@kernel.org> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Jie Zhan <zhanjie9@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 6.6+ <stable@vger.kernel.org> # 6.6+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2264949.irdbgypaU6@rafael.j.wysocki [ rjw: Fix typo in new symbol name, drop redundant type cast from Rust binding ] Tested-by: Shawn Guo <shawnguo@kernel.org> # with cpufreq-dt driver Reviewed-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
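The 6442450 us figure in the quoted report can be checked with integer arithmetic. The sketch below follows the formula given in the quote (latency in ns divided by 1000, then multiplied by 1.5); the function name is hypothetical and the code does not claim to mirror the governor's actual implementation.

```c
#include <stdint.h>

/* The all-ones 32-bit value that the quoted report starts from. */
#define TOY_CPUFREQ_ETERNAL UINT32_MAX

/* latency_ns / 1000 * 1.5, done with truncating integer arithmetic. */
static uint32_t toy_sampling_rate_us(uint32_t transition_latency_ns)
{
    uint32_t delay_us = transition_latency_ns / 1000u;

    return delay_us + delay_us / 2u;
}
```

Step by step: 4294967295 / 1000 truncates to 4294967, half of that truncates to 2147483, and the sum is 6442450 us, which is roughly 6.4 seconds between samples.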
2025-09-29cpufreq: tegra186: Initialize all cores to max frequenciesAaron Kling1-6/+21
During initialization, the EDVD_COREx_VOLT_FREQ registers for some cores are still at reset values and not reflecting the actual frequency. This causes get calls to fail. Set all cores to their respective max frequency during probe to initialize the registers to working values. Suggested-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Aaron Kling <webgeek1234@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-09-29cpufreq: tegra186: Set target frequency for all cpus in policyAaron Kling1-2/+6
The original commit set all cores in a cluster to a shared policy, but did not update set_target to apply a frequency change to all cores for the policy. This caused most cores to remain stuck at their boot frequency. Fixes: be4ae8c19492 ("cpufreq: tegra186: Share policy per cluster") Signed-off-by: Aaron Kling <webgeek1234@gmail.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-09-29rust: cpufreq: streamline find_supply_namesThorsten Blum1-7/+3
Remove local variables from find_supply_names() and use .and_then() with the more concise kernel::kvec![] macro, instead of KVec::with_capacity() followed by .push() and Some(). No functional changes intended. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-09-29cpufreq: mediatek: fix device leak on probe failureJohan Hovold1-3/+11
Make sure to drop the reference to the cci device taken by of_find_device_by_node() on probe failure (e.g. probe deferral). Fixes: 0daa47325bae ("cpufreq: mediatek: Link CCI device to CPU") Cc: Jia-Wei Chang <jia-wei.chang@mediatek.com> Cc: Rex-BC Chen <rex-bc.chen@mediatek.com> Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-09-24Merge back earlier cpufreq material for 6.18Rafael J. Wysocki24-249/+321
2025-09-23cpufreq: Replace pointer subtraction with iteration macroZihuan Zhang1-4/+4
The cpufreq documentation suggests avoiding direct pointer subtraction when working with entries in driver_freq_table, as it is relatively costly. Instead, the recommended approach is to use the provided iteration macros, like cpufreq_for_each_valid_entry_idx(). Use cpufreq_for_each_entry_idx() instead of pointer subtraction in cpufreq_frequency_table_cpuinfo() which improves code clarity and follows the established cpufreq coding style. While at it, remove redundant local variable initialization from cpufreq_table_index_unsorted(). No functional change intended. Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn> Link: https://patch.msgid.link/20250923075553.45532-1-zhangzihuan@kylinos.cn [ rjw: Subject tweak, changelog edits, local variable definition tweak ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
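The iteration style recommended above carries the index alongside the cursor, so no pointer subtraction is needed to recover it. Here is a userspace sketch; the macro, the marker constants and the helper are hypothetical look-alikes, not the cpufreq definitions.

```c
#include <stddef.h>

#define TOY_ENTRY_INVALID (~0u) /* entry to be skipped */
#define TOY_TABLE_END     (~1u) /* terminator entry */

struct toy_freq_entry {
    unsigned int frequency;
};

/* Walk the table while tracking the index next to the cursor. */
#define toy_for_each_entry_idx(pos, table, idx) \
    for ((pos) = (table), (idx) = 0; \
         (pos)->frequency != TOY_TABLE_END; (pos)++, (idx)++)

/* Returns the index of the highest valid frequency, or -1 if none. */
static int toy_index_of_max(const struct toy_freq_entry *table)
{
    const struct toy_freq_entry *pos;
    unsigned int max = 0;
    int idx, best = -1;

    toy_for_each_entry_idx(pos, table, idx) {
        if (pos->frequency == TOY_ENTRY_INVALID)
            continue;
        if (best < 0 || pos->frequency > max) {
            max = pos->frequency;
            best = idx; /* no need for (pos - table) */
        }
    }
    return best;
}
```

Because `continue` jumps to the loop's increment expression, the index stays in step with the cursor even when invalid entries are skipped.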
2025-09-20cpufreq: Initialize cpufreq-based invariance before subsysChristian Loehle1-9/+11
commit 2a6c72738706 ("cpufreq: Initialize cpufreq-based frequency-invariance later") postponed the frequency invariance initialization to avoid disabling it in the error case. This isn't locking safe, instead move the initialization up before the subsys interface is registered (which will rebuild the sched_domains) and add the corresponding disable on the error path. Observed lockdep without this patch: [ 0.989686] ====================================================== [ 0.989688] WARNING: possible circular locking dependency detected [ 0.989690] 6.17.0-rc4-cix-build+ #31 Tainted: G S [ 0.989691] ------------------------------------------------------ [ 0.989692] swapper/0/1 is trying to acquire lock: [ 0.989693] ffff800082ada7f8 (sched_energy_mutex){+.+.}-{4:4}, at: rebuild_sched_domains_energy+0x30/0x58 [ 0.989705] but task is already holding lock: [ 0.989706] ffff000088c89bc8 (&policy->rwsem){+.+.}-{4:4}, at: cpufreq_online+0x7f8/0xbe0 [ 0.989713] which lock already depends on the new lock. Fixes: 2a6c72738706 ("cpufreq: Initialize cpufreq-based frequency-invariance later") Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-19cpufreq: intel_pstate: Use likely() optimization in intel_pstate_sample()Yaxiong Tian1-1/+1
The comment above the condition `if (cpu->last_sample_time)` clearly indicates that the branch is taken for the vast majority of invocations after the first sample in a cycle. The first sample is a one-time initialization case. Add likely() hint to the condition to improve branch prediction for this performance-critical path in intel_pstate_sample(). Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
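For context, a branch hint of this kind is usually a thin wrapper around a compiler builtin. The sketch below assumes GCC or Clang (`__builtin_expect`); the structure and function are hypothetical and only imitate the "first sample is special" shape described above.

```c
#include <stdint.h>

/* Userspace equivalent of a likely() hint; GCC/Clang builtin. */
#define toy_likely(x) __builtin_expect(!!(x), 1)

struct toy_sample {
    uint64_t last_time;
    uint64_t duration;
};

/* Returns 1 when a duration could be computed, 0 on the very first sample. */
static int toy_take_sample(struct toy_sample *s, uint64_t now)
{
    int have_duration = 0;

    /* True on every call except the first one, so hint it as the hot path. */
    if (toy_likely(s->last_time)) {
        s->duration = now - s->last_time;
        have_duration = 1;
    }
    s->last_time = now;
    return have_duration;
}
```

The hint changes only code layout and static prediction; the result of the function is identical with or without it.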
2025-09-19cpufreq: Add defensive check during driver registrationZihuan Zhang1-0/+1
Currently, cpufreq allows drivers to implement both ->target() and ->target_index() callbacks, but that can lead to ambiguous or incorrect behavior. For this reason, prevent cpufreq drivers implementing both ->target() and ->target_index() at the same time from registering. This check can help to catch driver implementation mistakes early and improve overall robustness, without affecting existing valid drivers. Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn> Link: https://patch.msgid.link/20250908085738.31602-1-zhangzihuan@kylinos.cn [ rjw: Subject adjustment and changelog rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
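A registration-time check of this kind can be sketched with two function pointers. The structure and names below are hypothetical, and rejecting a driver with neither callback is a simplification of this example rather than something stated in the changelog.

```c
#include <errno.h>
#include <stddef.h>

struct toy_driver {
    int (*target)(unsigned int freq_khz);
    int (*target_index)(unsigned int index);
};

/* Dummy callback used only to populate the structure in examples. */
static int toy_noop(unsigned int arg)
{
    (void)arg;
    return 0;
}

/* Refuse ambiguous drivers at registration time instead of at runtime. */
static int toy_register_driver(const struct toy_driver *drv)
{
    if (!drv)
        return -EINVAL;

    if (drv->target && drv->target_index)
        return -EINVAL; /* both set: unclear which one the core should call */

    if (!drv->target && !drv->target_index)
        return -EINVAL; /* simplification: this sketch requires one of them */

    return 0;
}
```

Failing early keeps the ambiguity out of the frequency-change path, where it would be much harder to diagnose.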
2025-09-19cpufreq: intel_pstate: Enable HWP without EPP if DEC is enabledRafael J. Wysocki1-16/+56
So far, HWP has never been enabled without EPP (Energy-Performance Preference) interface support, since the lack of the latter indicates an incomplete implementation of HWP, which was the case on early development vehicle platforms. However, HWP can be expected to work if DEC (Dynamic Efficiency Control) is enabled as indicated by setting bit 27 in MSR_IA32_POWER_CTL (DEC enable bit). Accordingly, allow HWP to be enabled if the EPP interface is not supported so long as DEC is enabled in the processor. Still, the EPP control sysfs interface is useless when EPP is not supported, so do not expose it in that case. Link: https://lore.kernel.org/linux-pm/20250904000608.260817-2-srinivas.pandruvada@linux.intel.com/ Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
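The resulting decision can be written as a small predicate. This is a sketch under assumptions: the function names and boolean inputs are hypothetical, and only the bit position (27) comes from the changelog above.

```c
#include <stdbool.h>
#include <stdint.h>

/* DEC enable bit in the POWER_CTL value, per the changelog (bit 27). */
#define TOY_POWER_CTL_DEC_ENABLE (UINT64_C(1) << 27)

/* HWP may be used if EPP is supported, or if DEC is enabled without EPP. */
static bool toy_hwp_allowed(bool has_hwp, bool has_epp, uint64_t power_ctl)
{
    if (!has_hwp)
        return false;
    return has_epp || (power_ctl & TOY_POWER_CTL_DEC_ENABLE) != 0;
}

/* The EPP sysfs control is only meaningful when EPP itself is supported. */
static bool toy_expose_epp_sysfs(bool has_epp)
{
    return has_epp;
}
```

The two predicates are deliberately separate: enabling HWP and exposing the EPP control are independent decisions once DEC can stand in for EPP.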
2025-09-15cpufreq: ACPI: Use on_each_cpu_mask() in drv_write()Rafael J. Wysocki1-8/+1
Make drv_write() call on_each_cpu_mask() instead of using an open-coded equivalent of the latter. Also remove a comment mentioning the smp_call_function_many() usage which is not particularly useful anyway. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-15Merge back earlier cpufreq material for 6.18Rafael J. Wysocki23-220/+258
2025-09-11Merge branches 'pm-sleep' and 'pm-em'Rafael J. Wysocki1-2/+2
Merge a hibernation regression fix and a fix related to energy model management for 6.17-rc6 * pm-sleep: PM: hibernate: Restrict GFP mask in hibernation_snapshot() * pm-em: PM: EM: Add function for registering a PD without capacity update
2025-09-10cpufreq: ondemand: Update the efficient idle check for Intel extended FamiliesSohil Mehta2-24/+24
IO time is considered busy by default for modern Intel processors. The current check covers recent Family 6 models but excludes the brand new Families 18 and 19. According to Arjan van de Ven, the model check was mainly due to a lack of testing on systems before INTEL_CORE2_MEROM. He suggests considering all Intel processors as having an efficient idle. Extend the IO busy classification to all Intel processors starting with Family 6, including Family 15 (Pentium 4s) and upcoming Families 18/19. Use an x86 VFM check and move the function to the header file to avoid using arch-specific #ifdefs in the C file. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://patch.msgid.link/20250908230655.2562440-1-sohil.mehta@intel.com [ rjw: Added empty line after #include ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-09-10cpufreq: conservative: Replace sscanf() with kstrtouint()Kaushlendra Kumar1-12/+12
Replace sscanf() with kstrtouint() in all sysfs store functions to improve input validation and security. The kstrtouint() function provides better error detection, overflow protection, and consistent error handling compared to sscanf(). This maintains existing functionality while improving input validation robustness and following kernel coding best practices for string parsing. Signed-off-by: Kaushlendra Kumar <kaushlendra.kumar@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250906115316.3010384-1-kaushlendra.kumar@intel.com [ rjw: Dropped duplicate paragraph from the changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
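To illustrate why a strict parser is preferable to sscanf() for sysfs input, here is a userspace approximation built on strtoul(). `toy_parse_uint()` is a hypothetical helper and does not reproduce kstrtouint() exactly; it only shows the kinds of input that get rejected.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Strict unsigned parse: whole string must be a decimal number, with an
 * optional leading '+' and one optional trailing newline.
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int toy_parse_uint(const char *s, unsigned int *out)
{
    unsigned long val;
    char *end;

    /* Reject empty input, leading whitespace and negative numbers. */
    if (!s || !(isdigit((unsigned char)s[0]) || s[0] == '+'))
        return -EINVAL;

    errno = 0;
    val = strtoul(s, &end, 10);
    if (end == s)
        return -EINVAL;

    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL; /* trailing garbage, which sscanf("%u") ignores */

    if (errno == ERANGE || val > UINT_MAX)
        return -ERANGE;

    *out = (unsigned int)val;
    return 0;
}
```

With sscanf("%u"), an input such as "20abc" would be accepted as 20; the strict variant reports it as an error instead.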
2025-09-10PM: EM: Add function for registering a PD without capacity updateRafael J. Wysocki1-2/+2
The intel_pstate driver manages CPU capacity changes itself and it does not need an update of the capacity of all CPUs in the system to be carried out after registering a PD. Moreover, in some configurations (for instance, an SMT-capable hybrid x86 system booted with nosmt in the kernel command line) the em_check_capacity_update() call at the end of em_dev_register_perf_domain() always fails and reschedules itself to run once again in 1 s, so effectively it runs in vain every 1 s forever. To address this, introduce a new variant of em_dev_register_perf_domain(), called em_dev_register_pd_no_update(), that does not invoke em_check_capacity_update(), and make intel_pstate use it instead of the original. Fixes: 7b010f9b9061 ("cpufreq: intel_pstate: EAS support for hybrid platforms") Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/ Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>