aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/power (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-28mm: convert totalram_pages and totalhigh_pages variables to atomicArun KS1-1/+1
totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-26Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-0/+218
Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Introduce "Energy Aware Scheduling" - by Quentin Perret. This is a coherent topology description of CPUs in cooperation with the PM subsystem, with the goal to schedule more energy-efficiently on asymetric SMP platform - such as waking up tasks to the more energy-efficient CPUs first, as long as the system isn't oversubscribed. For details of the design, see: https://lore.kernel.org/lkml/20180724122521.22109-1-quentin.perret@arm.com/ - Misc cleanups and smaller enhancements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) sched/fair: Select an energy-efficient CPU on task wake-up sched/fair: Introduce an energy estimation helper function sched/fair: Add over-utilization/tipping point indicator sched/fair: Clean-up update_sg_lb_stats parameters sched/toplogy: Introduce the 'sched_energy_present' static key sched/topology: Make Energy Aware Scheduling depend on schedutil sched/topology: Disable EAS on inappropriate platforms sched/topology: Add lowest CPU asymmetry sched_domain level pointer sched/topology: Reference the Energy Model of CPUs when available PM: Introduce an Energy Model management framework sched/cpufreq: Prepare schedutil for Energy Aware Scheduling sched/topology: Relocate arch_scale_cpu_capacity() to the internal header sched/core: Remove unnecessary unlikely() in push_*_task() sched/topology: Remove the ::smt_gain field from 'struct sched_domain' sched: Fix various typos in comments sched/core: Clean up the #ifdef block in add_nr_running() sched/fair: Make some variables static sched/core: Create task_has_idle_policy() helper sched/fair: Add lsub_positive() and use it consistently sched/fair: Mask UTIL_AVG_UNCHANGED usages ...
2018-12-21Merge branches 'pm-core', 'pm-qos', 'pm-domains' and 'pm-sleep'Rafael J. Wysocki2-26/+4
* pm-core: PM-runtime: Switch autosuspend over to using hrtimers * pm-qos: PM / QoS: Change to use DEFINE_SHOW_ATTRIBUTE macro * pm-domains: PM / Domains: remove define_genpd_open_function() and define_genpd_debugfs_fops() * pm-sleep: PM / sleep: convert to DEFINE_SHOW_ATTRIBUTE
2018-12-12PM / sleep: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li1-13/+2
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-12-11PM: Introduce an Energy Model management frameworkQuentin Perret3-0/+218
Several subsystems in the kernel (task scheduler and/or thermal at the time of writing) can benefit from knowing about the energy consumed by CPUs. Yet, this information can come from different sources (DT or firmware for example), in different formats, hence making it hard to exploit without a standard API. As an attempt to address this, introduce a centralized Energy Model (EM) management framework which aggregates the power values provided by drivers into a table for each performance domain in the system. The power cost tables are made available to interested clients (e.g. task scheduler or thermal) via platform-agnostic APIs. The overall design is represented by the diagram below (focused on Arm-related drivers as an example, but applicable to any architecture): +---------------+ +-----------------+ +-------------+ | Thermal (IPA) | | Scheduler (EAS) | | Other | +---------------+ +-----------------+ +-------------+ | | em_pd_energy() | | | em_cpu_get() | +-----------+ | +--------+ | | | v v v +---------------------+ | | | Energy Model | | | | Framework | | | +---------------------+ ^ ^ ^ | | | em_register_perf_domain() +----------+ | +---------+ | | | +---------------+ +---------------+ +--------------+ | cpufreq-dt | | arm_scmi | | Other | +---------------+ +---------------+ +--------------+ ^ ^ ^ | | | +--------------+ +---------------+ +--------------+ | Device Tree | | Firmware | | ? | +--------------+ +---------------+ +--------------+ Drivers (typically, but not limited to, CPUFreq drivers) can register data in the EM framework using the em_register_perf_domain() API. The calling driver must provide a callback function with a standardized signature that will be used by the EM framework to build the power cost tables of the performance domain. This design should offer a lot of flexibility to calling drivers which are free of reading information from any location and to use any technique to compute power costs. Moreover, the capacity states registered by drivers in the EM framework are not required to match real performance states of the target. This is particularly important on targets where the performance states are not known by the OS. The power cost coefficients managed by the EM framework are specified in milli-watts. Although the two potential users of those coefficients (IPA and EAS) only need relative correctness, IPA specifically needs to compare the power of CPUs with the power of other components (GPUs, for example), which are still expressed in absolute terms in their respective subsystems. Hence, specifying the power of CPUs in milli-watts should help transitioning IPA to using the EM framework without introducing new problems by keeping units comparable across sub-systems. On the longer term, the EM of other devices than CPUs could also be managed by the EM framework, which would enable to remove the absolute unit. However, this is not absolutely required as a first step, so this extension of the EM framework is left for later. On the client side, the EM framework offers APIs to access the power cost tables of a CPU (em_cpu_get()), and to estimate the energy consumed by the CPUs of a performance domain (em_pd_energy()). Clients such as the task scheduler can then use these APIs to access the shared data structures holding the Energy Model of CPUs. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-4-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-29PM / QoS: Change to use DEFINE_SHOW_ATTRIBUTE macroYangtao Li1-13/+2
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-10-31memblock: stop using implicit alignment to SMP_CACHE_BYTESMike Rapoport1-1/+2
When a memblock allocation APIs are called with align = 0, the alignment is implicitly set to SMP_CACHE_BYTES. Implicit alignment is done deep in the memblock allocator and it can come as a surprise. Not that such an alignment would be wrong even when used incorrectly but it is better to be explicit for the sake of clarity and the prinicple of the least surprise. Replace all such uses of memblock APIs with the 'align' parameter explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment in the memblock internal allocation functions. For the case when memblock APIs are used via helper functions, e.g. like iommu_arena_new_node() in Alpha, the helper functions were detected with Coccinelle's help and then manually examined and updated where appropriate. The direct memblock APIs users were updated using the semantic patch below: @@ expression size, min_addr, max_addr, nid; @@ ( | - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid) + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid) | - memblock_alloc(size, 0) + memblock_alloc(size, SMP_CACHE_BYTES) | - memblock_alloc_raw(size, 0) + memblock_alloc_raw(size, SMP_CACHE_BYTES) | - memblock_alloc_from(size, 0, min_addr) + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr) | - memblock_alloc_nopanic(size, 0) + memblock_alloc_nopanic(size, SMP_CACHE_BYTES) | - memblock_alloc_low(size, 0) + memblock_alloc_low(size, SMP_CACHE_BYTES) | - memblock_alloc_low_nopanic(size, 0) + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES) | - memblock_alloc_from_nopanic(size, 0, min_addr) + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr) | - memblock_alloc_node(size, 0, nid) + memblock_alloc_node(size, SMP_CACHE_BYTES, nid) ) [mhocko@suse.com: changelog update] [akpm@linux-foundation.org: coding-style fixes] [rppt@linux.ibm.com: fix missed uses of implicit alignment] Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31mm: remove include/linux/bootmem.hMike Rapoport1-1/+1
Move remaining definitions and declarations from include/linux/bootmem.h into include/linux/memblock.h and remove the redundant header. The includes were replaced with the semantic patch below and then semi-automated removal of duplicated '#include <linux/memblock.h> @@ @@ - #include <linux/bootmem.h> + #include <linux/memblock.h> [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h] Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal] Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31memblock: remove _virt from APIs returning virtual addressMike Rapoport1-1/+1
The conversion is done using sed -i 's@memblock_virt_alloc@memblock_alloc@g' \ $(git grep -l memblock_virt_alloc) Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-18Merge branches 'acpi-pm' and 'pm-sleep'Rafael J. Wysocki1-1/+1
* acpi-pm: ACPI / PM: LPIT: Register sysfs attributes based on FADT * pm-sleep: x86-32, hibernate: Adjust in_suspend after resumed on 32bit system x86-32, hibernate: Set up temporary text mapping for 32bit system x86-32, hibernate: Switch to relocated restore code during resume on 32bit system x86-32, hibernate: Switch to original page table after resumed x86-32, hibernate: Use the page size macro instead of constant value x86-32, hibernate: Use temp_pgt as the temporary page table x86, hibernate: Rename temp_level4_pgt to temp_pgt x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system x86, hibernate: Extract the common code of 64/32 bit system x86-32/asm/power: Create stack frames in hibernate_asm_32.S PM / hibernate: Check the success of generating md5 digest before hibernation x86, hibernate: Fix nosave_regions setup for hibernation PM / sleep: Show freezing tasks that caused a suspend abort PM / hibernate: Documentation: fix image_size default value
2018-10-12Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputGreg Kroah-Hartman1-0/+6
Dmitry writes: "Input updates for v4.19-rc7 - we added a few scheduling points into various input interfaces to ensure that large writes will not cause RCU stalls - fixed configuring PS/2 keyboards as wakeup devices on newer platforms - added a new Xbox gamepad ID." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: uinput - add a schedule point in uinput_inject_events() Input: evdev - add a schedule point in evdev_write() Input: mousedev - add a schedule point in mousedev_write() Input: i8042 - enable keyboard wakeups by default when s2idle is used Input: xpad - add support for Xbox1 PDP Camo series gamepad
2018-10-01Input: i8042 - enable keyboard wakeups by default when s2idle is usedDaniel Drake1-0/+6
Previously, on typical consumer laptops, pressing a key on the keyboard when the system is in suspend would cause it to wake up (default or unconditional behaviour). This happens because the EC generates a SCI interrupt in this scenario. That is no longer true on modern laptops based on Intel WhiskeyLake, including Acer Swift SF314-55G, Asus UX333FA, Asus UX433FN and Asus UX533FD. We confirmed with Asus EC engineers that the "Modern Standby" design has been modified so that the EC no longer generates a SCI in this case; the keyboard controller itself should be used for wakeup. In order to retain the standard behaviour of being able to use the keyboard to wake up the system, enable serio wakeups by default on platforms that are using s2idle. Link: https://lkml.kernel.org/r/CAB4CAwfQ0mPMqCLp95TVjw4J0r5zKPWkSvvkK4cpZUGE--w8bQ@mail.gmail.com Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2018-09-10PM / sleep: Show freezing tasks that caused a suspend abortTodd Brandt1-1/+1
For debug purposes it would be nice to see which tasks caused a suspend abort, i.e. which tasks were still in the process of freezing when a wakeup event occurred. This patch adds the info to pm_debug_messages. Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-15PM / sleep: wakeup: Fix build error caused by missing SRCU supportzhangyi (F)1-0/+1
Commit ea0212f40c6 (power: auto select CONFIG_SRCU) made the code in drivers/base/power/wakeup.c use SRCU instead of RCU, but it forgot to select CONFIG_SRCU in Kconfig, which leads to the following build error if CONFIG_SRCU is not selected somewhere else: drivers/built-in.o: In function `wakeup_source_remove': (.text+0x3c6fc): undefined reference to `synchronize_srcu' drivers/built-in.o: In function `pm_print_active_wakeup_sources': (.text+0x3c7a8): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `pm_print_active_wakeup_sources': (.text+0x3c84c): undefined reference to `__srcu_read_unlock' drivers/built-in.o: In function `device_wakeup_arm_wake_irqs': (.text+0x3d1d8): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `device_wakeup_arm_wake_irqs': (.text+0x3d228): undefined reference to `__srcu_read_unlock' drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs': (.text+0x3d24c): undefined reference to `__srcu_read_lock' drivers/built-in.o: In function `device_wakeup_disarm_wake_irqs': (.text+0x3d29c): undefined reference to `__srcu_read_unlock' drivers/built-in.o:(.data+0x4158): undefined reference to `process_srcu' Fix this error by selecting CONFIG_SRCU when PM_SLEEP is enabled. Fixes: ea0212f40c6 (power: auto select CONFIG_SRCU) Cc: 4.2+ <stable@vger.kernel.org> # 4.2+ Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> [ rjw: Minor subject/changelog fixups ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-14Merge tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds5-19/+21
Pull power management updates from Rafael Wysocki: "These add a new framework for CPU idle time injection, to be used by all of the idle injection code in the kernel in the future, fix some issues and add a number of relatively small extensions in multiple places. Specifics: - Add a new framework for CPU idle time injection (Daniel Lezcano). - Add AVS support to the armada-37xx cpufreq driver (Gregory CLEMENT). - Add support for current CPU frequency reporting to the ACPI CPPC cpufreq driver (George Cherian). - Rework the cooling device registration in the imx6q/thermal driver (Bastian Stender). - Make the pcc-cpufreq driver refuse to work with dynamic scaling governors on systems with many CPUs to avoid scalability issues with it (Rafael Wysocki). - Fix the intel_pstate driver to report different maximum CPU frequencies on systems where they really are different and to ignore the turbo active ratio if hardware-managend P-states (HWP) are in use; make it use the match_string() helper (Xie Yisheng, Srinivas Pandruvada). - Fix a minor deferred probe issue in the qcom-kryo cpufreq driver (Niklas Cassel). - Add a tracepoint for the tracking of frequency limits changes (from Andriod) to the cpufreq core (Ruchi Kandoi). - Fix a circular lock dependency between CPU hotplug and sysfs locking in the cpufreq core reported by lockdep (Waiman Long). - Avoid excessive error reports on driver registration failures in the ARM cpuidle driver (Sudeep Holla). - Add a new device links flag to the driver core to make links go away automatically on supplier driver removal (Vivek Gautam). - Eliminate potential race condition between system-wide power management transitions and system shutdown (Pingfan Liu). - Add a quirk to save NVS memory on system suspend for the ASUS 1025C laptop (Willy Tarreau). - Make more systems use suspend-to-idle (instead of ACPI S3) by default (Tristian Celestin). - Get rid of stack VLA usage in the low-level hibernation code on 64-bit x86 (Kees Cook). - Fix error handling in the hibernation core and mark an expected fall-through switch in it (Chengguang Xu, Gustavo Silva). - Extend the generic power domains (genpd) framework to support attaching a device to a power domain by name (Ulf Hansson). - Fix device reference counting and user limits initialization in the devfreq core (Arvind Yadav, Matthias Kaehlcke). - Fix a few issues in the rk3399_dmc devfreq driver and improve its documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner). - Drop a redundant error message from the exynos-ppmu devfreq driver (Markus Elfring)" * tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) PM / reboot: Eliminate race between reboot and suspend PM / hibernate: Mark expected switch fall-through cpufreq: intel_pstate: Ignore turbo active ratio in HWP cpufreq: Fix a circular lock dependency problem cpu/hotplug: Add a cpus_read_trylock() function x86/power/hibernate_64: Remove VLA usage cpufreq: trace frequency limits change cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC cpufreq: armada-37xx: Add AVS support dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload. PM / devfreq: Init user limits from OPP limits, not viceversa PM / devfreq: rk3399_dmc: fix spelling mistakes. PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer. dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional. PM / devfreq: rk3399_dmc: remove wait for dcf irq event. dt-bindings: clock: add rk3399 DDR3 standard speed bins. ...
2018-08-06PM / reboot: Eliminate race between reboot and suspendPingfan Liu4-17/+18
At present, "systemctl suspend" and "shutdown" can run in parrallel. A system can suspend after devices_shutdown(), and resume. Then the shutdown task goes on to power off. This causes many devices are not really shut off. Hence replacing reboot_mutex with system_transition_mutex (renamed from pm_mutex) to achieve the exclusion. The renaming of pm_mutex as system_transition_mutex can be better to reflect the purpose of the mutex. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-08-06PM / hibernate: Mark expected switch fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This addresses Coverity-ID: 114713 ("Missing break in switch"). Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-07-02PM / hibernate: cast PAGE_SIZE to int when comparing with error codeChengguang Xu1-2/+2
If PAGE_SIZE is unsigned type then negative error code will be larger than PAGE_SIZE. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-06-20sched/swait: Rename to exclusivePeter Zijlstra1-2/+2
Since swait basically implemented exclusive waits only, make sure the API reflects that. $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepare_to_swait\>" | while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]*/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done With a few manual touch-ups. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
2018-06-15fix a series of Documentation/ broken file name referencesMauro Carvalho Chehab1-2/+3
As files move around, their previous links break. Fix the references for them. Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-12treewide: Use array_size() in vmalloc()Kees Cook1-3/+3
The vmalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vmalloc(a * b) with: vmalloc(array_size(a, b)) as well as handling cases of: vmalloc(a * b * c) with: vmalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vmalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vmalloc( - sizeof(u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vmalloc( - sizeof(char) * COUNT + COUNT , ...) | vmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vmalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vmalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vmalloc(C1 * C2 * C3, ...) | vmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vmalloc(C1 * C2, ...) | vmalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-05Merge tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds5-12/+29
Pull power management updates from Rafael Wysocki: "These include a significant update of the generic power domains (genpd) and Operating Performance Points (OPP) frameworks, mostly related to the introduction of power domain performance levels, cpufreq updates (new driver for Qualcomm Kryo processors, updates of the existing drivers, some core fixes, schedutil governor improvements), PCI power management fixes, ACPI workaround for EC-based wakeup events handling on resume from suspend-to-idle, and major updates of the turbostat and pm-graph utilities. Specifics: - Introduce power domain performance levels into the the generic power domains (genpd) and Operating Performance Points (OPP) frameworks (Viresh Kumar, Rajendra Nayak, Dan Carpenter). - Fix two issues in the runtime PM framework related to the initialization and removal of devices using device links (Ulf Hansson). - Clean up the initialization of drivers for devices in PM domains (Ulf Hansson, Geert Uytterhoeven). - Fix a cpufreq core issue related to the policy sysfs interface causing CPU online to fail for CPUs sharing one cpufreq policy in some situations (Tao Wang). - Make it possible to use platform-specific suspend/resume hooks in the cpufreq-dt driver and make the Armada 37xx DVFS use that feature (Viresh Kumar, Miquel Raynal). - Optimize policy transition notifications in cpufreq (Viresh Kumar). - Improve the iowait boost mechanism in the schedutil cpufreq governor (Patrick Bellasi). - Improve the handling of deferred frequency updates in the schedutil cpufreq governor (Joel Fernandes, Dietmar Eggemann, Rafael Wysocki, Viresh Kumar). - Add a new cpufreq driver for Qualcomm Kryo (Ilia Lin). - Fix and clean up some cpufreq drivers (Colin Ian King, Dmitry Osipenko, Doug Smythies, Luc Van Oostenryck, Simon Horman, Viresh Kumar). - Fix the handling of PCI devices with the DPM_SMART_SUSPEND flag set and update stale comments in the PCI core PM code (Rafael Wysocki). - Work around an issue related to the handling of EC-based wakeup events in the ACPI PM core during resume from suspend-to-idle if the EC has been put into the low-power mode (Rafael Wysocki). - Improve the handling of wakeup source objects in the PM core (Doug Berger, Mahendran Ganesh, Rafael Wysocki). - Update the driver core to prevent deferred probe from breaking suspend/resume ordering (Feng Kan). - Clean up the PM core somewhat (Bjorn Helgaas, Ulf Hansson, Rafael Wysocki). - Make the core suspend/resume code and cpufreq support the RT patch (Sebastian Andrzej Siewior, Thomas Gleixner). - Consolidate the PM QoS handling in cpuidle governors (Rafael Wysocki). - Fix a possible crash in the hibernation core (Tetsuo Handa). - Update the rockchip-io Adaptive Voltage Scaling (AVS) driver (David Wu). - Update the turbostat utility (fixes, cleanups, new CPU IDs, new command line options, built-in "Low Power Idle" counters support, new POLL and POLL% columns) and add an entry for it to MAINTAINERS (Len Brown, Artem Bityutskiy, Chen Yu, Laura Abbott, Matt Turner, Prarit Bhargava, Srinivas Pandruvada). - Update the pm-graph to version 5.1 (Todd Brandt). - Update the intel_pstate_tracer utility (Doug Smythies)" * tag 'pm-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (128 commits) tools/power turbostat: update version number tools/power turbostat: Add Node in output tools/power turbostat: add node information into turbostat calculations tools/power turbostat: remove num_ from cpu_topology struct tools/power turbostat: rename num_cores_per_pkg to num_cores_per_node tools/power turbostat: track thread ID in cpu_topology tools/power turbostat: Calculate additional node information for a package tools/power turbostat: Fix node and siblings lookup data tools/power turbostat: set max_num_cpus equal to the cpumask length tools/power turbostat: if --num_iterations, print for specific number of iterations tools/power turbostat: Add Cannon Lake support tools/power turbostat: delete duplicate #defines x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines tools/power turbostat: Correct SNB_C1/C3_AUTO_UNDEMOTE defines tools/power turbostat: add POLL and POLL% column tools/power turbostat: Fix --hide Pk%pc10 tools/power turbostat: Build-in "Low Power Idle" counters support tools/power turbostat: Don't make man pages executable tools/power turbostat: remove blank lines tools/power turbostat: a small C-states dump readability immprovement ...
2018-06-04Merge branches 'pm-pci', 'acpi-pm', 'pm-sleep' and 'pm-avs'Rafael J. Wysocki3-11/+28
* pm-pci: PCI / PM: Clean up outdated comments in pci_target_state() PCI / PM: Do not clear state_saved for devices that remain suspended * acpi-pm: ACPI: EC: Dispatch the EC GPE directly on s2idle wake ACPICA: Introduce acpi_dispatch_gpe() * pm-sleep: PM / hibernate: Fix oops at snapshot_write() PM / wakeup: Make s2idle_lock a RAW_SPINLOCK PM / s2idle: Make s2idle_wait_head swait based PM / wakeup: Make events_lock a RAW_SPINLOCK PM / suspend: Prevent might sleep splats * pm-avs: PM / AVS: rockchip-io: add io selectors and supplies for PX30
2018-06-04Merge branches 'pm-qos' and 'pm-core'Rafael J. Wysocki2-1/+1
* pm-qos: PM / QoS: Drop redundant declaration of pm_qos_get_value() * pm-core: PM / runtime: Drop usage count for suppliers at device link removal PM / runtime: Fixup reference counting of device link suppliers at probe PM: wakeup: Use pr_debug() for the "aborting suspend" message PM / core: Drop unused internal inline functions for sysfs PM / core: Drop unused internal functions for pm_qos sysfs PM / core: Drop unused internal inline functions for wakeirqs PM / core: Drop internal unused inline functions for wakeups PM / wakeup: Only update last time for active wakeup sources PM / wakeup: Use seq_open() to show wakeup stats PM / core: Use dev_printk() and symbols in suspend/resume diagnostics PM / core: Simplify initcall_debug_report() timing PM / core: Remove unused initcall_debug_report() arguments PM / core: fix deferred probe breaking suspend resume order
2018-05-28PM / QoS: Drop redundant declaration of pm_qos_get_value()Rafael J. Wysocki1-1/+0
The extra forward declaration of pm_qos_get_value() is redundant, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-27PM / hibernate: Fix oops at snapshot_write()Tetsuo Handa1-0/+5
syzbot is reporting NULL pointer dereference at snapshot_write() [1]. This is because data->handle is zero-cleared by ioctl(SNAPSHOT_FREE). Fix this by checking data_of(data->handle) != NULL before using it. [1] https://syzkaller.appspot.com/bug?id=828a3c71bd344a6de8b6a31233d51a72099f27fd Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+ae590932da6e45d6564d@syzkaller.appspotmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-27PM / wakeup: Make s2idle_lock a RAW_SPINLOCKSebastian Andrzej Siewior1-7/+7
The `s2idle_lock' is acquired during suspend while interrupts are disabled even on RT. The lock is acquired for short sections only. Make it a RAW lock which avoids "sleeping while atomic" warnings on RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-27PM / s2idle: Make s2idle_wait_head swait basedSebastian Andrzej Siewior1-4/+5
s2idle_wait_head is used during s2idle with interrupts disabled even on RT. There is no "custom" wake up function so swait could be used instead which is also lower weight compared to the wait_queue. Make s2idle_wait_head a swait_queue_head. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-27PM / suspend: Prevent might sleep splatsThomas Gleixner2-0/+11
timekeeping suspend/resume calls read_persistent_clock() which takes rtc_lock. That results in might sleep warnings because at that point we run with interrupts disabled. We cannot convert rtc_lock to a raw spinlock as that would trigger other might sleep warnings. As a workaround we disable the might sleep warnings by setting system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and restoring it to SYSTEM_RUNNING afer sysdev_resume(). There is no lock contention because hibernate / suspend to RAM is single-CPU at this point. In s2idle's case the system_state is set to SYSTEM_SUSPEND before timekeeping_suspend() which is invoked by the last CPU. In the resume case it set back to SYSTEM_RUNNING after timekeeping_resume() which is invoked by the first CPU in the resume case. The other CPUs will block on tick_freeze_lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-05-14block: consistently use GFP_NOIO instead of __GFP_NORECLAIMChristoph Hellwig1-7/+7
Same numerical value (for now at least), but a much better documentation of intent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-10PM / wakeup: Only update last time for active wakeup sourcesDoug Berger1-0/+1
When wakelock support was added, the wakeup_source_add() function was updated to set the last_time value of the wakeup source. This has the unintended side effect of producing confusing output from pm_print_active_wakeup_sources() when a wakeup source is added prior to a sleep that is blocked by a different wakeup source. The function pm_print_active_wakeup_sources() will search for the most recently active wakeup source when no active source is found. If a wakeup source is added after a different wakeup source blocks the system from going to sleep it may have a later last_time value than the blocking source and be output as the last active wakeup source even if it has never actually been active. It looks to me like the change to wakeup_source_add() was made to prevent the wakelock garbage collection from accidentally dropping a wakelock during the narrow window between adding the wakelock to the wakelock list in wakelock_lookup_add() and the activation of the wakeup source in pm_wake_lock(). This commit changes the behavior so that only the last_time of the wakeup source used by a wakelock is initialized prior to adding it to the wakeup source list. This preserves the meaning of the last_time value as the last time the wakeup source was active and allows a wakeup source that has never been active to have a last_time value of 0. Fixes: b86ff9820fd5 (PM / Sleep: Add user space interface for manipulating wakeup sources, v3) Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-09PM / QoS: mark expected switch fall-throughsGustavo A. R. Silva1-0/+2
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-04-03Merge tag 'pm-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-1/+25
Pull power management updates from Rafael Wysocki: "These update the cpuidle poll state definition to reduce excessive energy usage related to it, add new CPU ID to the RAPL power capping driver, update the ACPI system suspend code to handle some special cases better, extend the PM core's device links code slightly, add new sysfs attribute for better suspend-to-idle diagnostics and easier hibernation handling, update power management tools and clean up cpufreq quite a bit. Specifics: - Modify the cpuidle poll state implementation to prevent CPUs from staying in the loop in there for excessive times (Rafael Wysocki). - Add Intel Cannon Lake chips support to the RAPL power capping driver (Joe Konno). - Add reference counting to the device links handling code in the PM core (Lukas Wunner). - Avoid reconfiguring GPEs on suspend-to-idle in the ACPI system suspend code (Rafael Wysocki). - Allow devices to be put into deeper low-power states via ACPI if both _SxD and _SxW are missing (Daniel Drake). - Reorganize the core ACPI suspend-to-idle wakeup code to avoid a keyboard wakeup issue on Asus UX331UA (Chris Chiu). - Prevent the PCMCIA library code from aborting suspend-to-idle due to noirq suspend failures resulting from incorrect assumptions (Rafael Wysocki). - Add coupled cpuidle supprt to the Exynos3250 platform (Marek Szyprowski). - Add new sysfs file to make it easier to specify the image storage location during hibernation (Mario Limonciello). - Add sysfs files for collecting suspend-to-idle usage and time statistics for CPU idle states (Rafael Wysocki). - Update the pm-graph utilities (Todd Brandt). - Reduce the kernel log noise related to reporting Low-power Idle constraings by the ACPI system suspend code (Rafael Wysocki). - Make it easier to distinguish dedicated wakeup IRQs in the /proc/interrupts output (Tony Lindgren). - Add the frequency table validation in cpufreq to the core and drop it from a number of cpufreq drivers (Viresh Kumar). - Drop "cooling-{min|max}-level" for CPU nodes from a couple of DT bindings (Viresh Kumar). - Clean up the CPU online error code path in the cpufreq core (Viresh Kumar). - Fix assorted issues in the SCPI, CPPC, mediatek and tegra186 cpufreq drivers (Arnd Bergmann, Chunyu Hu, George Cherian, Viresh Kumar). - Drop memory allocation error messages from a few places in cpufreq and cpuildle drivers (Markus Elfring)" * tag 'pm-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits) ACPI / PM: Fix keyboard wakeup from suspend-to-idle on ASUS UX331UA cpufreq: CPPC: Use transition_delay_us depending transition_latency PM / hibernate: Change message when writing to /sys/power/resume PM / hibernate: Make passing hibernate offsets more friendly cpuidle: poll_state: Avoid invoking local_clock() too often PM: cpuidle/suspend: Add s2idle usage and time state attributes cpuidle: Enable coupled cpuidle support on Exynos3250 platform cpuidle: poll_state: Add time limit to poll_idle() cpufreq: tegra186: Don't validate the frequency table twice cpufreq: speedstep: Don't validate the frequency table twice cpufreq: sparc: Don't validate the frequency table twice cpufreq: sh: Don't validate the frequency table twice cpufreq: sfi: Don't validate the frequency table twice cpufreq: scpi: Don't validate the frequency table twice cpufreq: sc520: Don't validate the frequency table twice cpufreq: s3c24xx: Don't validate the frequency table twice cpufreq: qoirq: Don't validate the frequency table twice cpufreq: pxa: Don't validate the frequency table twice cpufreq: ppc_cbe: Don't validate the frequency table twice cpufreq: powernow: Don't validate the frequency table twice ...
2018-04-02fs: add ksys_sync() helper; remove in-kernel calls to sys_sync()Dominik Brodowski3-3/+3
Using this helper allows us to avoid the in-kernel calls to the sys_sync() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_sync(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-03-30PM / hibernate: Change message when writing to /sys/power/resumeMario Limonciello1-1/+1
This file is used both for setting the wakeup device without kernel command line as well as for actually waking the system (when appropriate swap header is in place). To avoid confusion on incorrect logs in system log downgrade the message to debug and make it clearer. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-03-30PM / hibernate: Make passing hibernate offsets more friendlyMario Limonciello1-0/+24
Currently the only way to specify a hibernate offset for a swap file is on the kernel command line. Add a new /sys/power/resume_offset that lets userspace specify the offset and disk to use when initiating a hibernate cycle. Signed-off-by: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-02-02x86/power: Fix swsusp_arch_resume prototypeArnd Bergmann1-3/+0
The declaration for swsusp_arch_resume marks it as 'asmlinkage', but the definition in x86-32 does not, and it fails to include the header with the declaration. This leads to a warning when building with link-time-optimizations: kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch] extern asmlinkage int swsusp_arch_resume(void); ^ arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here int swsusp_arch_resume(void) This moves the declaration into a globally visible header file and fixes up both x86 definitions to match it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <len.brown@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: linux-pm@vger.kernel.org Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Bart Van Assche <bart.vanassche@wdc.com> Link: https://lkml.kernel.org/r/20180202145634.200291-2-arnd@arndb.de
2018-01-29Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block updates from Jens Axboe: "This is the main pull request for block IO related changes for the 4.16 kernel. Nothing major in this pull request, but a good amount of improvements and fixes all over the map. This contains: - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and Paolo. - Support for SMR zones for deadline and mq-deadline from Damien and Christoph. - Set of fixes for bcache by way of Michael Lyle, including fixes from himself, Kent, Rui, Tang, and Coly. - Series from Matias for lightnvm with fixes from Hans Holmberg, Javier, and Matias. Mostly centered around pblk, and the removing rrpc 1.2 in preparation for supporting 2.0. - A couple of NVMe pull requests from Christoph. Nothing major in here, just fixes and cleanups, and support for command tracing from Johannes. - Support for blk-throttle for tracking reads and writes separately. From Joseph Qi. A few cleanups/fixes also for blk-throttle from Weiping. - Series from Mike Snitzer that enables dm to register its queue more logically, something that's alwways been problematic on dm since it's a stacked device. - Series from Ming cleaning up some of the bio accessor use, in preparation for supporting multipage bvecs. - Various fixes from Ming closing up holes around queue mapping and quiescing. - BSD partition fix from Richard Narron, fixing a problem where we can't mount newer (10/11) FreeBSD partitions. - Series from Tejun reworking blk-mq timeout handling. The previous scheme relied on atomic bits, but it had races where we would think a request had timed out if it to reused at the wrong time. - null_blk now supports faking timeouts, to enable us to better exercise and test that functionality separately. From me. - Kill the separate atomic poll bit in the request struct. After this, we don't use the atomic bits on blk-mq anymore at all. From me. - sgl_alloc/free helpers from Bart. - Heavily contended tag case scalability improvement from me. - Various little fixes and cleanups from Arnd, Bart, Corentin, Douglas, Eryu, Goldwyn, and myself" * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits) block: remove smart1,2.h nvme: add tracepoint for nvme_complete_rq nvme: add tracepoint for nvme_setup_cmd nvme-pci: introduce RECONNECTING state to mark initializing procedure nvme-rdma: remove redundant boolean for inline_data nvme: don't free uuid pointer before printing it nvme-pci: Suspend queues after deleting them bsg: use pr_debug instead of hand crafted macros blk-mq-debugfs: don't allow write on attributes with seq_operations set nvme-pci: Fix queue double allocations block: Set BIO_TRACE_COMPLETION on new bio during split blk-throttle: use queue_is_rq_based block: Remove kblockd_schedule_delayed_work{,_on}() blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly() lib/scatterlist: Fix chaining support in sgl_alloc_order() blk-throttle: track read and write request individually block: add bdev_read_only() checks to common helpers block: fail op_is_write() requests to read-only partitions blk-throttle: export io_serviced_recursive, io_service_bytes_recursive ...
2018-01-17PM / hibernate: Drop unused parameter of enough_swapKyungsik Lee1-2/+2
Parameter flags is no longer used, remove it. Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-01-10PM / sleep: Make lock/unlock_system_sleep() available to kernel modulesBart Van Assche1-0/+29
Since pm_mutex is not exported using lock/unlock_system_sleep() from inside a kernel module causes a "pm_mutex undefined" linker error. Hence move lock/unlock_system_sleep() into kernel/power/main.c and export these. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2018-01-06block: convert to bio_first_bvec_all & bio_first_page_allMing Lei1-1/+1
This patch converts to bio_first_bvec_all() & bio_first_page_all() for retrieving the 1st bvec/page, and prepares for supporting multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-05PM: hibernate: Do not subtract NR_FILE_MAPPED in minimum_image_size()Rainer Fiebig1-4/+2
s2disk/s2both may fail unnecessarily and erratically if NR_FILE_MAPPED is high - for instance when using VMs with VirtualBox and perhaps VMware Player. In those situations s2disk becomes unreliable and therefore unusable. A typical scenario is: user issues a s2disk and it fails. User issues a second s2disk immediately after that and it succeeds. And user wonders why. The problem is caused by minimum_image_size() in snapshot.c. The value it returns is roughly 100% too high because NR_FILE_MAPPED is subtracted in its calculation. Eventually the number of preallocated image pages is falsely too low. This doesn't matter as long as NR_FILE_MAPPED-values are in a normal range or in 32bit-environments as the code allows for allocation of additional pages from highmem. But with the high values generated by VirtualBox-VMs (a 2-GB-VM causes NR_FILE_MAPPED go up by 2 GB) it may lead to failure in 64bit-systems. Not subtracting NR_FILE_MAPPED in minimum_image_size() solves the problem. I've done at least hundreds of successful s2both/s2disk now on an x86_64 system (with and without VirtualBox) which gives me some confidence that this is right. It has turned s2disk/s2both from unusable into 100% reliable. Link: https://bugzilla.kernel.org/show_bug.cgi?id=97201 Signed-off-by: Rainer Fiebig <jrf@mailbox.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-15mm: remove __GFP_COLDMel Gorman1-2/+2
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-13Merge branch 'pm-sleep'Rafael J. Wysocki4-91/+78
* pm-sleep: freezer: Fix typo in freezable_schedule_timeout() comment PM / s2idle: Clear the events_check_enabled flag PM / sleep: Remove pm_complete_with_resume_check() PM: ARM: locomo: Drop suspend and resume bus type callbacks PM: Use a more common logging style PM: Document rules on using pm_runtime_resume() in system suspend callbacks
2017-11-13Merge branches 'pm-cpufreq-sched' and 'pm-opp'Rafael J. Wysocki1-14/+0
* pm-cpufreq-sched: cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq * pm-opp: PM / OPP: Add dev_pm_opp_{un}register_get_pstate_helper() PM / OPP: Support updating performance state of device's power domain PM / OPP: add missing of_node_put() for of_get_cpu_node() PM / OPP: Rename dev_pm_opp_register_put_opp_helper() PM / OPP: Add missing of_node_put(np) PM / OPP: Move error message to debug level PM / OPP: Use snprintf() to avoid kasprintf() and kfree() PM / OPP: Move the OPP directory out of power/
2017-11-08PM / s2idle: Clear the events_check_enabled flagRajat Jain1-1/+1
Problem: This flag does not get cleared currently in the suspend or resume path in the following cases: * In case some driver's suspend routine returns an error. * Successful s2idle case * etc? Why is this a problem: What happens is that the next suspend attempt could fail even though the user did not enable the flag by writing to /sys/power/wakeup_count. This is 1 use case how the issue can be seen (but similar use case with driver suspend failure can be thought of): 1. Read /sys/power/wakeup_count 2. echo count > /sys/power/wakeup_count 3. echo freeze > /sys/power/wakeup_count 4. Let the system suspend, and wakeup the system using some wake source that calls pm_wakeup_event() e.g. power button or something. 5. Note that the combined wakeup count would be incremented due to the pm_wakeup_event() in the resume path. 6. After resuming the events_check_enabled flag is still set. At this point if the user attempts to freeze again (without writing to /sys/power/wakeup_count), the suspend would fail even though there has been no wake event since the past resume. Address that by clearing the flag just before a resume is completed, so that it is always cleared for the corner cases mentioned above. Signed-off-by: Rajat Jain <rajatja@google.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman6-0/+6
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-14Merge branch 'pm-domains' into pm-oppRafael J. Wysocki1-7/+11
2017-10-09Merge back suspend/resume/hibernate material for v4.15.Rafael J. Wysocki3-90/+77
2017-10-03PM: Use a more common logging styleJoe Perches3-90/+77
Convert printks to pr_<level>. Miscellanea: o Use pr_fmt with "PM:" and remove "PM: " from format strings o Coalesce format strings and realign format arguments o Convert an embedded incorrect function name to "%s: ", __func__ o Convert a couple multi-line formats to multiple pr_<level> calls Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>