aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/power (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-02-13PM / sleep: Re-implement suspend-to-idle handlingRafael J. Wysocki1-7/+36
In preparation for adding support for quiescing timers in the final stage of suspend-to-idle transitions, rework the freeze_enter() function making the system wait on a wakeup event, the freeze_wake() function terminating the suspend-to-idle loop and the mechanism by which deep idle states are entered during suspend-to-idle. First of all, introduce a simple state machine for suspend-to-idle and make the code in question use it. Second, prevent freeze_enter() from losing wakeup events due to race conditions and ensure that the number of online CPUs won't change while it is being executed. In addition to that, make it force all of the CPUs re-enter the idle loop in case they are in idle states already (so they can enter deeper idle states if possible). Next, drop cpuidle_use_deepest_state() and replace use_deepest_state checks in cpuidle_select() and cpuidle_reflect() with a single suspend-to-idle state check in cpuidle_idle_call(). Finally, introduce cpuidle_enter_freeze() that will simply find the deepest idle state available to the given CPU and enter it using cpuidle_enter(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-02-11oom, PM: make OOM detection in the freezer path racelessMichal Hocko1-41/+9
Commit 5695be142e20 ("OOM, PM: OOM killed task shouldn't escape PM suspend") has left a race window when OOM killer manages to note_oom_kill after freeze_processes checks the counter. The race window is quite small and really unlikely and partial solution deemed sufficient at the time of submission. Tejun wasn't happy about this partial solution though and insisted on a full solution. That requires the full OOM and freezer's task freezing exclusion, though. This is done by this patch which introduces oom_sem RW lock and turns oom_killer_disable() into a full OOM barrier. oom_killer_disabled check is moved from the allocation path to the OOM level and we take oom_sem for reading for both the check and the whole OOM invocation. oom_killer_disable() takes oom_sem for writing so it waits for all currently running OOM killer invocations. Then it disable all the further OOMs by setting oom_killer_disabled and checks for any oom victims. Victims are counted via mark_tsk_oom_victim resp. unmark_oom_victim. The last victim wakes up all waiters enqueued by oom_killer_disable(). Therefore this function acts as the full OOM barrier. The page fault path is covered now as well although it was assumed to be safe before. As per Tejun, "We used to have freezing points deep in file system code which may be reacheable from page fault." so it would be better and more robust to not rely on freezing points here. Same applies to the memcg OOM killer. out_of_memory tells the caller whether the OOM was allowed to trigger and the callers are supposed to handle the situation. The page allocation path simply fails the allocation same as before. The page fault path will retry the fault (more on that later) and Sysrq OOM trigger will simply complain to the log. Normally there wouldn't be any unfrozen user tasks after try_to_freeze_tasks so the function will not block. But if there was an OOM killer racing with try_to_freeze_tasks and the OOM victim didn't finish yet then we have to wait for it. This should complete in a finite time, though, because - the victim cannot loop in the page fault handler (it would die on the way out from the exception) - it cannot loop in the page allocator because all the further allocation would fail and __GFP_NOFAIL allocations are not acceptable at this stage - it shouldn't be blocked on any locks held by frozen tasks (try_to_freeze expects lockless context) and kernel threads and work queues are not frozen yet Signed-off-by: Michal Hocko <mhocko@suse.cz> Suggested-by: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11PM: convert printk to pr_* equivalentMichal Hocko1-14/+15
While touching this area let's convert printk to pr_*. This also makes the printing of continuation lines done properly. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10Merge tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds2-7/+95
Pull ACPI and power management updates from Rafael Wysocki: "We have a few new features this time, including a new SFI-based cpufreq driver, a new devfreq driver for Tegra Activity Monitor, a new devfreq class for providing its governors with raw utilization data and a new ACPI driver for AMD SoCs. Still, the majority of changes here are reworks of existing code to make it more straightforward or to prepare it for implementing new features on top of it. The primary example is the rework of ACPI resources handling from Jiang Liu, Thomas Gleixner and Lv Zheng with support for IOAPIC hotplug implemented on top of it, but there is quite a number of changes of this kind in the cpufreq core, ACPICA, ACPI EC driver, ACPI processor driver and the generic power domains core code too. The most active developer is Viresh Kumar with his cpufreq changes. Specifics: - Rework of the core ACPI resources parsing code to fix issues in it and make using resource offsets more convenient and consolidation of some resource-handing code in a couple of places that have grown analagous data structures and code to cover the the same gap in the core (Jiang Liu, Thomas Gleixner, Lv Zheng). - ACPI-based IOAPIC hotplug support on top of the resources handling rework (Jiang Liu, Yinghai Lu). - ACPICA update to upstream release 20150204 including an interrupt handling rework that allows drivers to install raw handlers for ACPI GPEs which then become entirely responsible for the given GPE and the ACPICA core code won't touch it (Lv Zheng, David E Box, Octavian Purdila). - ACPI EC driver rework to fix several concurrency issues and other problems related to events handling on top of the ACPICA's new support for raw GPE handlers (Lv Zheng). - New ACPI driver for AMD SoCs analogous to the LPSS (Low-Power Subsystem) driver for Intel chips (Ken Xue). - Two minor fixes of the ACPI LPSS driver (Heikki Krogerus, Jarkko Nikula). - Two new blacklist entries for machines (Samsung 730U3E/740U3E and 510R) where the native backlight interface doesn't work correctly while the ACPI one does (Hans de Goede). - Rework of the ACPI processor driver's handling of idle states to make the code more straightforward and less bloated overall (Rafael J Wysocki). - Assorted minor fixes related to ACPI and SFI (Andreas Ruprecht, Andy Shevchenko, Hanjun Guo, Jan Beulich, Rafael J Wysocki, Yaowei Bai). - PCI core power management modification to avoid resuming (some) runtime-suspended devices during system suspend if they are in the right states already (Rafael J Wysocki). - New SFI-based cpufreq driver for Intel platforms using SFI (Srinidhi Kasagar). - cpufreq core fixes, cleanups and simplifications (Viresh Kumar, Doug Anderson, Wolfram Sang). - SkyLake CPU support and other updates for the intel_pstate driver (Kristen Carlson Accardi, Srinivas Pandruvada). - cpufreq-dt driver cleanup (Markus Elfring). - Init fix for the ARM big.LITTLE cpuidle driver (Sudeep Holla). - Generic power domains core code fixes and cleanups (Ulf Hansson). - Operating Performance Points (OPP) core code cleanups and kernel documentation update (Nishanth Menon). - New dabugfs interface to make the list of PM QoS constraints available to user space (Nishanth Menon). - New devfreq driver for Tegra Activity Monitor (Tomeu Vizoso). - New devfreq class (devfreq_event) to provide raw utilization data to devfreq governors (Chanwoo Choi). - Assorted minor fixes and cleanups related to power management (Andreas Ruprecht, Krzysztof Kozlowski, Rickard Strandqvist, Pavel Machek, Todd E Brandt, Wonhong Kwon). - turbostat updates (Len Brown) and cpupower Makefile improvement (Sriram Raghunathan)" * tag 'pm+acpi-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (151 commits) tools/power turbostat: relax dependency on APERF_MSR tools/power turbostat: relax dependency on invariant TSC Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS tools/power turbostat: relax dependency on root permission ACPI / video: Add disable_native_backlight quirk for Samsung 510R ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code intel_pstate: provide option to only use intel_pstate with HWP ACPI / EC: Add GPE reference counting debugging messages ACPI / EC: Add query flushing support ACPI / EC: Refine command storm prevention support ACPI / EC: Add command flushing support. ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag ACPI: add AMD ACPI2Platform device support for x86 system ACPI / table: remove duplicate NULL check for the handler of acpi_table_parse() ACPI / EC: Update revision due to raw handler mode. ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp. ACPI / EC: Fix several GPE handling issues by deploying ACPI_GPE_DISPATCH_RAW_HANDLER mode. ACPICA: Events: Enable APIs to allow interrupt/polling adaptive request based GPE handling model ...
2015-02-10Merge branches 'pm-sleep' and 'pm-runtime'Rafael J. Wysocki1-5/+6
* pm-sleep: PM / hibernate: exclude freed pages from allocated pages printout PM / sleep: export suspend_resume trace event PM / sleep: Mention async suspend in PM_TRACE documentation PM / hibernate: Remove unused function * pm-runtime: ACPI / PM: Remove unneeded nested #ifdef USB / PM: Remove unneeded #ifdef and associated dead code
2015-02-03PM / hibernate: exclude freed pages from allocated pages printoutWonhong Kwon1-3/+6
hibernate_preallocate_memory() prints out that how many pages are allocated, but it doesn't take into consideration the pages freed by free_unnecessary_pages(). Therefore, it always shows the count more than actually allocated. Signed-off-by: Wonhong Kwon <wonhong.kwon@lge.com> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23PM / hibernate: Remove unused functionRickard Strandqvist1-2/+0
Remove the function get_safe_write_buffer() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23PM / QoS: Add debugfs support to view the list of constraintsNishanth Menon1-2/+89
PM QoS requests are notoriously hard to debug and made even more so due to their highly dynamic nature. Having visibility into the internal data representation per constraint allows us to have much better appreciation of potential issues or bad usage by drivers in the system. So introduce for all classes of PM QoS, an entry in /sys/kernel/debug/pm_qos that shall show all the current requests as well as the snapshot of the value these requests boil down to. For example: ==> /sys/kernel/debug/pm_qos/cpu_dma_latency <== 1: 4444: Active 2: 2000000000: Default 3: 2000000000: Default 4: 2000000000: Default Type=Minimum, Value=4444, Requests: active=1 / total=4 ==> /sys/kernel/debug/pm_qos/memory_bandwidth <== Empty! ... The actual value listed will have their meaning based on the QoS it is on, the 'Type' indicates what logic it would use to collate the information - Minimum, Maximum, or Sum. Value is the collation of all requests. This interface also compares the values with the defaults for the QoS class and marks the ones that are currently active. Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Dave Gerlach <d-gerlach@ti.com> Acked-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-06rcu: Make SRCU optional by using CONFIG_SRCUPranith Kumar1-0/+1
SRCU is not necessary to be compiled by default in all cases. For tinification efforts not compiling SRCU unless necessary is desirable. The current patch tries to make compiling SRCU optional by introducing a new Kconfig option CONFIG_SRCU which is selected when any of the components making use of SRCU are selected. If we do not select CONFIG_SRCU, srcu.o will not be compiled at all. text data bss dec hex filename 2007 0 0 2007 7d7 kernel/rcu/srcu.o Size of arch/powerpc/boot/zImage changes from text data bss dec hex filename 831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before 829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after so the savings are about ~2000 bytes. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
2014-12-19PM: Eliminate CONFIG_PM_RUNTIMERafael J. Wysocki1-10/+6
Having switched over all of the users of CONFIG_PM_RUNTIME to use CONFIG_PM directly, turn the latter into a user-selectable option and drop the former entirely from the tree. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org>
2014-12-08Merge branch 'pm-runtime'Rafael J. Wysocki1-5/+1
* pm-runtime: (25 commits) i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM hwrandom / exynos / PM: Use CONFIG_PM in #ifdef block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM USB / PM: Drop CONFIG_PM_RUNTIME from the USB core PM: Merge the SET*_RUNTIME_PM_OPS() macros PM / Kconfig: Do not select PM directly from Kconfig files PCI / PM: Drop CONFIG_PM_RUNTIME from the PCI core ...
2014-12-08Merge branches 'pm-domains', 'pm-sleep' and 'pm-tools'Rafael J. Wysocki4-35/+34
* pm-domains: ARM: shmobile: Convert to genpd flags for PM clocks for R-mobile ARM: shmobile: Convert to genpd flags for PM clocks for r8a7779 PM / Domains: Initial PM clock support for genpd PM / Domains: Power on the PM domain right after attach completes PM / Domains: Move struct pm_domain_data to pm_domain.h PM / Domains: Extract code to power off/on a PM domain PM / Domains: Make genpd parameter of pm_genpd_present() const * pm-sleep: PM / hibernate: Deletion of an unnecessary check before the function call "vfree" PM / Hibernate: Migrate to ktime_t * pm-tools: tools: cpupower: fix return checks for sysfs_get_idlestate_count()
2014-12-04PM: Drop CONFIG_PM_RUNTIME from the driver coreRafael J. Wysocki1-4/+0
After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selected) PM_RUNTIME is always set if PM is set, so quite a few depend on CONFIG_PM or even may be dropped entirely in some cases. Replace CONFIG_PM_RUNTIME with CONFIG_PM in the PM core code. Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-27Merge back earlier 'pm-runtime' material for 3.19-rc1.Rafael J. Wysocki1-1/+1
2014-11-18PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is selectedRafael J. Wysocki1-0/+1
The number of and dependencies between high-level power management Kconfig options make life much harder than necessary. Several conbinations of them have to be tested and supported, even though some of those combinations are very rarely used in practice (if they are used in practice at all). Moreover, the fact that we have separate independent Kconfig options for runtime PM and system suspend is a serious obstacle for integration between the two frameworks. To overcome these difficulties, always select PM_RUNTIME if PM_SLEEP is set. Among other things, this will allow system suspend callbacks provided by bus types and device drivers to rely on the runtime PM framework regardless of the kernel configuration. Enthusiastically-acked-by: Kevin Hilman <khilman@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-18PM: Kconfig: fix unmet dependency for CPU_PMPankaj Dubey1-1/+0
If BL_SWITCHER is enabled but SUSPEND and CPU_IDLE is not enabled we are getting following config warning. warning: (BL_SWITCHER) selects CPU_PM which has unmet direct dependencies (SUSPEND || CPU_IDLE) It has been noticed that CPU_PM dependencies in this file are not really required so let's remove these dependencies from CPU_PM. Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-18PM / hibernate: Deletion of an unnecessary check before the function call "vfree"Markus Elfring1-1/+1
The vfree() function performs also input parameter validation. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-18Merge back 'pm-sleep' material for 3.19-rc1.Rafael J. Wysocki4-34/+33
2014-11-15PM / Runtime: Kconfig: move ia64 dependency to arch/ia64/KconfigKevin Hilman1-1/+0
The IA64_HP_SIM dependency on PM_RUNTIME should be done in the arch Kconfig instead of in the PM core. Move it accordingly. NOTE: arch/ia64/Kconfig currently does a 'select PM', which since commit 1eb208aea317 (PM: Make CONFIG_PM depend on (CONFIG_PM_SLEEP || CONFIG_PM_RUNTIME)) is effectively a noop unless PM_SLEEP or PM_RUNTIME are set elsewhere. Signed-off-by: Kevin Hilman <khilman@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-08PM / sleep: Fix entering suspend-to-IDLE if no freeze_oops is setDmitry Eremin-Solenikov1-2/+2
If no freeze_ops is set, trying to enter suspend-to-IDLE will cause a nice oops in platform_suspend_prepare_late(). Add respective checks to platform_suspend_prepare_late() and platform_resume_early() functions. Fixes: a8d46b9e4e48 (ACPI / sleep: Rework the handling of ACPI GPE wakeup ...) Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-03PM / Hibernate: Migrate to ktime_tTina Ruchandani4-34/+33
This patch migrates swsusp_show_speed and its callers to using ktime_t instead of 'struct timeval' which suffers from the y2038 problem. Changes to swsusp_show_speed: - use ktime_t for start and stop times - pass start and stop times by value Calling functions affected: - load_image - load_image_lzo - save_image - save_image_lzo - hibernate_preallocate_memory Design decisions: - use ktime_t to preserve same granularity of reporting as before - use centisecs logic as before to avoid 'div by zero' issues caused by using seconds and nanoseconds directly - use monotonic time (ktime_get()) since we only care about elapsed time. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-27PM / Sleep: fix recovery during resuming from hibernationImre Deak1-1/+7
If a device's dev_pm_ops::freeze callback fails during the QUIESCE phase, we don't rollback things correctly calling the thaw and complete callbacks. This could leave some devices in a suspended state in case of an error during resuming from hibernation. Signed-off-by: Imre Deak <imre.deak@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23Merge branch 'freezer'Rafael J. Wysocki1-9/+48
* freezer: PM / freezer: Clean up code after recent fixes PM: convert do_each_thread to for_each_process_thread OOM, PM: OOM killed task shouldn't escape PM suspend freezer: remove obsolete comments in __thaw_task() freezer: Do not freeze tasks killed by OOM killer
2014-10-23Merge branch 'pm-qos'Rafael J. Wysocki1-1/+26
* pm-qos: PM / QoS: Add PM_QOS_MEMORY_BANDWIDTH class
2014-10-22PM / freezer: Clean up code after recent fixesRafael J. Wysocki1-15/+16
Clean up the code in process.c after recent changes to get rid of unnecessary labels and goto statements. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-21PM: convert do_each_thread to for_each_process_threadMichal Hocko1-8/+8
as per 0c740d0afc3b (introduce for_each_thread() to replace the buggy while_each_thread()) get rid of do_each_thread { } while_each_thread() construct and replace it by a more error prone for_each_thread. This patch doesn't introduce any user visible change. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-21OOM, PM: OOM killed task shouldn't escape PM suspendMichal Hocko1-1/+39
PM freezer relies on having all tasks frozen by the time devices are getting frozen so that no task will touch them while they are getting frozen. But OOM killer is allowed to kill an already frozen task in order to handle OOM situtation. In order to protect from late wake ups OOM killer is disabled after all tasks are frozen. This, however, still keeps a window open when a killed task didn't manage to die by the time freeze_processes finishes. Reduce the race window by checking all tasks after OOM killer has been disabled. This is still not race free completely unfortunately because oom_killer_disable cannot stop an already ongoing OOM killer so a task might still wake up from the fridge and get killed without freeze_processes noticing. Full synchronization of OOM and freezer is, however, too heavy weight for this highly unlikely case. Introduce and check oom_kills counter which gets incremented early when the allocator enters __alloc_pages_may_oom path and only check all the tasks if the counter changes during the freezing attempt. The counter is updated so early to reduce the race window since allocator checked oom_killer_disabled which is set by PM-freezing code. A false positive will push the PM-freezer into a slow path but that is not a big deal. Changes since v1 - push the re-check loop out of freeze_processes into check_frozen_processes and invert the condition to make the code more readable as per Rafael Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring) Cc: 3.2+ <stable@vger.kernel.org> # 3.2+ Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-07Merge branch 'pm-domains'Rafael J. Wysocki1-0/+4
* pm-domains: (32 commits) PM / Domains: Rename cpu_data to cpuidle_data PM / Domains: Move dev_pm_domain_attach|detach() to pm_domain.h PM / Domains: Remove legacy API for adding devices through DT PM / Domains: Add genpd attach/detach callbacks PM / Domains: add debugfs listing of struct generic_pm_domain-s ACPI / PM: Convert acpi_dev_pm_detach() into a static function ARM: exynos: Move to generic PM domain DT bindings amba: Add support for attach/detach of PM domains spi: core: Convert to dev_pm_domain_attach|detach() mmc: sdio: Convert to dev_pm_domain_attach|detach() i2c: core: Convert to dev_pm_domain_attach|detach() drivercore / platform: Convert to dev_pm_domain_attach|detach() PM / Domains: Add APIs to attach/detach a PM domain for a device PM / Domains: Add generic OF-based PM domain look-up ACPI / PM: Assign the ->detach() callback when attaching the PM domain PM / Domains: Add a detach callback to the struct dev_pm_domain PM / domains: Spelling s/domian/domain/ PM / domains: Keep declaration of dev_power_governors together PM / domains: Remove default_stop_ok() API drivers: sh: Leave disabling of unused PM domains to genpd ...
2014-10-07Merge branch 'acpi-pm'Rafael J. Wysocki1-11/+38
* acpi-pm: ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idle PM / sleep: Rename platform suspend/resume functions in suspend.c PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq()
2014-10-07Merge branch 'pm-sleep'Rafael J. Wysocki3-18/+70
* pm-sleep: PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free() PM / sleep: new suspend_resume trace event for console resume PM / sleep: Update test_suspend option documentation PM / sleep: Enhance test_suspend option with repeat capability PM / sleep: Support freeze as test_suspend option PM / sysfs: avoid shadowing variables
2014-10-07Merge branch 'pm-genirq'Rafael J. Wysocki1-0/+1
* pm-genirq: PM / genirq: Document rules related to system suspend and interrupts PCI / PM: Make PCIe PME interrupts wake up from suspend-to-idle x86 / PM: Set IRQCHIP_SKIP_SET_WAKE for IOAPIC IRQ chip objects genirq: Simplify wakeup mechanism genirq: Mark wakeup sources as armed on suspend genirq: Create helper for flow handler entry check genirq: Distangle edge handler entry genirq: Avoid double loop on suspend genirq: Move MASK_ON_SUSPEND handling into suspend_device_irqs() genirq: Make use of pm misfeature accounting genirq: Add sanity checks for PM options on shared interrupt lines genirq: Move suspend/resume logic into irq/pm code PM / sleep: Mechanism for aborting system suspends unconditionally
2014-09-30PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free()Joerg Roedel1-15/+39
The existing implementation of swsusp_free iterates over all pfns in the system and checks every bit in the two memory bitmaps. This doesn't scale very well with large numbers of pfns, especially when the bitmaps are not populated very densly. Change the algorithm to iterate over the set bits in the bitmaps instead to make it scale better in large memory configurations. Also add a memory_bm_clear_current() helper function that clears the bit for the last position returned from the memory bitmap. This new version adds a !NULL check for the memory bitmaps before they are walked. Not doing so causes a kernel crash when the bitmaps are NULL. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idleRafael J. Wysocki1-1/+20
The ACPI GPE wakeup from suspend-to-idle is currently based on using the IRQF_NO_SUSPEND flag for the ACPI SCI, but that is problematic for a couple of reasons. First, in principle the ACPI SCI may be shared and IRQF_NO_SUSPEND does not really work well with shared interrupts. Second, it may require the ACPI subsystem to special-case the handling of device notifications depending on whether or not they are received during suspend-to-idle in some places which would lead to fragile code. Finally, it's better the handle ACPI wakeup interrupts consistently with wakeup interrupts from other sources. For this reason, remove the IRQF_NO_SUSPEND flag from the ACPI SCI and use enable_irq_wake()/disable_irq_wake() with it instead, which requires two additional platform hooks to be added to struct platform_freeze_ops. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30PM / sleep: Rename platform suspend/resume functions in suspend.cRafael J. Wysocki1-10/+10
Rename several local functions related to platform handling during system suspend resume in suspend.c so that their names better reflect their roles. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq()Rafael J. Wysocki1-3/+11
Subsequent change sets will add platform-related operations between dpm_suspend_late() and dpm_suspend_noirq() as well as between dpm_resume_noirq() and dpm_resume_early() in suspend_enter(), so export these functions for suspend_enter() to be able to call them separately and split the invocations of dpm_suspend_end() and dpm_resume_start() in there accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30Merge branch 'pm-genirq' into acpi-pmRafael J. Wysocki1-0/+1
2014-09-29Merge back earlier 'pm-sleep' material for v3.18.Rafael J. Wysocki2-3/+31
2014-09-25PM / QoS: Add PM_QOS_MEMORY_BANDWIDTH classTomeu Vizoso1-1/+26
Also adds a class type PM_QOS_SUM that aggregates the values by summing them. It can be used by memory controllers to calculate the optimum clock frequency based on the bandwidth needs of the different memory clients. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-25Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"Rafael J. Wysocki1-35/+15
Revert commit 6efde38f0769 (PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()) that introduced a NULL pointer dereference during system resume from hibernation: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 PGD b39c2067 PUD b39c1067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: <irrelevant list of modules> CPU: 1 PID: 4898 Comm: s2disk Tainted: G C 3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000 RIP: 0010:[<ffffffff810a8cc1>] [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 RSP: 0018:ffff8800b3b17ea8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001 RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000 RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000 R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000 R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590 FS: 00007f406d83b700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0 Stack: ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000 ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007 Call Trace: [<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0 [<ffffffff811a955a>] ? __fput+0xca/0x1d0 [<ffffffff81080627>] ? task_work_run+0x97/0xd0 [<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0 [<ffffffff8151452a>] ? int_signal+0x12/0x17 Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40 RIP [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 RSP <ffff8800b3b17ea8> CR2: 0000000000000000 ---[ end trace f02be86a1ec0cccb ]--- due to forbidden_pages_map being NULL in swsusp_free(). Fixes: 6efde38f0769 "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()" Reported-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-22ARM: exynos: Move to generic PM domain DT bindingsTomasz Figa1-1/+1
This patch moves Exynos PM domain code to use the new generic PM domain look-up framework introduced in previous patches, thus also allowing the new code to be compiled with CONFIG_ARCH_EXYNOS. This patch was originally submitted by Tomasz Figa when he was employed by Samsung. Link: http://marc.info/?l=linux-pm&m=139955336002083&w=2 Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Kevin Hilman <khilman@linaro.org> Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-22PM / Domains: Add generic OF-based PM domain look-upTomasz Figa1-0/+4
This patch introduces generic code to perform PM domain look-up using device tree and automatically bind devices to their PM domains. Generic device tree bindings are introduced to specify PM domains of devices in their device tree nodes. Backwards compatibility with legacy Samsung-specific PM domain bindings is provided, but for now the new code is not compiled when CONFIG_ARCH_EXYNOS is selected to avoid collision with legacy code. This will change as soon as the Exynos PM domain code gets converted to use the generic framework in further patch. This patch was originally submitted by Tomasz Figa when he was employed by Samsung. Link: http://marc.info/?l=linux-pm&m=139955349702152&w=2 Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rob Herring <robh@kernel.org> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-22PM / sleep: new suspend_resume trace event for console resumeTodd E Brandt1-0/+2
This patch adds another suspend_resume trace event for analyze_suspend to capture. The resume_console call can take several hundred milliseconds if the printk buffer is full of debug info. The tool will now inform testers of the wasted time and encourage them to disable it in production builds. Signed-off-by: Todd Brandt <todd.e.brandt@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-09PM / sleep: Enhance test_suspend option with repeat capabilitySrinivas Pandruvada1-3/+22
Enhanced test_suspend boot paramter to repeat tests multiple times, by adding optional repeat count. The new boot param syntax: test_suspend="mem|freeze|standby[,N]" Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-09PM / sleep: Support freeze as test_suspend optionSrinivas Pandruvada1-0/+7
Added freeze as one of the option for test_suspend boot param. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-03PM / sleep: Fix test_suspend= command line optionRafael J. Wysocki3-13/+21
After commit d431cbc53cb7 (PM / sleep: Simplify sleep states sysfs interface code) the pm_states[] array is not populated initially, which causes setup_test_suspend() to always fail and the suspend testing during boot doesn't work any more. Fix the problem by using pm_labels[] instead of pm_states[] in setup_test_suspend() and storing a pointer to the label of the sleep state to test rather than the number representing it, because the connection between the state numbers and labels is only established by suspend_set_ops(). Fixes: d431cbc53cb7 (PM / sleep: Simplify sleep states sysfs interface code) Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-01PM / sleep: Mechanism for aborting system suspends unconditionallyRafael J. Wysocki1-0/+1
It sometimes may be necessary to abort a system suspend in progress or wake up the system from suspend-to-idle even if the pm_wakeup_event()/pm_stay_awake() mechanism is not enabled. For this purpose, introduce a new global variable pm_abort_suspend and make pm_wakeup_pending() check its value. Also add routines for manipulating that variable. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-11Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle'Rafael J. Wysocki1-1/+20
* pm-sleep: PM / hibernate: avoid unsafe pages in e820 reserved regions * pm-cpufreq: cpufreq: arm_big_little: fix module license spec cpufreq: speedstep-smi: fix decimal printf specifiers cpufreq: OPP: Avoid sleeping while atomic cpufreq: cpu0: Do not print error message when deferring cpufreq: integrator: Use set_cpus_allowed_ptr * pm-cpuidle: cpuidle: menu: Lookup CPU runqueues less cpuidle: menu: Call nr_iowait_cpu less times cpuidle: menu: Use ktime_to_us instead of reinventing the wheel cpuidle: menu: Use shifts when calculating averages where possible
2014-08-06Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds6-218/+475
Pull ACPI and power management updates from Rafael Wysocki: "Again, ACPICA leads the pack (47 commits), followed by cpufreq (18 commits) and system suspend/hibernation (9 commits). From the new code perspective, the ACPICA update brings ACPI 5.1 to the table, including a new device configuration object called _DSD (Device Specific Data) that will hopefully help us to operate device properties like Device Trees do (at least to some extent) and changes related to supporting ACPI on ARM. Apart from that we have hibernation changes making it use radix trees to store memory bitmaps which should speed up some operations carried out by it quite significantly. We also have some power management changes related to suspend-to-idle (the "freeze" sleep state) support and more preliminary changes needed to support ACPI on ARM (outside of ACPICA). The rest is fixes and cleanups pretty much everywhere. Specifics: - ACPICA update to upstream version 20140724. That includes ACPI 5.1 material (support for the _CCA and _DSD predefined names, changes related to the DMAR and PCCT tables and ARM support among other things) and cleanups related to using ACPICA's header files. A major part of it is related to acpidump and the core code used by that utility. Changes from Bob Moore, David E Box, Lv Zheng, Sascha Wildner, Tomasz Nowicki, Hanjun Guo. - Radix trees for memory bitmaps used by the hibernation core from Joerg Roedel. - Support for waking up the system from suspend-to-idle (also known as the "freeze" sleep state) using ACPI-based PCI wakeup signaling (Rafael J Wysocki). - Fixes for issues related to ACPI button events (Rafael J Wysocki). - New device ID for an ACPI-enumerated device included into the Wildcat Point PCH from Jie Yang. - ACPI video updates related to backlight handling from Hans de Goede and Linus Torvalds. - Preliminary changes needed to support ACPI on ARM from Hanjun Guo and Graeme Gregory. - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui. - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros (Rafael J Wysocki). - ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J Wysocki. - Cleanups and improvements related to system suspend from Lan Tianyu, Randy Dunlap and Rafael J Wysocki. - ACPI battery cleanup from Wei Yongjun. - cpufreq core fixes from Viresh Kumar. - Elimination of a deadband effect from the cpufreq ondemand governor and intel_pstate driver cleanups from Stratos Karafotis. - 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas Patocka. - Fix for the imx6 cpufreq driver from Anson Huang. - cpuidle core and governor cleanups from Daniel Lezcano, Sandeep Tripathy and Mohammad Merajul Islam Molla. - Build fix for the big_little cpuidle driver from Sachin Kamat. - Configuration fix for the Operation Performance Points (OPP) framework from Mark Brown. - APM cleanup from Jean Delvare. - cpupower utility fixes and cleanups from Peter Senna Tschudin, Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger" * tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (118 commits) ACPI / LPSS: add LPSS device for Wildcat Point PCH ACPI / PNP: Replace faulty is_hex_digit() by isxdigit() ACPICA: Update version to 20140724. ACPICA: ACPI 5.1: Update for PCCT table changes. ACPICA/ARM: ACPI 5.1: Update for GTDT table changes. ACPICA/ARM: ACPI 5.1: Update for MADT changes. ACPICA/ARM: ACPI 5.1: Update for FADT changes. ACPICA: ACPI 5.1: Support for the _CCA predifined name. ACPICA: ACPI 5.1: New notify value for System Affinity Update. ACPICA: ACPI 5.1: Support for the _DSD predefined name. ACPICA: Debug object: Add current value of Timer() to debug line prefix. ACPICA: acpihelp: Add UUID support, restructure some existing files. ACPICA: Utilities: Fix local printf issue. ACPICA: Tables: Update for DMAR table changes. ACPICA: Remove some extraneous printf arguments. ACPICA: Update for comments/formatting. No functional changes. ACPICA: Disassembler: Add support for the ToUUID opererator (macro). ACPICA: Remove a redundant cast to acpi_size for ACPI_OFFSET() macro. ACPICA: Work around an ancient GCC bug. ACPI / processor: Make it possible to get local x2apic id via _MAT ...
2014-08-06PM / hibernate: avoid unsafe pages in e820 reserved regionsLee, Chun-Yi1-1/+20
When the machine doesn't well handle the e820 persistent when hibernate resuming, then it may cause page fault when writing image to snapshot buffer: [ 17.929495] BUG: unable to handle kernel paging request at ffff880069d4f000 [ 17.933469] IP: [<ffffffff810a1cf0>] load_image_lzo+0x810/0xe40 [ 17.933469] PGD 2194067 PUD 77ffff067 PMD 2197067 PTE 0 [ 17.933469] Oops: 0002 [#1] SMP ... The ffff880069d4f000 page is in e820 reserved region of resume boot kernel: [ 0.000000] BIOS-e820: [mem 0x0000000069d4f000-0x0000000069e12fff] reserved ... [ 0.000000] PM: Registered nosave memory: [mem 0x69d4f000-0x69e12fff] So snapshot.c mark the pfn to forbidden pages map. But, this page is also in the memory bitmap in snapshot image because it's an original page used by image kernel, so it will also mark as an unsafe(free) page in prepare_image(). That means the page in e820 when resuming mark as "forbidden" and "free", it causes get_buffer() treat it as an allocated unsafe page. Then snapshot_write_next() return this page to load_image, load_image writing content to this address, but this page didn't really allocated . So, we got page fault. Although the root cause is from BIOS, I think aggressive check and significant message in kernel will better then a page fault for issue tracking, especially when serial console unavailable. This patch adds code in mark_unsafe_pages() for check does free pages in nosave region. If so, then it print message and return fault to stop whole S4 resume process: [ 8.166004] PM: Image loading progress: 0% [ 8.658717] PM: 0x6796c000 in e820 nosave region: [mem 0x6796c000-0x6796cfff] [ 8.918737] PM: Read 2511940 kbytes in 1.04 seconds (2415.32 MB/s) [ 8.926633] PM: Error -14 resuming [ 8.933534] PM: Failed to load hibernation image, recovering. Reviewed-by: Takashi Iwai <tiwai@suse.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Lee, Chun-Yi <jlee@suse.com> [rjw: Subject] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-05Merge branch 'pm-sleep'Rafael J. Wysocki1-127/+367
* pm-sleep: PM / Hibernate: Touch Soft Lockup Watchdog in rtree_next_node PM / Hibernate: Remove the old memory-bitmap implementation PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free() PM / Hibernate: Implement position keeping in radix tree PM / Hibernate: Add memory_rtree_find_bit function PM / Hibernate: Create a Radix-Tree to store memory bitmap PM / sleep: fix kernel-doc warnings in drivers/base/power/main.c