aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-03-15x86: also use debug_pagealloc_enabled() for free_init_pagesChristian Borntraeger1-14/+15
we want to couple all debugging features with debug_pagealloc_enabled() and not with the config option CONFIG_DEBUG_PAGEALLOC. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15x86: query dynamic DEBUG_PAGEALLOC settingChristian Borntraeger3-16/+10
We can use debug_pagealloc_enabled() to check if we can map the identity mapping with 2MB pages. We can also add the state into the dump_stack output. The patch does not touch the code for the 1GB pages, which ignored CONFIG_DEBUG_PAGEALLOC. Do we need to fence this as well? Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Laura Abbott <labbott@fedoraproject.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-14Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+3
Pull scheduler updates from Ingo Molnar: "The main changes in this cycle are: - Make schedstats a runtime tunable (disabled by default) and optimize it via static keys. As most distributions enable CONFIG_SCHEDSTATS=y due to its instrumentation value, this is a nice performance enhancement. (Mel Gorman) - Implement 'simple waitqueues' (swait): these are just pure waitqueues without any of the more complex features of full-blown waitqueues (callbacks, wake flags, wake keys, etc.). Simple waitqueues have less memory overhead and are faster. Use simple waitqueues in the RCU code (in 4 different places) and for handling KVM vCPU wakeups. (Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker, Marcelo Tosatti) - sched/numa enhancements (Rik van Riel) - NOHZ performance enhancements (Rik van Riel) - Various sched/deadline enhancements (Steven Rostedt) - Various fixes (Peter Zijlstra) - ... and a number of other fixes, cleanups and smaller enhancements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits) sched/cputime: Fix steal_account_process_tick() to always return jiffies sched/deadline: Remove dl_new from struct sched_dl_entity Revert "kbuild: Add option to turn incompatible pointer check into error" sched/deadline: Remove superfluous call to switched_to_dl() sched/debug: Fix preempt_disable_ip recording for preempt_disable() sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity time, acct: Drop irq save & restore from __acct_update_integrals() acct, time: Change indentation in __acct_update_integrals() sched, time: Remove non-power-of-two divides from __acct_update_integrals() sched/rt: Kick RT bandwidth timer immediately on start up sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug sched/debug: Move sched_domain_sysctl to debug.c sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c sched/rt: Fix PI handling vs. sched_setscheduler() sched/core: Remove duplicated sched_group_set_shares() prototype sched/fair: Consolidate nohz CPU load update code sched/fair: Avoid using decay_load_missed() with a negative value sched/deadline: Always calculate end of period on sched_yield() sched/cgroup: Fix cgroup entity load tracking tear-down rcu: Use simple wait queues where possible in rcutree ...
2016-03-14Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds14-165/+553
Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call
2016-03-14Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds62-804/+1047
Pull perf updates from Ingo Molnar: "Main kernel side changes: - Big reorganization of the x86 perf support code. The old code grew organically deep inside arch/x86/kernel/cpu/perf* and its naming became somewhat messy. The new location is under arch/x86/events/, using the following cleaner hierarchy of source code files: perf/x86: Move perf_event.c .................. => x86/events/core.c perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch] perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch] perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch] perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c (Borislav Petkov) - Update various x86 PMU constraint and hw support details (Stephane Eranian) - Optimize kprobes for BPF execution (Martin KaFai Lau) - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas Gleixner) - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner) - Various fixes and smaller cleanups. There are lots of perf tooling updates as well. A few highlights: perf report/top: - Hierarchy histogram mode for 'perf top' and 'perf report', showing multiple levels, one per --sort entry: (Namhyung Kim) On a mostly idle system: # perf top --hierarchy -s comm,dso Then expand some levels and use 'P' to take a snapshot: # cat perf.hist.0 - 92.32% perf 58.20% perf 22.29% libc-2.22.so 5.97% [kernel] 4.18% libelf-0.165.so 1.69% [unknown] - 4.71% qemu-system-x86 3.10% [kernel] 1.60% qemu-system-x86_64 (deleted) + 2.97% swapper # - Add 'L' hotkey to dynamicly set the percent threshold for histogram entries and callchains, i.e. dynamicly do what the --percent-limit command line option to 'top' and 'report' does. (Namhyung Kim) perf mem: - Allow specifying events via -e in 'perf mem record', also listing what events can be specified via 'perf mem record -e list' (Jiri Olsa) perf record: - Add 'perf record' --all-user/--all-kernel options, so that one can tell that all the events in the command line should be restricted to the user or kernel levels (Jiri Olsa), i.e.: perf record -e cycles:u,instructions:u is equivalent to: perf record --all-user -e cycles,instructions - Make 'perf record' collect CPU cache info in the perf.data file header: $ perf record usleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ] $ perf report --header-only -I | tail -10 | head -8 # CPU cache info: # L1 Data 32K [0-1] # L1 Instruction 32K [0-1] # L1 Data 32K [2-3] # L1 Instruction 32K [2-3] # L2 Unified 256K [0-1] # L2 Unified 256K [2-3] # L3 Unified 4096K [0-3] Will be used in 'perf c2c' and eventually in 'perf diff' to allow, for instance running the same workload in multiple machines and then when using 'diff' show the hardware difference. (Jiri Olsa) - Improved support for Java, using the JVMTI agent library to do jitdumps that then will be inserted in synthesized PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized ELF files stored in ~/.debug and keyed with build-ids, to allow symbol resolution and even annotation with source line info, see the changeset comments to see how to use it (Stephane Eranian) perf script/trace: - Decode data_src values (e.g. perf.data files generated by 'perf mem record') in 'perf script': (Jiri Olsa) # perf script perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Improve support to 'data_src', 'weight' and 'addr' fields in 'perf script' (Jiri Olsa) - Handle empty print fmts in 'perf script -s' i.e. when running python or perl scripts (Taeung Song) perf stat: - 'perf stat' now shows shadow metrics (insn per cycle, etc) in interval mode too. E.g: # perf stat -I 1000 -e instructions,cycles sleep 1 # time counts unit events 1.000215928 519,620 instructions # 0.69 insn per cycle 1.000215928 752,003 cycles <SNIP> - Port 'perf kvm stat' to PowerPC (Hemant Kumar) - Implement CSV metrics output in 'perf stat' (Andi Kleen) perf BPF support: - Support converting data from bpf events in 'perf data' (Wang Nan) - Print bpf-output events in 'perf script': (Wang Nan). # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000 # perf script usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms]) BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a 0008: 42 50 46 20 65 76 65 6e BPF even 0010: 74 21 00 00 t!.. BPF string: "Raise a BPF event!" # - Add API to set values of map entries in a BPF object, be it individual map slots or ranges (Wang Nan) - Introduce support for the 'bpf-output' event (Wang Nan) - Add glue to read perf events in a BPF program (Wang Nan) - Improve support for bpf-output events in 'perf trace' (Wang Nan) ... and tons of other changes as well - see the shortlog and git log for details!" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits) perf stat: Add --metric-only support for -A perf stat: Implement --metric-only mode perf stat: Document CSV format in manpage perf hists browser: Check sort keys before hot key actions perf hists browser: Allow thread filtering for comm sort key perf tools: Add sort__has_comm variable perf tools: Recalc total periods using top-level entries in hierarchy perf tools: Remove nr_sort_keys field perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry() perf tools: Remove hist_entry->fmt field perf tools: Fix command line filters in hierarchy mode perf tools: Add more sort entry check functions perf tools: Fix hist_entry__filter() for hierarchy perf jitdump: Build only on supported archs tools lib traceevent: Add '~' operation within arg_num_eval() perf tools: Omit unnecessary cast in perf_pmu__parse_scale perf tools: Pass perf_hpp_list all the way through setup_sort_list perf tools: Fix perf script python database export crash perf jitdump: DWARF is also needed perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes ...
2016-03-14Merge branch 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds14-63/+26
Pull read-only kernel memory updates from Ingo Molnar: "This tree adds two (security related) enhancements to the kernel's handling of read-only kernel memory: - extend read-only kernel memory to a new class of formerly writable kernel data: 'post-init read-only memory' via the __ro_after_init attribute, and mark the ARM and x86 vDSO as such read-only memory. This kind of attribute can be used for data that requires a once per bootup initialization sequence, but is otherwise never modified after that point. This feature was based on the work by PaX Team and Brad Spengler. (by Kees Cook, the ARM vDSO bits by David Brown.) - make CONFIG_DEBUG_RODATA always enabled on x86 and remove the Kconfig option. This simplifies the kernel and also signals that read-only memory is the default model and a first-class citizen. (Kees Cook)" * 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM/vdso: Mark the vDSO code read-only after init x86/vdso: Mark the vDSO code read-only after init lkdtm: Verify that '__ro_after_init' works correctly arch: Introduce post-init read-only memory x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings asm-generic: Consolidate mark_rodata_ro()
2016-03-14Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-16/+9
Pull locking changes from Ingo Molnar: "Various updates: - Futex scalability improvements: remove page lock use for shared futex get_futex_key(), which speeds up 'perf bench futex hash' benchmarks by over 40% on a 60-core Westmere. This makes anon-mem shared futexes perform close to private futexes. (Mel Gorman) - lockdep hash collision detection and fix (Alfredo Alvarez Fernandez) - lockdep testing enhancements (Alfredo Alvarez Fernandez) - robustify lockdep init by using hlists (Andrew Morton, Andrey Ryabinin) - mutex and csd_lock micro-optimizations (Davidlohr Bueso) - small x86 barriers tweaks (Michael S Tsirkin) - qspinlock updates (Waiman Long)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait() locking/csd_lock: Explicitly inline csd_lock*() helpers futex: Replace barrier() in unqueue_me() with READ_ONCE() locking/lockdep: Detect chain_key collisions locking/lockdep: Prevent chain_key collisions tools/lib/lockdep: Fix link creation warning tools/lib/lockdep: Add tests for AA and ABBA locking tools/lib/lockdep: Add userspace version of READ_ONCE() tools/lib/lockdep: Fix the build on recent kernels locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h locking/mutex: Allow next waiter lockless wakeup locking/pvqspinlock: Enable slowpath locking count tracking locking/qspinlock: Use smp_cond_acquire() in pending code locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock() locking/mcs: Fix mcs_spin_lock() ordering futex: Remove requirement for lock_page() in get_futex_key() futex: Rename barrier references in ordering guarantees locking/atomics: Update comment about READ_ONCE() and structures locking/lockdep: Eliminate lockdep_init() locking/lockdep: Convert hash tables to hlists ...
2016-03-14Merge branch 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-44/+45
Pull ram resource handling changes from Ingo Molnar: "Core kernel resource handling changes to support NVDIMM error injection. This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM, for System RAM while keeping the current IORESOURCE_MEM type bit set for all memory-mapped ranges (including System RAM) for backward compatibility. With this resource flag it no longer takes a strcmp() loop through the resource tree to find "System RAM" resources. The new resource type is then used to extend ACPI/APEI error injection facility to also support NVDIMM" * 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI/EINJ: Allow memory error injection to NVDIMM resource: Kill walk_iomem_res() x86/kexec: Remove walk_iomem_res() call with GART type x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search resource: Add walk_iomem_res_desc() memremap: Change region_intersects() to take @flags and @desc arm/samsung: Change s3c_pm_run_res() to use System RAM type resource: Change walk_system_ram() to use System RAM type drivers: Initialize resource entry to zero xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM kexec: Set IORESOURCE_SYSTEM_RAM for System RAM arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM ia64: Set System RAM type and descriptor x86/e820: Set System RAM type and descriptor resource: Add I/O resource descriptor resource: Handle resource flags properly resource: Add System RAM resource type
2016-03-12Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds5-37/+79
Pull x86 fixes from Ingo Molnar: "This fixes 3 FPU handling related bugs, an EFI boot crash and a runtime warning. The EFI fix arrived late but I didn't want to delay it to after v4.5 because the effects are pretty bad for the systems that are affected by it" [ Actually, I don't think the EFI fix really matters yet, because we haven't switched to the separate EFI page tables in mainline yet ] * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables x86/fpu: Fix eager-FPU handling on legacy FPU machines x86/delay: Avoid preemptible context checks in delay_mwaitx() x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off") x86/fpu: Fix 'no387' regression
2016-03-12x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tablesMatt Fleming1-17/+62
Some machines have EFI regions in page zero (physical address 0x00000000) and historically that region has been added to the e820 map via trim_bios_range(), and ultimately mapped into the kernel page tables. It was not mapped via efi_map_regions() as one would expect. Alexis reports that with the new separate EFI page tables some boot services regions, such as page zero, are not mapped. This triggers an oops during the SetVirtualAddressMap() runtime call. For the EFI boot services quirk on x86 we need to memblock_reserve() boot services regions until after SetVirtualAddressMap(). Doing that while respecting the ownership of regions that may have already been reserved by the kernel was the motivation behind this commit: 7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas") That patch was merged at a time when the EFI runtime virtual mappings were inserted into the kernel page tables as described above, and the trick of setting ->numpages (and hence the region size) to zero to track regions that should not be freed in efi_free_boot_services() meant that we never mapped those regions in efi_map_regions(). Instead we were relying solely on the existing kernel mappings. Now that we have separate page tables we need to make sure the EFI boot services regions are mapped correctly, even if someone else has already called memblock_reserve(). Instead of stashing a tag in ->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it generally makes no sense to mark a boot services region as required at runtime, it's pretty much guaranteed the firmware will not have already set this bit. For the record, the specific circumstances under which Alexis triggered this bug was that an EFI runtime driver on his machine was responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during SetVirtualAddressMap(). The event handler for this driver looks like this, sub rsp,0x28 lea rdx,[rip+0x2445] # 0xaa948720 mov ecx,0x4 call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720) mov r11,QWORD PTR [rip+0x2434] # 0xaa948720 xor eax,eax mov BYTE PTR [r11+0x1],0x1 add rsp,0x28 ret Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting instruction because ConvertPointer() was being called to convert the address 0x0000000000000000, which when converted is left unchanged and remains 0x0000000000000000. The output of the oops trace gave the impression of a standard NULL pointer dereference bug, but because we're accessing physical addresses during ConvertPointer(), it wasn't. EFI boot services code is stored at that address on Alexis' machine. Reported-by: Alexis Murzeau <amurzeau@gmail.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raphael Hertzog <hertzog@debian.org> Cc: Roger Shimizu <rogershimizu@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12x86/fpu: Fix eager-FPU handling on legacy FPU machinesBorislav Petkov2-2/+4
i486 derived cores like Intel Quark support only the very old, legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and our FPU code wasn't handling the saving and restoring there properly in the 'eagerfpu' case. So after we made eagerfpu the default for all CPU types: 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs these old FPU designs broke. First, Andy Shevchenko reported a splat: WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160 which was us trying to execute FXRSTOR on those machines even though they don't support it. After taking care of that, Bryan O'Donoghue reported that a simple FPU test still failed because we weren't initializing the FPU state properly on those machines. Take care of all that. Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10x86/delay: Avoid preemptible context checks in delay_mwaitx()Borislav Petkov1-1/+1
We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly accessed per-cpu var as the MONITORX target in delay_mwaitx(). However, when called in preemptible context, this_cpu_ptr -> smp_processor_id() -> debug_smp_processor_id() fires: BUG: using smp_processor_id() in preemptible [00000000] code: udevd/312 caller is delay_mwaitx+0x40/0xa0 But we don't care about that check - we only need cpu_tss as a MONITORX target and it doesn't really matter which CPU's var we're touching as we're going idle anyway. Fix that. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0Paolo Bonzini1-1/+3
KVM has special logic to handle pages with pte.u=1 and pte.w=0 when CR0.WP=1. These pages' SPTEs flip continuously between two states: U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed) and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed). When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1. When guest EFER has the NX bit cleared, the reserved bit check thinks that the latter state is invalid; teach it that the smep_andnot_wp case will also use the NX bit of SPTEs. Cc: stable@vger.kernel.org Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com> Fixes: c258b62b264fdc469b6d3610a907708068145e3b Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 comboPaolo Bonzini1-13/+23
Yes, all of these are needed. :) This is admittedly a bit odd, but kvm-unit-tests access.flat tests this if you run it with "-cpu host" and of course ept=0. KVM runs the guest with CR0.WP=1, so it must handle supervisor writes specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0. When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and restarts execution. This will still cause a user write to fault, while supervisor writes will succeed. User reads will fault spuriously now, and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads will be enabled and supervisor writes disabled, going back to the originary situation where supervisor writes fault spuriously. When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0. If the guest has not enabled NX, the result is a continuous stream of page faults due to the NX bit being reserved. The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry control, so they do not use user-return notifiers for EFER---if they did, EFER.NX would be forced to the same value as the host). There is another bug in the reserved bit check, which I've split to a separate patch for easier application to stable kernels. Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar4-12/+22
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")Yu-cheng Yu2-11/+4
Leonid Shatz noticed that the SDM interpretation of the following recent commit: 394db20ca240741 ("x86/fpu: Disable AVX when eagerfpu is off") ... is incorrect and that the original behavior of the FPU code was correct. Because AVX is not stated in CR0 TS bit description, it was mistakenly believed to be not supported for lazy context switch. This turns out to be false: Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers: 'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task.' Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception Specification: 'AVX instructions refer to exceptions by classes that include #NM "Device Not Available" exception for lazy context switch.' So revert the commit. Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-09x86/fpu: Fix 'no387' regressionAndy Lutomirski1-6/+8
After fixing FPU option parsing, we now parse the 'no387' boot option too early: no387 clears X86_FEATURE_FPU before it's even probed, so the boot CPU promptly re-enables it. I suspect it gets even more confused on SMP. Fix the probing code to leave X86_FEATURE_FPU off if it's been disabled by setup_clear_cpu_cap(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Fixes: 4f81cbafcce2 ("x86/fpu: Fix early FPU command-line parsing") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mm, x86/mce: Add memcpy_mcsafe()Tony Luck3-0/+132
Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. Prime use case for this is to copy from large arrays of non-volatile memory used as storage. We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify. Return type is a "bool". True means that we copied OK, false means that it didn't. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@gmail.com> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08perf/x86/intel/rapl: Simplify quirk handling even moreBorislav Petkov1-19/+13
Drop the quirk() function pointer in favor of a simple boolean which says whether the quirk should be applied or not. Update comment while at it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-tip-commits@vger.kernel.org Link: http://lkml.kernel.org/r/20160308164041.GF16568@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/nmi: Mark 'ignore_nmis' as __read_mostlyKostenzer Felix1-1/+2
ignore_nmis is used in two distinct places: 1. modified through {stop,restart}_nmi by alternative_instructions 2. read by do_nmi to determine if default_do_nmi should be called or not thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath. Signed-off-by: Kostenzer Felix <fkostenzer@live.at> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08KVM: VMX: disable PEBS before a guest entryRadim Krčmář1-0/+7
Linux guests on Haswell (and also SandyBridge and Broadwell, at least) would crash if you decided to run a host command that uses PEBS, like perf record -e 'cpu/mem-stores/pp' -a This happens because KVM is using VMX MSR switching to disable PEBS, but SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it isn't safe: When software needs to reconfigure PEBS facilities, it should allow a quiescent period between stopping the prior event counting and setting up a new PEBS event. The quiescent period is to allow any latent residual PEBS records to complete its capture at their previously specified buffer address (provided by IA32_DS_AREA). There might not be a quiescent period after the MSR switch, so a CPU ends up using host's MSR_IA32_DS_AREA to access an area in guest's memory. (Or MSR switching is just buggy on some models.) The guest can learn something about the host this way: If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results in #PF where we leak host's MSR_IA32_DS_AREA through CR2. After that, a malicious guest can map and configure memory where MSR_IA32_DS_AREA is pointing and can therefore get an output from host's tracing. This is not a critical leak as the host must initiate with PEBS tracing and I have not been able to get a record from more than one instruction before vmentry in vmx_vcpu_run() (that place has most registers already overwritten with guest's). We could disable PEBS just few instructions before vmentry, but disabling it earlier shouldn't affect host tracing too much. We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that optimization isn't worth its code, IMO. (If you are implementing PEBS for guests, be sure to handle the case where both host and guest enable PEBS, because this patch doesn't.) Fixes: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.") Cc: <stable@vger.kernel.org> Reported-by: Jiří Olša <jolsa@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08perf/x86/intel: Fix PEBS data source interpretation on Nehalem/WestmereAndi Kleen3-1/+14
Jiri reported some time ago that some entries in the PEBS data source table in perf do not agree with the SDM. We investigated and the bits changed for Sandy Bridge, but the SDM was not updated. perf already implements the bits correctly for Sandy Bridge and later. This patch patches it up for Nehalem and Westmere. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/1456871124-15985-1-git-send-email-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08perf/x86/pebs: Add proper PEBS constraints for BroadwellStephane Eranian3-1/+27
This patch adds a Broadwell specific PEBS event constraint table. Broadwell has a fix for the HT corruption bug erratum HSD29 on Haswell. Therefore, there is no need to mark events 0xd0, 0xd1, 0xd2, 0xd3 has requiring the exclusive mode across both sibling HT threads. This holds true for regular counting and sampling (see core.c) and PEBS (ds.c) which we fix in this patch. In doing so, we relax evnt scheduling for these events, they can now be programmed on any 4 counters without impacting what is measured on the sibling thread. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@redhat.com Cc: adrian.hunter@intel.com Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1457034642-21837-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08perf/x86/pebs: Add workaround for broken OVFL status on HSW+Stephane Eranian1-0/+10
This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on Haswell, Broadwell and Skylake processors when using PEBS. The SDM stipulates that when the PEBS iterrupt threshold is crossed, an interrupt is posted and the kernel is interrupted. The kernel will find GLOBAL_OVF_SATUS bit 62 set indicating there are PEBS records to drain. But the bits corresponding to the actual counters should NOT be set. The kernel follows the SDM and assumes that all PEBS events are processed in the drain_pebs() callback. The kernel then checks for remaining overflows on any other (non-PEBS) events and processes these in the for_each_bit_set(&status) loop. As it turns out, under certain conditions on HSW and later processors, on PEBS buffer interrupt, bit 62 is set but the counter bits may be set as well. In that case, the kernel drains PEBS and generates SAMPLES with the EXACT tag, then it processes the counter bits, and generates normal (non-EXACT) SAMPLES. I ran into this problem by trying to understand why on HSW sampling on a PEBS event was sometimes returning SAMPLES without the EXACT tag. This should not happen on user level code because HSW has the eventing_ip which always point to the instruction that caused the event. The workaround in this patch simply ensures that the bits for the counters used for PEBS events are cleared after the PEBS buffer has been drained. With this fix 100% of the PEBS samples on my user code report the EXACT tag. Before: $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase $ perf report -D | fgrep SAMPLES PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y \--- EXACT tag is missing After: $ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase $ perf report -D | fgrep SAMPLES PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y \--- EXACT tag is set The problem tends to appear more often when multiple PEBS events are used. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1457034642-21837-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08perf/x86/intel: Add definition for PT PMI bitStephane Eranian1-0/+1
This patch adds a definition for GLOBAL_OVFL_STATUS bit 55 which is used with the Processor Trace (PT) feature. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1457034642-21837-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmiKan Liang3-3/+29
This patch tries to fix a PEBS warning found in my stress test. The following perf command can easily trigger the pebs warning or spurious NMI error on Skylake/Broadwell/Haswell platforms: sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a Also the NMI watchdog must be enabled. For this case, the events number is larger than counter number. So perf has to do multiplexing. In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out old events, rotate_ctx, schedule in new events and finally perf_pmu_enable(). If the old events include precise event, the MSR_IA32_PEBS_ENABLE should be cleared when perf_pmu_disable(). The MSR_IA32_PEBS_ENABLE should keep 0 until the perf_pmu_enable() is called and the new event is precise event. However, there is a corner case which could restore PEBS_ENABLE to stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL will be set to 0 to stop overflow and followed PMI. But there may be pending PMI from an earlier overflow, which cannot be stopped. So even GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At the end of the PMI handler, __intel_pmu_enable_all() will be called, which will restore the stale values if old events haven't scheduled out. Once the stale pebs value is set, it's impossible to be corrected if the new events are non-precise. Because the pebs_enabled will be set to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE setting. As a result, the following NMI with stale PEBS_ENABLE trigger pebs warning. The pending PMI after enabled=0 will become harmless if the NMI handler does not change the state. This patch checks cpuc->enabled in pmi and only restore the state when PMU is active. Here is the dump: Call Trace: <NMI> [<ffffffff813c3a2e>] dump_stack+0x63/0x85 [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320 [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460 [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330 [<ffffffff811f2f11>] ? unmap_kernel_range_noflush+0x11/0x20 [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0 [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170 [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50 [<ffffffff810310b9>] nmi_handle+0x69/0x120 [<ffffffff810316f6>] default_do_nmi+0xe6/0x100 [<ffffffff810317f2>] do_nmi+0xe2/0x130 [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 <<EOE>> <IRQ> [<ffffffff81006df8>] ? x86_perf_event_set_period+0xd8/0x180 [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100 [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300 [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10 [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280 [<ffffffff8119c970>] ? __perf_install_in_context+0xc0/0xc0 [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280 [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190 [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0 [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60 [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50 [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0 <EOI> [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0 [<ffffffff81123de5>] ? smp_call_function_single+0xd5/0x130 [<ffffffff81123ddb>] ? smp_call_function_single+0xcb/0x130 [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0 [<ffffffff8119765a>] event_function_call+0x10a/0x120 [<ffffffff8119c660>] ? ctx_resched+0x90/0x90 [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30 [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60 [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70 [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0 [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60 [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0 [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0 [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0 [<ffffffff81036d29>] ? sched_clock+0x9/0x10 [<ffffffff8124a919>] SyS_ioctl+0x79/0x90 [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 ---[ end trace aef202839fe9a71d ]--- Uhhuh. NMI received for unknown reason 2d on CPU 2. Do you have a strange power saving mode enabled? Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com [ Fixed various typos and other small details. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2Jiri Olsa2-2/+12
Using PAGE_SIZE buffers makes the WRMSR to PERF_GLOBAL_CTRL in intel_pmu_enable_all() mysteriously hang on Core2. As a workaround, we don't do this. The hard lockup is easily triggered by running 'perf test attr' repeatedly. Most of the time it gets stuck on sample session with small periods. # perf test attr -vv 14: struct perf_event_attr setup : --- start --- ... 'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1 Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mce/AMD: Document some functionalityAravind Gopalakrishnan2-14/+19
In an attempt to aid in understanding of what the threshold_block structure holds, provide comments to describe the members here. Also, trim comments around threshold_restart_bank() and update copyright info. No functional change is introduced. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Shorten comments. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-6-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mce: Clarify comments regarding deferred errorAravind Gopalakrishnan1-1/+1
Deferred errors indicate errors that hardware could not fix. But it still does not cause any interruption to program flow. So it does not generate any #MC and UC bit in MCx_STATUS is not set. Correct comment. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-5-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mce/AMD: Fix logic to obtain block addressAravind Gopalakrishnan2-29/+59
In upcoming processors, the BLKPTR field is no longer used to indicate the MSR number of the additional register. Insted, it simply indicates the prescence of additional MSRs. Fix the logic here to gather MSR address from MSR_AMD64_SMCA_MCx_MISC() for newer processors and fall back to existing logic for older processors. [ Drop nextaddr_out label; style cleanups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-4-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errorsAravind Gopalakrishnan2-0/+88
For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mce: Move MCx_CONFIG MSR definitionsAravind Gopalakrishnan2-4/+4
Those MSRs are used only by the MCE code so move them there. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1456785179-14378-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08Merge branch 'linus' into ras/core, to pick up fixesIngo Molnar25-102/+227
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-06Merge branch 'for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds1-2/+2
Pull UML fixes from Richard Weinberger: "This contains three bug/build fixes" * 'for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: use %lx format specifiers for unsigned longs um: Export pm_power_off Revert "um: Fix get_signal() usage"
2016-03-05um: use %lx format specifiers for unsigned longsColin Ian King1-2/+2
static analysis from cppcheck detected %x being used for unsigned longs: [arch/x86/um/os-Linux/task_size.c:112]: (warning) %x in format string (no. 1) requires 'unsigned int' but the argument type is 'unsigned long'. Use %lx instead of %x Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-03-04Merge tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-0/+7
Pull power management and ACPI fixes from Rafael Wysocki: "Two build fixes for cpufreq drivers (including one for breakage introduced recently) and a fix for a graph tracer crash when used over suspend-to-RAM on x86. Specifics: - Prevent the graph tracer from crashing when used over suspend-to- RAM on x86 by pausing it before invoking do_suspend_lowlevel() and un-pausing it when that function has returned (Todd Brandt). - Fix build issues in the qoriq and mediatek cpufreq drivers related to broken dependencies on THERMAL (Arnd Bergmann)" * tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / sleep / x86: Fix crash on graph trace through x86 suspend cpufreq: mediatek: allow building as a module cpufreq: qoriq: allow building as module with THERMAL=m
2016-03-04Merge tag 'v4.5-rc6' into core/resources, to resolve conflictIngo Molnar39-204/+451
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-10/+13
Pull KVM fixes from Paolo Bonzini: - ARM/MIPS: Fixes for ioctls when copy_from_user returns nonzero - x86: Small fix for Skylake TSC scaling - x86: Improved fix for last week's missed hardware breakpoint bug * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Update tsc multiplier on change. mips/kvm: fix ioctl error handling arm/arm64: KVM: Fix ioctl error handling KVM: x86: fix root cause for missed hardware breakpoints
2016-03-03perf/x86/uncore: Fix build on UP-IOAPIC configsIngo Molnar1-0/+2
Commit: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data") reorganized the uncore code to track packages, and introduced a dependency on MAX_APIC_ID. This constant is not available on UP-IOAPIC builds: arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function) Include asm/apicdef.h explicitly to pick it up. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-03PM / sleep / x86: Fix crash on graph trace through x86 suspendTodd E Brandt1-0/+7
Pause/unpause graph tracing around do_suspend_lowlevel as it has inconsistent call/return info after it jumps to the wakeup vector. The graph trace buffer will otherwise become misaligned and may eventually crash and hang on suspend. To reproduce the issue and test the fix: Run a function_graph trace over suspend/resume and set the graph function to suspend_devices_and_enter. This consistently hangs the system without this fix. Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-02kvm: x86: Update tsc multiplier on change.Owen Hofmann1-5/+9
vmx.c writes the TSC_MULTIPLIER field in vmx_vcpu_load, but only when a vcpu has migrated physical cpus. Record the last value written and update in vmx_vcpu_load on any change, otherwise a cpu migration must occur for TSC frequency scaling to take effect. Cc: stable@vger.kernel.org Fixes: ff2c3a1803775cc72dc6f624b59554956396b0ee Signed-off-by: Owen Hofmann <osh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-29Merge tag 'v4.5-rc6' into locking/core, to pick up fixesIngo Molnar24-93/+208
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changesIngo Molnar25-95/+211
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Convert it to a per package facilityThomas Gleixner1-108/+86
RAPL is a per package facility and we already have a mechanism for a dedicated per package reader. So there is no point to have multiple CPUs doing the same. The current implementation actually starts two timers on two CPUs if one does: perf stat -C1,2 -e -e power/energy-pkg .... which makes the whole concept of 1 reader per package moot. What's worse is that the above returns the double of the actual energy consumption, but that's a different problem to address and cannot be solved by removing the pointless per cpuness of that mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.845369524@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Utilize event->pmu_privateThomas Gleixner1-4/+12
Store the PMU pointer in event->pmu_private and use it instead of the per CPU data. Preparatory step to get rid of the per CPU allocations. The usage sites are the perf fast path, so we keep that even after the conversion to per package storage as a CPU to package lookup involves 3 loads versus 1 with the pmu_private pointer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.748151799@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Make PMU lock rawThomas Gleixner1-10/+10
This lock is taken in hard interrupt context even on Preempt-RT. Make it raw so RT does not have to patch it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.669411833@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Refactor the code some moreThomas Gleixner1-30/+31
Split out code from init into seperate functions. Tidy up the code and get rid of pointless comments. I wish there would be comments for code which is not obvious.... Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.588544679@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Clean up the printk outputThomas Gleixner1-19/+23
The output is inconsistent. Use a proper pr_fmt prefix and split out the advertisement into a seperate function. Remove the WARN_ON() in the failure case. It's pointless as we already know where it failed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.504551295@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Calculate timing onceThomas Gleixner1-21/+18
No point in doing the same calculation over and over. Do it once in rapl_check_hw_unit(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.409238136@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29perf/x86/intel/rapl: Sanitize the quirk handlingThomas Gleixner1-30/+19
There is no point in having a quirk machinery for a single possible function. Get rid of it and move the quirk to a place where it actually makes sense. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.311639465@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>