aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller2-1/+44
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-11-02 The following pull-request contains BPF updates for your *net-next* tree. We've added 30 non-merge commits during the last 7 day(s) which contain a total of 41 files changed, 1864 insertions(+), 474 deletions(-). The main changes are: 1) Fix long standing user vs kernel access issue by introducing bpf_probe_read_user() and bpf_probe_read_kernel() helpers, from Daniel. 2) Accelerated xskmap lookup, from Björn and Maciej. 3) Support for automatic map pinning in libbpf, from Toke. 4) Cleanup of BTF-enabled raw tracepoints, from Alexei. 5) Various fixes to libbpf and selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller22-139/+227
The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-02uaccess: Add strict non-pagefault kernel-space read functionDaniel Borkmann2-1/+44
Add two new probe_kernel_read_strict() and strncpy_from_unsafe_strict() helpers which by default alias to the __probe_kernel_read() and the __strncpy_from_unsafe(), respectively, but can be overridden by archs which have non-overlapping address ranges for kernel space and user space in order to bail out with -EFAULT when attempting to probe user memory including non-canonical user access addresses [0]: 4-level page tables: user-space mem: 0x0000000000000000 - 0x00007fffffffffff non-canonical: 0x0000800000000000 - 0xffff7fffffffffff 5-level page tables: user-space mem: 0x0000000000000000 - 0x00ffffffffffffff non-canonical: 0x0100000000000000 - 0xfeffffffffffffff The idea is that these helpers are complementary to the probe_user_read() and strncpy_from_unsafe_user() which probe user-only memory. Both added helpers here do the same, but for kernel-only addresses. Both set of helpers are going to be used for BPF tracing. They also explicitly avoid throwing the splat for non-canonical user addresses from 00c42373d397 ("x86-64: add warning for non-canonical user access address dereferences"). For compat, the current probe_kernel_read() and strncpy_from_unsafe() are left as-is. [0] Documentation/x86/x86_64/mm.txt Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: x86@kernel.org Link: https://lore.kernel.org/bpf/eefeefd769aa5a013531f491a71f0936779e916b.1572649915.git.daniel@iogearbox.net
2019-11-01Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-21/+43
Pull perf fixes from Ingo Molnar: "Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel uncore PMU driver fix and a header typo fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/ perf/x86/uncore: Fix event group support perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h) perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity perf/core: Start rejecting the syscall with attr.__reserved_2 set
2019-11-01Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+3
Pull EFI fixes from Ingo Molnar: "Various fixes all over the map: prevent boot crashes on HyperV, classify UEFI randomness as bootloader randomness, fix EFI boot for the Raspberry Pi2, fix efi_test permissions, etc" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN x86, efi: Never relocate kernel below lowest acceptable address efi: libstub/arm: Account for firmware reserved memory at the base of RAM efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness efi/tpm: Return -EINVAL when determining tpm final events log size fails efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
2019-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-13/+11
Pull kvm fixes from Paolo Bonzini: "generic: - fix memory leak on failure to create VM x86: - fix MMU corner case with AMD nested paging disabled" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active kvm: call kvm_arch_destroy_vm if vm creation fails kvm: Allocate memslots and buses before calling kvm_arch_init_vm
2019-10-31KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is activePaolo Bonzini2-13/+11
VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-31x86, efi: Never relocate kernel below lowest acceptable addressKairui Song1-1/+3
Currently, kernel fails to boot on some HyperV VMs when using EFI. And it's a potential issue on all x86 platforms. It's caused by broken kernel relocation on EFI systems, when below three conditions are met: 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) by the loader. 2. There isn't enough room to contain the kernel, starting from the default load address (eg. something else occupied part the region). 3. In the memmap provided by EFI firmware, there is a memory region starts below LOAD_PHYSICAL_ADDR, and suitable for containing the kernel. EFI stub will perform a kernel relocation when condition 1 is met. But due to condition 2, EFI stub can't relocate kernel to the preferred address, so it fallback to ask EFI firmware to alloc lowest usable memory region, got the low region mentioned in condition 3, and relocated kernel there. It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This is the lowest acceptable kernel relocation address. The first thing goes wrong is in arch/x86/boot/compressed/head_64.S. Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output address if kernel is located below it. Then the relocation before decompression, which move kernel to the end of the decompression buffer, will overwrite other memory region, as there is no enough memory there. To fix it, just don't let EFI stub relocate the kernel to any address lower than lowest acceptable address. [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ] Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28perf/x86/uncore: Fix event group supportKan Liang2-18/+38
The events in the same group don't start or stop simultaneously. Here is the ftrace when enabling event group for uncore_iio_0: # perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}" <idle>-0 [000] d.h. 8959.064832: read_msr: a41, value b2b0b030 //Read counter reg of IIO unit0 counter0 <idle>-0 [000] d.h. 8959.064835: write_msr: a48, value 400001 //Write Ctrl reg of IIO unit0 counter0 to enable counter0. <------ Although counter0 is enabled, Unit Ctrl is still freezed. Nothing will count. We are still good here. <idle>-0 [000] d.h. 8959.064836: read_msr: a40, value 30100 //Read Unit Ctrl reg of IIO unit0 <idle>-0 [000] d.h. 8959.064838: write_msr: a40, value 30000 //Write Unit Ctrl reg of IIO unit0 to enable all counters in the unit by clear Freeze bit <------Unit0 is un-freezed. Counter0 has been enabled. Now it starts counting. But counter1 has not been enabled yet. The issue starts here. <idle>-0 [000] d.h. 8959.064846: read_msr: a42, value 0 //Read counter reg of IIO unit0 counter1 <idle>-0 [000] d.h. 8959.064847: write_msr: a49, value 40000e //Write Ctrl reg of IIO unit0 counter1 to enable counter1. <------ Now, counter1 just starts to count. Counter0 has been running for a while. Current code un-freezes the Unit Ctrl right after the first counter is enabled. The subsequent group events always loses some counter values. Implement pmu_enable and pmu_disable support for uncore, which can help to batch hardware accesses. No one uses uncore_enable_box and uncore_disable_box. Remove them. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-drivers-review@eclists.intel.com Cc: linux-perf@eclists.intel.com Fixes: 087bfbb03269 ("perf/x86: Add generic Intel uncore PMU support") Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)Kim Phillips1-2/+4
This saves us writing the IBS control MSR twice when disabling the event. I searched revision guides for all families since 10h, and did not find occurrence of erratum #420, nor anything remotely similar: so we isolate the secondary MSR write to family 10h only. Also unconditionally update the count mask for IBS Op implementations that have read & writeable current count (CurCnt) fields in addition to the MaxCnt field. These bits were reserved on prior implementations, and therefore shouldn't have negative impact. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: c9574fe0bdb9 ("perf/x86-ibs: Implement workaround for IBS erratum #420") Link: https://lkml.kernel.org/r/20191023150955.30292-2-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validityKim Phillips1-1/+1
The loop that reads all the IBS MSRs into *buf stopped one MSR short of reading the IbsOpData register, which contains the RipInvalid status bit. Fix the offset_max assignment so the MSR gets read, so the RIP invalid evaluation is based on what the IBS h/w output, instead of what was left in memory. Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: d47e8238cd76 ("perf/x86-ibs: Take instruction pointer from ibs sample") Link: https://lkml.kernel.org/r/20191023150955.30292-1-kim.phillips@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-27Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-5/+9
Pull x86 fixes from Thomas Gleixner: "Two fixes for the VMWare guest support: - Unbreak VMWare platform detection which got wreckaged by converting an integer constant to a string constant. - Fix the clang build of the VMWAre hypercall by explicitely specifying the ouput register for INL instead of using the short form" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/vmware: Fix platform detection VMWARE_PORT macro x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
2019-10-27Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull perf fixes from Thomas Gleixner: "A set of perf fixes: kernel: - Unbreak the tracking of auxiliary buffer allocations which got imbalanced causing recource limit failures. - Fix the fallout of splitting of ToPA entries which missed to shift the base entry PA correctly. - Use the correct context to lookup the AUX event when unmapping the associated AUX buffer so the event can be stopped and the buffer reference dropped. tools: - Fix buildiid-cache mode setting in copyfile_mode_ns() when copying /proc/kcore - Fix freeing id arrays in the event list so the correct event is closed. - Sync sched.h anc kvm.h headers with the kernel sources. - Link jvmti against tools/lib/ctype.o to have weak strlcpy(). - Fix multiple memory and file descriptor leaks, found by coverity in perf annotate. - Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found by a static analysis tool" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/aux: Fix AUX output stopping perf/aux: Fix tracking of auxiliary trace buffer allocation perf/x86/intel/pt: Fix base for single entry topa perf kmem: Fix memory leak in compact_gfp_flags() tools headers UAPI: Sync sched.h with the kernel tools headers kvm: Sync kvm.h headers with the kernel sources tools headers kvm: Sync kvm headers with the kernel sources tools headers kvm: Sync kvm headers with the kernel sources perf c2c: Fix memory leak in build_cl_output() perf tools: Fix mode setting in copyfile_mode_ns() perf annotate: Fix multiple memory and file descriptor leaks perf tools: Fix resource leak of closedir() on the error paths perf evlist: Fix fix for freed id arrays perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
2019-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-4/+93
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-10-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 52 non-merge commits during the last 11 day(s) which contain a total of 65 files changed, 2604 insertions(+), 1100 deletions(-). The main changes are: 1) Revolutionize BPF tracing by using in-kernel BTF to type check BPF assembly code. The work here teaches BPF verifier to recognize kfree_skb()'s first argument as 'struct sk_buff *' in tracepoints such that verifier allows direct use of bpf_skb_event_output() helper used in tc BPF et al (w/o probing memory access) that dumps skb data into perf ring buffer. Also add direct loads to probe memory in order to speed up/replace bpf_probe_read() calls, from Alexei Starovoitov. 2) Big batch of changes to improve libbpf and BPF kselftests. Besides others: generalization of libbpf's CO-RE relocation support to now also include field existence relocations, revamp the BPF kselftest Makefile to add test runner concept allowing to exercise various ways to build BPF programs, and teach bpf_object__open() and friends to automatically derive BPF program type/expected attach type from section names to ease their use, from Andrii Nakryiko. 3) Fix deadlock in stackmap's build-id lookup on rq_lock(), from Song Liu. 4) Allow to read BTF as raw data from bpftool. Most notable use case is to dump /sys/kernel/btf/vmlinux through this, from Jiri Olsa. 5) Use bpf_redirect_map() helper in libbpf's AF_XDP helper prog which manages to improve "rx_drop" performance by ~4%., from Björn Töpel. 6) Fix to restore the flow dissector after reattach BPF test and also fix error handling in bpf_helper_defs.h generation, from Jakub Sitnicki. 7) Improve verifier's BTF ctx access for use outside of raw_tp, from Martin KaFai Lau. 8) Improve documentation for AF_XDP with new sections and to reflect latest features, from Magnus Karlsson. 9) Add back 'version' section parsing to libbpf for old kernels, from John Fastabend. 10) Fix strncat bounds error in libbpf's libbpf_prog_type_by_name(), from KP Singh. 11) Turn on -mattr=+alu32 in LLVM by default for BPF kselftests in order to improve insn coverage for built BPF progs, from Yonghong Song. 12) Misc minor cleanups and fixes, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-26Merge tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-0/+8
Pull xen fixlet from Juergen Gross: "Just one patch for issuing a deprecation warning for 32-bit Xen pv guests" * tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: issue deprecation warning for 32-bit pv guest
2019-10-25xen: issue deprecation warning for 32-bit pv guestJuergen Gross1-0/+8
Support for the kernel as Xen 32-bit PV guest will soon be removed. Issue a warning when booted as such. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-22KVM: nVMX: Don't leak L1 MMIO regions to L2Jim Mattson4-32/+55
If the "virtualize APIC accesses" VM-execution control is set in the VMCS, the APIC virtualization hardware is triggered when a page walk in VMX non-root mode terminates at a PTE wherein the address of the 4k page frame matches the APIC-access address specified in the VMCS. On hardware, the APIC-access address may be any valid 4k-aligned physical address. KVM's nVMX implementation enforces the additional constraint that the APIC-access address specified in the vmcs12 must be backed by a "struct page" in L1. If not, L0 will simply clear the "virtualize APIC accesses" VM-execution control in the vmcs02. The problem with this approach is that the L1 guest has arranged the vmcs12 EPT tables--or shadow page tables, if the "enable EPT" VM-execution control is clear in the vmcs12--so that the L2 guest physical address(es)--or L2 guest linear address(es)--that reference the L2 APIC map to the APIC-access address specified in the vmcs12. Without the "virtualize APIC accesses" VM-execution control in the vmcs02, the APIC accesses in the L2 guest will directly access the APIC-access page in L1. When there is no mapping whatsoever for the APIC-access address in L1, the L2 VM just loses the intended APIC virtualization. However, when the APIC-access address is mapped to an MMIO region in L1, the L2 guest gets direct access to the L1 MMIO device. For example, if the APIC-access address specified in the vmcs12 is 0xfee00000, then L2 gets direct access to L1's APIC. Since this vmcs12 configuration is something that KVM cannot faithfully emulate, the appropriate response is to exit to userspace with KVM_INTERNAL_ERROR_EMULATION. Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12") Reported-by: Dan Cross <dcross@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_updateMiaohe Lin3-8/+8
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case. We may set the wrong physical id in avic_handle_ldr_update as we always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page instead. Export and use kvm_xapic_id here and in avic_handle_apic_id_update as suggested by Vitaly. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: clear kvmclock MSR on resetPaolo Bonzini1-5/+3
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is not enabled. This can be confusing, so fix it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: x86: fix bugon.cocci warningskbuild test robot1-2/+1
Use BUG_ON instead of a if condition followed by BUG. Generated by: scripts/coccinelle/misc/bugon.cocci Fixes: 4b526de50e39 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()") CC: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: kbuild test robot <lkp@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22KVM: VMX: Remove specialized handling of unexpected exit-reasonsLiran Alon1-12/+0
Commit bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") introduced specialized handling of specific exit-reasons that should not be raised by CPU because KVM configures VMCS such that they should never be raised. However, since commit 7396d337cfad ("KVM: x86: Return to userspace with internal error on unexpected exit reason"), VMX & SVM exit handlers were modified to generically handle all unexpected exit-reasons by returning to userspace with internal error. Therefore, there is no need for specialized handling of specific unexpected exit-reasons (This specialized handling also introduced inconsistency for these exit-reasons to silently skip guest instruction instead of return to userspace on internal-error). Fixes: bf653b78f960 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUIDJim Mattson1-1/+1
When the RDPID instruction is supported on the host, enumerate it in KVM_GET_SUPPORTED_CPUID. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22x86/cpu/vmware: Fix platform detection VMWARE_PORT macroThomas Hellstrom1-5/+8
The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT definition, but expects it to be an integer. However, when it was moved to the new vmware.h include file, it was changed to be a string to better fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the platform detection VMWARE_PORT functionality. Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB definitions to be integers, and use __stringify() for their stringified form when needed. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-22x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvmThomas Hellstrom1-1/+2
LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified. This was previously fixed for the VMWARE_PORT macro. Fix it also for the VMWARE_HYPERCALL macro. Suggested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: clang-built-linux@googlegroups.com Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas_os@shipmail.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller14-38/+118
Several cases of overlapping changes which were for the most part trivially resolvable. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-20perf/x86/intel/pt: Fix base for single entry topaJiri Olsa1-1/+1
Jan reported failing ltp test for PT: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/tracing/pt_test/pt_test.c It looks like the reason is this new commit added in this v5.4 merge window: 38bb8d77d0b9 ("perf/x86/intel/pt: Split ToPA metadata and page layout") which did not keep the TOPA_SHIFT for entry base. Add it back. Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 38bb8d77d0b9 ("perf/x86/intel/pt: Split ToPA metadata and page layout") Link: https://lkml.kernel.org/r/20191019220726.12213-1-jolsa@kernel.org [ Minor changelog edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-20Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-38/+84
Pull x86 fixes from Thomas Gleixner: "A small set of x86 fixes: - Prevent a NULL pointer dereference in the X2APIC code in case of a CPU hotplug failure. - Prevent boot failures on HP superdome machines by invalidating the level2 kernel pagetable entries outside of the kernel area as invalid so BIOS reserved space won't be touched unintentionally. Also ensure that memory holes are rounded up to the next PMD boundary correctly. - Enable X2APIC support on Hyper-V to prevent boot failures. - Set the paravirt name when running on Hyper-V for consistency - Move a function under the appropriate ifdef guard to prevent build warnings" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guard x86/hyperv: Set pv_info.name to "Hyper-V" x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu x86/hyperv: Make vapic support x2apic mode x86/boot/64: Round memory hole size up to next PMD page x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
2019-10-18x86/boot/acpi: Move get_cmdline_acpi_rsdp() under #ifdef guardZhenzhong Duan1-24/+24
When building with "EXTRA_CFLAGS=-Wall" gcc warns: arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function] get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these config options is disabled has this issue. Move the function under the same ifdef guard as the call site. [ tglx: Add context to the changelog so it becomes useful ] Fixes: 41fa1ee9c6d6 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
2019-10-18x86/hyperv: Set pv_info.name to "Hyper-V"Andrea Parri1-0/+4
Michael reported that the x86/hyperv initialization code prints the following dmesg when running in a VM on Hyper-V: [ 0.000738] Booting paravirtualized kernel on bare hardware Let the x86/hyperv initialization code set pv_info.name to "Hyper-V" so dmesg reports correctly: [ 0.000172] Booting paravirtualized kernel on Hyper-V [ tglx: Folded build fix provided by Yue ] Reported-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: YueHaibing <yuehaibing@huawei.com> Link: https://lkml.kernel.org/r/20191015103502.13156-1-parri.andrea@gmail.com
2019-10-17bpf: Add support for BTF pointers to x86 JITAlexei Starovoitov1-4/+93
Pointer to BTF object is a pointer to kernel object or NULL. Such pointers can only be used by BPF_LDX instructions. The verifier changed their opcode from LDX|MEM|size to LDX|PROBE_MEM|size to make JITing easier. The number of entries in extable is the number of BPF_LDX insns that access kernel memory via "pointer to BTF type". Only these load instructions can fault. Since x86 extable is relative it has to be allocated in the same memory region as JITed code. Allocate it prior to last pass of JITing and let the last pass populate it. Pointer to extable in bpf_prog_aux is necessary to make page fault handling fast. Page fault handling is done in two steps: 1. bpf_prog_kallsyms_find() finds BPF program that page faulted. It's done by walking rb tree. 2. then extable for given bpf program is binary searched. This process is similar to how page faulting is done for kernel modules. The exception handler skips over faulting x86 instruction and initializes destination register with zero. This mimics exact behavior of bpf_probe_read (when probe_kernel_read faults dest is zeroed). JITs for other architectures can add support in similar way. Until then they will reject unknown opcode and fallback to interpreter. Since extable should be aligned and placed near JITed code make bpf_jit_binary_alloc() return 4 byte aligned image offset, so that extable aligning formula in bpf_int_jit_compile() doesn't need to rely on internal implementation of bpf_jit_binary_alloc(). On x86 gcc defaults to 16-byte alignment for regular kernel functions due to better performance. JITed code may be aligned to 16 in the future, but it will use 4 in the meantime. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
2019-10-15x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpuSean Christopherson1-1/+2
Check that the per-cpu cluster mask pointer has been set prior to clearing a dying cpu's bit. The per-cpu pointer is not set until the target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the teardown function, x2apic_dead_cpu(), is associated with the earlier CPUHP_X2APIC_PREPARE. If an error occurs before the cpu is awakened, e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference the NULL pointer and cause a panic. smpboot: do_boot_cpu failed(-22) to wakeup CPU#1 BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:x2apic_dead_cpu+0x1a/0x30 Call Trace: cpuhp_invoke_callback+0x9a/0x580 _cpu_up+0x10d/0x140 do_cpu_up+0x69/0xb0 smp_init+0x63/0xa9 kernel_init_freeable+0xd7/0x229 ? rest_init+0xa0/0xa0 kernel_init+0xa/0x100 ret_from_fork+0x35/0x40 Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
2019-10-15x86/hyperv: Make vapic support x2apic modeRoman Kagan1-5/+15
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode when supported by the vcpus. However, the apic access functions for Hyper-V enlightened apic assume xapic mode only. As a result, Linux fails to bring up secondary cpus when run as a guest in QEMU/KVM with both hv_apic and x2apic enabled. According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic apic MSRs behave exactly the same as the corresponding architectural x2apic MSRs, so there's no need to override the apic accessors. The only exception is hv_apic_eoi_write, which benefits from lazy EOI when available; however, its implementation works for both xapic and x2apic modes. Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver") Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access") Suggested-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
2019-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+10
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-10-14 The following pull-request contains BPF updates for your *net-next* tree. 12 days of development and 85 files changed, 1889 insertions(+), 1020 deletions(-) The main changes are: 1) auto-generation of bpf_helper_defs.h, from Andrii. 2) split of bpf_helpers.h into bpf_{helpers, helper_defs, endian, tracing}.h and move into libbpf, from Andrii. 3) Track contents of read-only maps as scalars in the verifier, from Andrii. 4) small x86 JIT optimization, from Daniel. 5) cross compilation support, from Ivan. 6) bpf flow_dissector enhancements, from Jakub and Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-12Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-25/+60
Pull perf fixes from Ingo Molnar: "Mostly tooling fixes, but also a couple of updates for new Intel models (which are technically hw-enablement, but to users it's a fix to perf behavior on those new CPUs - hope this is fine), an AUX inheritance fix, event time-sharing fix, and a fix for lost non-perf NMI events on AMD systems" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf/x86/cstate: Add Tiger Lake CPU support perf/x86/msr: Add Tiger Lake CPU support perf/x86/intel: Add Tiger Lake CPU support perf/x86/cstate: Update C-state counters for Ice Lake perf/x86/msr: Add new CPU model numbers for Ice Lake perf/x86/cstate: Add Comet Lake CPU support perf/x86/msr: Add Comet Lake CPU support perf/x86/intel: Add Comet Lake CPU support perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp perf/core: Fix corner case in perf_rotate_context() perf/core: Rework memory accounting in perf_mmap() perf/core: Fix inheritance of aux_output groups perf annotate: Don't return -1 for error when doing BPF disassembly perf annotate: Return appropriate error code for allocation failures perf annotate: Fix arch specific ->init() failure errors perf annotate: Propagate the symbol__annotate() error return perf annotate: Fix the signedness of failure returns perf annotate: Propagate perf_env__arch() error perf evsel: Fall back to global 'perf_env' in perf_evsel__env() perf tools: Propagate get_cpuid() error ...
2019-10-12Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+0
Pull EFI fixes from Ingo Molnar: "Misc EFI fixes all across the map: CPER error report fixes, fixes to TPM event log parsing, fix for a kexec hang, a Sparse fix and other fixes" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/tpm: Fix sanity check of unsigned tbl_size being less than zero efi/x86: Do not clean dummy variable in kexec path efi: Make unexported efi_rci2_sysfs_init() static efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing efi/tpm: Don't traverse an event log with no events efi/tpm: Don't access event->count when it isn't mapped efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified efi/cper: Fix endianness of PCIe class code
2019-10-12Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-4/+7
Pull x86 fixes from Ingo Molnar: "A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware guest support fix when built under Clang, and new CPU model number definitions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Comet Lake to the Intel CPU models header lib/string: Make memzero_explicit() inline instead of external x86/cpu/vmware: Use the full form of INL in VMWARE_PORT x86/asm: Fix MWAITX C-state hint value
2019-10-12Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-3/+3
Pull x86 license tag fixlets from Ingo Molnar: "Fix a couple of SPDX tags in x86 headers to follow the canonical pattern" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Use the correct SPDX License Identifier in headers
2019-10-12Merge tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-3/+25
Pull xen fixes from Juergen Gross: - correct panic handling when running as a Xen guest - cleanup the Xen grant driver to remove printing a pointer being always NULL - remove a soon to be wrong call of of_dma_configure() * tag 'for-linus-5.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Stop abusing DT of_dma_configure API xen/grant-table: remove unnecessary printing x86/xen: Return from panic notifier
2019-10-12perf/x86/cstate: Add Tiger Lake CPU supportKan Liang1-9/+11
Tiger Lake is the followon to Ice Lake. From the perspective of Intel cstate residency counters, there is nothing changed compared with Ice Lake. Share icl_cstates with Ice Lake. Update the comments for Tiger Lake. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-10-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/msr: Add Tiger Lake CPU supportKan Liang1-0/+2
Tiger Lake is the followon to Ice Lake. PPERF and SMI_COUNT MSRs are also supported. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-9-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/intel: Add Tiger Lake CPU supportKan Liang1-0/+2
Tiger Lake is the followon to Ice Lake. From the perspective of Intel core PMU, there is little changes compared with Ice Lake, e.g. small changes in event list. But it doesn't impact on core PMU functionality. Share the perf code with Ice Lake. The event list patch will be submitted later separately. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-8-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/cstate: Update C-state counters for Ice LakeKan Liang1-11/+25
There is no Core C3 C-State counter for Ice Lake. Package C8/C9/C10 C-State counters are added for Ice Lake. Introduce a new event list, icl_cstates, for Ice Lake. Update the comments accordingly. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f08c47d1f86c ("perf/x86/intel/cstate: Add Icelake support") Link: https://lkml.kernel.org/r/1570549810-25049-7-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/msr: Add new CPU model numbers for Ice LakeKan Liang1-0/+3
PPERF and SMI_COUNT MSRs are also supported by Ice Lake desktop and server. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-6-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/cstate: Add Comet Lake CPU supportKan Liang1-10/+14
Comet Lake is the new 10th Gen Intel processor. From the perspective of Intel cstate residency counters, there is nothing changed compared with Kaby Lake. Share hswult_cstates with Kaby Lake. Update the comments for Comet Lake. Kaby Lake is missed in the comments for some Residency Counters. Update the comments for Kaby Lake as well. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/msr: Add Comet Lake CPU supportKan Liang1-0/+2
Comet Lake is the new 10th Gen Intel processor. PPERF and SMI_COUNT MSRs are also supported. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12perf/x86/intel: Add Comet Lake CPU supportKan Liang1-0/+2
Comet Lake is the new 10th Gen Intel processor. From the perspective of Intel PMU, there is nothing changed compared with Sky Lake. Share the perf code with Sky Lake. The patch has been tested on real hardware. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-12Merge branch 'x86/urgent' into perf/urgent, to pick up new CPU model definitionsIngo Molnar4-4/+7
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-11x86/boot/64: Round memory hole size up to next PMD pageSteve Wahl1-6/+19
The kernel image map is created using PMD pages, which can include some extra space beyond what's actually needed. Round the size of the memory hole we search for up to the next PMD boundary, to be certain all of the space to be mapped is usable RAM and includes no reserved areas. Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: dimitri.sivanich@hpe.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jordan Borgner <mail@jordan-borgner.de> Cc: Juergen Gross <jgross@suse.com> Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/df4f49f05c0c27f108234eb93db5c613d09ea62e.1569358539.git.steve.wahl@hpe.com
2019-10-11x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel areaSteve Wahl1-2/+20
Our hardware (UV aka Superdome Flex) has address ranges marked reserved by the BIOS. Access to these ranges is caught as an error, causing the BIOS to halt the system. Initial page tables mapped a large range of physical addresses that were not checked against the list of BIOS reserved addresses, and sometimes included reserved addresses in part of the mapped range. Including the reserved range in the map allowed processor speculative accesses to the reserved range, triggering a BIOS halt. Used early in booting, the page table level2_kernel_pgt addresses 1 GiB divided into 2 MiB pages, and it was set up to linearly map a full 1 GiB of physical addresses that included the physical address range of the kernel image, as chosen by KASLR. But this also included a large range of unused addresses on either side of the kernel image. And unlike the kernel image's physical address range, this extra mapped space was not checked against the BIOS tables of usable RAM addresses. So there were times when the addresses chosen by KASLR would result in processor accessible mappings of BIOS reserved physical addresses. The kernel code did not directly access any of this extra mapped space, but having it mapped allowed the processor to issue speculative accesses into reserved memory, causing system halts. This was encountered somewhat rarely on a normal system boot, and much more often when starting the crash kernel if "crashkernel=512M,high" was specified on the command line (this heavily restricts the physical address of the crash kernel, in our case usually within 1 GiB of reserved space). The solution is to invalidate the pages of this table outside the kernel image's space before the page table is activated. It fixes this problem on our hardware. [ bp: Touchups. ] Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: dimitri.sivanich@hpe.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jordan Borgner <mail@jordan-borgner.de> Cc: Juergen Gross <jgross@suse.com> Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
2019-10-09perf/x86/amd: Change/fix NMI latency mitigation to use a timestampTom Lendacky1-13/+17
It turns out that the NMI latency workaround from commit: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") ends up being too conservative and results in the perf NMI handler claiming NMIs too easily on AMD hardware when the NMI watchdog is active. This has an impact, for example, on the hpwdt (HPE watchdog timer) module. This module can produce an NMI that is used to reset the system. It registers an NMI handler for the NMI_UNKNOWN type and relies on the fact that nothing has claimed an NMI so that its handler will be invoked when the watchdog device produces an NMI. After the referenced commit, the hpwdt module is unable to process its generated NMI if the NMI watchdog is active, because the current NMI latency mitigation results in the NMI being claimed by the perf NMI handler. Update the AMD perf NMI latency mitigation workaround to, instead, use a window of time. Whenever a PMC is handled in the perf NMI handler, set a timestamp which will act as a perf NMI window. Any NMIs arriving within that window will be claimed by perf. Anything outside that window will not be claimed by perf. The value for the NMI window is set to 100 msecs. This is a conservative value that easily covers any NMI latency in the hardware. While this still results in a window in which the hpwdt module will not receive its NMI, the window is now much, much smaller. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6d3edaae16c6 ("x86/perf/amd: Resolve NMI latency issues for active PMCs") Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>