aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-25x86/uaccess: Remove unused __addr_ok() macroBorislav Petkov1-3/+0
This was caught while staring at the whole {set,get}_fs() machinery. It's last user, the 32-bit version of strnlen_user() went away with 5723aa993d83 ("x86: use the new generic strnlen_user() function") so drop it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: the arch/x86 maintainers <x86@kernel.org> Cc: "Tobin C. Harding" <tobin@kernel.org> Link: https://lkml.kernel.org/r/20190225191109.7671-1-bp@alien8.de
2019-02-25x86/uaccess: Don't leak the AC flag into __put_user() value evaluationAndy Lutomirski1-2/+4
When calling __put_user(foo(), ptr), the __put_user() macro would call foo() in between __uaccess_begin() and __uaccess_end(). If that code were buggy, then those bugs would be run without SMAP protection. Fortunately, there seem to be few instances of the problem in the kernel. Nevertheless, __put_user() should be fixed to avoid doing this. Therefore, evaluate __put_user()'s argument before setting AC. This issue was noticed when an objtool hack by Peter Zijlstra complained about genregs_get() and I compared the assembly output to the C source. [ bp: Massage commit message and fixed up whitespace. ] Fixes: 11f1a4b9755f ("x86: reorganize SMAP handling in user space accesses") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190225125231.845656645@infradead.org
2019-02-25Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses"Linus Torvalds1-58/+0
This reverts commit 9da3f2b74054406f87dff7101a569217ffceb29b. It was well-intentioned, but wrong. Overriding the exception tables for instructions for random reasons is just wrong, and that is what the new code did. It caused problems for tracing, and it caused problems for strncpy_from_user(), because the new checks made perfectly valid use cases break, rather than catch things that did bad things. Unchecked user space accesses are a problem, but that's not a reason to add invalid checks that then people have to work around with silly flags (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an odd way to say "this commit was wrong" and was sprinked into random places to hide the wrongness). The real fix to unchecked user space accesses is to get rid of the special "let's not check __get_user() and __put_user() at all" logic. Make __{get|put}_user() be just aliases to the regular {get|put}_user() functions, and make it impossible to access user space without having the proper checks in places. The raison d'ĂȘtre of the special double-underscore versions used to be that the range check was expensive, and if you did multiple user accesses, you'd do the range check up front (like the signal frame handling code, for example). But SMAP (on x86) and PAN (on ARM) have made that optimization pointless, because the _real_ expense is the "set CPU flag to allow user space access". Do let's not break the valid cases to catch invalid cases that shouldn't even exist. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Tobin C. Harding <tobin@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-4/+20
Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22perf, pt, coresight: Fix address filters for vmas with non-zero offsetAlexander Shishkin1-4/+5
Currently, the address range calculation for file-based filters works as long as the vma that maps the matching part of the object file starts from offset zero into the file (vm_pgoff==0). Otherwise, the resulting filter range would be off by vm_pgoff pages. Another related problem is that in case of a partially matching vma, that is, a vma that matches part of a filter region, the filter range size wouldn't be adjusted. Fix the arithmetics around address filter range calculations, taking into account vma offset, so that the entire calculation is done before the filter configuration is passed to the PMU drivers instead of having those drivers do the final bit of arithmetics. Based on the patch by Adrian Hunter <adrian.hunter.intel.com>. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Link: http://lkml.kernel.org/r/20190215115655.63469-3-alexander.shishkin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-22KVM: MMU: record maximum physical address width in kvm_mmu_extended_roleYu Zhang2-0/+2
Previously, commit 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") offered some optimization to avoid the unnecessary reconfiguration. Yet one scenario is broken - when cpuid changes VM's maximum physical address width, reconfiguration is needed to reset the reserved bits. Also, the TDP may need to reset its shadow_root_level when this value is changed. To fix this, a new field, maxphyaddr, is introduced in the extended role structure to keep track of the configured guest physical address width. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-22kvm: x86: Return LA57 feature based on hardware capabilityYu Zhang1-0/+4
Previously, 'commit 372fddf70904 ("x86/mm: Introduce the 'no5lvl' kernel parameter")' cleared X86_FEATURE_LA57 in boot_cpu_data, if Linux chooses to not run in 5-level paging mode. Yet boot_cpu_data is queried by do_cpuid_ent() as the host capability later when creating vcpus, and Qemu will not be able to detect this feature and create VMs with LA57 feature. As discussed earlier, VMs can still benefit from extended linear address width, e.g. to enhance features like ASLR. So we would like to fix this, by return the true hardware capability when Qemu queries. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-22x86/kvm/mmu: fix switch between root and guest MMUsVitaly Kuznetsov2-4/+14
Commit 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu") brought one subtle change: previously, when switching back from L2 to L1, we were resetting MMU hooks (like mmu->get_cr3()) in kvm_init_mmu() called from nested_vmx_load_cr3() and now we do that in nested_ept_uninit_mmu_context() when we re-target vcpu->arch.mmu pointer. The change itself looks logical: if nested_ept_init_mmu_context() changes something than nested_ept_uninit_mmu_context() restores it back. There is, however, one thing: the following call chain: nested_vmx_load_cr3() kvm_mmu_new_cr3() __kvm_mmu_new_cr3() fast_cr3_switch() cached_root_available() now happens with MMU hooks pointing to the new MMU (root MMU in our case) while previously it was happening with the old one. cached_root_available() tries to stash current root but it is incorrect to read current CR3 with mmu->get_cr3(), we need to use old_mmu->get_cr3() which in case we're switching from L2 to L1 is guest_mmu. (BTW, in shadow page tables case this is a non-issue because we don't switch MMU). While we could've tried to guess that we're switching between MMUs and call the right ->get_cr3() from cached_root_available() this seems to be overly complicated. Instead, just stash the corresponding CR3 when setting root_hpa and make cached_root_available() use the stashed value. Fixes: 14c07ad89f4d ("x86/kvm/mmu: introduce guest_mmu") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller11-32/+99
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-1932-bit userspace ABI: introduce ARCH_32BIT_OFF_T config optionYury Norov2-0/+2
All new 32-bit architectures should have 64-bit userspace off_t type, but existing architectures has 32-bit ones. To enforce the rule, new config option is added to arch/Kconfig that defaults ARCH_32BIT_OFF_T to be disabled for new 32-bit architectures. All existing 32-bit architectures enable it explicitly. New option affects force_o_largefile() behaviour. Namely, if userspace off_t is 64-bits long, we have no reason to reject user to open big files. Note that even if architectures has only 64-bit off_t in the kernel (arc, c6x, h8300, hexagon, nios2, openrisc, and unicore32), a libc may use 32-bit off_t, and therefore want to limit the file size to 4GB unless specified differently in the open flags. Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Yury Norov <ynorov@marvell.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-18x86/mpx: tweak header nameMichael S. Tsirkin1-1/+1
Use linux/mman.h to make sure we get all mmap flags we need. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-18x86/smpboot: Remove unused phys_id variableShaokun Zhang1-6/+1
The 'phys_id' local variable became unused after commit ce4b1b16502b ("x86/smpboot: Initialize secondary CPU only if master CPU will wait for it"). Remove it. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alison Schofield <alison.schofield@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pu Wen <puwen@hygon.cn> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/1550495101-41755-1-git-send-email-zhangshaokun@hisilicon.com
2019-02-17Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-5/+34
Pull x86 fixes from Ingo Molnar: "Three changes: - An UV fix/quirk to pull UV BIOS calls into the efi_runtime_lock locking regime. (This done by aliasing __efi_uv_runtime_lock to efi_runtime_lock, which should make the quirk nature obvious and maintain the general policy that the EFI lock (name...) isn't exposed to drivers.) - Our version of MAGA: Make a.out Great Again. - Add a new Intel model name enumerator to an upstream header to help reduce dependencies going forward" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls x86/CPU: Add Icelake model number x86/a.out: Clear the dump structure initially
2019-02-17Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-2/+37
Pull perf fixes from Ingo Molnar: "Two fixes on the kernel side: fix an over-eager condition that failed larger perf ring-buffer sizes, plus fix crashes in the Intel BTS code for a corner case, found by fuzzing" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix impossible ring-buffer sizes warning perf/x86: Add check_period PMU callback
2019-02-16efi: Fix build error due to enum collision between efi.h and ima.hAnders Roxell1-2/+2
The following commit: a893ea15d764 ("tpm: move tpm_chip definition to include/linux/tpm.h") introduced a build error when both IMA and EFI are enabled: In file included from ../security/integrity/ima/ima_fs.c:30: ../security/integrity/ima/ima.h:176:7: error: redeclaration of enumerator "NONE" What happens is that both headers (ima.h and efi.h) defines the same 'NONE' constant, and it broke when they started getting included from the same file: Rework to prefix the EFI enum with 'EFI_*'. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20190215165551.12220-2-ard.biesheuvel@linaro.org [ Cleaned up the changelog a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-32/+43
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15x86/platform/UV: Use efi_runtime_lock to serialise BIOS callsHedi Berriche2-3/+28
Calls into UV firmware must be protected against concurrency, expose the efi_runtime_lock to the UV platform, and use it to serialise UV BIOS calls. Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Reviewed-by: Mike Travis <mike.travis@hpe.com> Cc: Andy Shevchenko <andy@infradead.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-efi <linux-efi@vger.kernel.org> Cc: platform-driver-x86@vger.kernel.org Cc: stable@vger.kernel.org # v4.9+ Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190213193413.25560-5-hedi.berriche@hpe.com
2019-02-15x86/platform/UV: Use efi_enabled() instead of test_bit()Hedi Berriche1-1/+1
Use ad hoc efi_enabled() instead of fiddling with test_bit(). Cleanup, no functional changes. Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Reviewed-by: Mike Travis <mike.travis@hpe.com> Cc: Andy Shevchenko <andy@infradead.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-efi <linux-efi@vger.kernel.org> Cc: platform-driver-x86@vger.kernel.org Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190213193413.25560-4-hedi.berriche@hpe.com
2019-02-15x86/platform/UV: Remove uv_bios_call_reentrant()Hedi Berriche2-13/+0
uv_bios_call_reentrant() has no callers nor is it exported, remove it. Cleanup, no functional changes. Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Reviewed-by: Mike Travis <mike.travis@hpe.com> Cc: Andy Shevchenko <andy@infradead.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-efi <linux-efi@vger.kernel.org> Cc: platform-driver-x86@vger.kernel.org Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190213193413.25560-3-hedi.berriche@hpe.com
2019-02-15x86/platform/UV: Remove unnecessary #ifdef CONFIG_EFIHedi Berriche2-6/+0
CONFIG_EFI is implied by CONFIG_X86_UV and x86/platform/uv/bios_uv.c requires the latter, get rid of the redundant #ifdef CONFIG_EFI directives. Cleanup, no functional changes. Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Russ Anderson <rja@hpe.com> Reviewed-by: Dimitri Sivanich <sivanich@hpe.com> Reviewed-by: Mike Travis <mike.travis@hpe.com> Cc: Andy Shevchenko <andy@infradead.org> Cc: Bhupesh Sharma <bhsharma@redhat.com> Cc: Darren Hart <dvhart@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-efi <linux-efi@vger.kernel.org> Cc: platform-driver-x86@vger.kernel.org Cc: Steve Wahl <steve.wahl@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190213193413.25560-2-hedi.berriche@hpe.com
2019-02-15EDAC/mce_amd: Decode MCA_STATUS[Scrub] bitYazen Ghannam1-0/+1
Previous AMD systems have had a bit in MCA_STATUS to indicate that an error was detected on a scrub operation. However, this bit was defined differently within different banks and families/models. Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero or defined as "Scrub", for all MCA banks and CPU models. Therefore, this bit can be defined as the "Scrub" bit. Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding module for Family 17h and newer systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com
2019-02-14x86/mm/dump_pagetables: Remove the unused prev_pud variableShaokun Zhang1-2/+0
The 'prev_pud' local variable became unused after commit 04b67022fb6d ("x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y"). Remove it. Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1550136829-49088-1-git-send-email-zhangshaokun@hisilicon.com
2019-02-14kvm: vmx: Fix entry number check for add_atomic_switch_msr()Xiaoyao Li1-1/+2
Commit ca83b4a7f2d068da79a0 ("x86/KVM/VMX: Add find_msr() helper function") introduces the helper function find_msr(), which returns -ENOENT when not find the msr in vmx->msr_autoload.guest/host. Correct checking contion of no more available entry in vmx->msr_autoload. Fixes: ca83b4a7f2d0 ("x86/KVM/VMX: Add find_msr() helper function") Cc: stable@vger.kernel.org Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-14KVM: x86: Recompute PID.ON when clearing PID.SNLuwei Kang3-21/+17
Some Posted-Interrupts from passthrough devices may be lost or overwritten when the vCPU is in runnable state. The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state but not running on physical CPU). If a posted interrupt comes at this time, the irq remapping facility will set the bit of PIR (Posted Interrupt Requests) but not ON (Outstanding Notification). Then, the interrupt will not be seen by KVM, which always expects PID.ON=1 if PID.PIR=1 as documented in the Intel processor SDM but not in the VT-d specification. To fix this, restore the invariant after PID.SN is cleared. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-14x86/CPU: Add Icelake model numberRajneesh Bhardwaj1-0/+2
Add the CPUID model number of Icelake (ICL) mobile processors to the Intel family list. Icelake U/Y series uses model number 0x7E. Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "David E. Box" <david.e.box@intel.com> Cc: dvhart@infradead.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: platform-driver-x86@vger.kernel.org Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190214115712.19642-2-rajneesh.bhardwaj@linux.intel.com
2019-02-13KVM: nVMX: Restore a preemption timer consistency checkSean Christopherson1-0/+4
A recently added preemption timer consistency check was unintentionally dropped when the consistency checks were being reorganized to match the SDM's ordering. Fixes: 461b4ba4c7ad ("KVM: nVMX: Move the checks for VM-Execution Control Fields to a separate helper function") Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-13x86/boot: Correct RSDP parsing with 32-bit EFIBorislav Petkov1-19/+31
Guenter Roeck reported triple faults of a 64-bit VM using a 32-bit OVMF EFI image. After some singlestepping of the image in gdb, it turned out that some of the EFI config tables were at bogus addresses. Which, as Ard pointed out, results from using the wrong efi_config_table typedef. So switch all EFI table pointers to unsigned longs and convert them to the proper typedef only when accessing them. This way, the proper table type is being used. Shorten variable names, while at it. Fixes: 33f0df8d843d ("x86/boot: Search for RSDP in the EFI tables") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: bhe@redhat.com Cc: caoj.fnst@cn.fujitsu.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: indou.takao@jp.fujitsu.com Cc: Ingo Molnar <mingo@redhat.com> Cc: kasong@redhat.com Cc: Kees Cook <keescook@chromium.org> Cc: msys.mizuma@gmail.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190208190248.GA10854@roeck-us.net
2019-02-13x86/a.out: Clear the dump structure initiallyBorislav Petkov1-2/+4
dump_thread32() in aout_core_dump() does not clear the user32 structure allocated on the stack as the first thing on function entry. As a result, the dump.u_comm, dump.u_ar0 and dump.signal which get assigned before the clearing, get overwritten. Rename that function to fill_dump() to make it clear what it does and call it first thing. This was caught while staring at a patch by Derek Robson <robsonde@gmail.com>. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Derek Robson <robsonde@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Matz <matz@suse.de> Cc: x86@kernel.org Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20190202005512.3144-1-robsonde@gmail.com
2019-02-13x86/kprobes: Prohibit probing on IRQ handlers directlyMasami Hiramatsu1-0/+7
Prohibit probing on IRQ handlers in irqentry_text because if it interrupts user mode, at that point we haven't changed to kernel space yet and which eventually leads a double fault. E.g. # echo p apic_timer_interrupt > kprobe_events # echo 1 > events/kprobes/enable PANIC: double fault, error_code: 0x0 CPU: 1 PID: 814 Comm: less Not tainted 4.20.0-rc3+ #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:error_entry+0x12/0xf0 [snip] Call Trace: <ENTRY_TRAMPOLINE> ? native_iret+0x7/0x7 ? async_page_fault+0x8/0x30 ? trace_hardirqs_on_thunk+0x1c/0x1c ? error_entry+0x7c/0xf0 ? async_page_fault+0x8/0x30 ? native_iret+0x7/0x7 ? int3+0xa/0x20 ? trace_hardirqs_on_thunk+0x1c/0x1c ? error_entry+0x7c/0xf0 ? int3+0xa/0x20 ? apic_timer_interrupt+0x1/0x20 </ENTRY_TRAMPOLINE> Kernel panic - not syncing: Machine halted. Kernel Offset: disabled ---[ end Kernel panic - not syncing: Machine halted. ]--- Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998796400.31052.8406236614820687840.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13x86/kprobes: Prohibit probing on functions before kprobe_int3_handler()Masami Hiramatsu3-2/+5
Prohibit probing on the functions called before kprobe_int3_handler() in do_int3(). More specifically, ftrace_int3_handler(), poke_int3_handler(), and ist_enter(). And since rcu_nmi_enter() is called by ist_enter(), it also should be marked as NOKPROBE_SYMBOL. Since those are handled before kprobe_int3_handler(), probing those functions can cause a breakpoint recursion and crash the kernel. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998793571.31052.11301258949601150994.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13x86/kprobes: Move trampoline code into RODATAMasami Hiramatsu1-7/+2
Move optprobe trampoline code into RODATA since it is not executed, but copied and modified to be used on a trampoline buffer. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/154998790744.31052.3016106262944915510.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-13x86/kprobes: Prohibit probing on optprobe template codeMasami Hiramatsu1-0/+5
Prohibit probing on optprobe template code, since it is not a code but a template instruction sequence. If we modify this template, copied template must be broken. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 9326638cbee2 ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation") Link: http://lkml.kernel.org/r/154998787911.31052.15274376330136234452.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-12x86/kvm/nVMX: read from MSR_IA32_VMX_PROCBASED_CTLS2 only when it is availableVitaly Kuznetsov1-3/+5
SDM says MSR_IA32_VMX_PROCBASED_CTLS2 is only available "If (CPUID.01H:ECX.[5] && IA32_VMX_PROCBASED_CTLS[63])". It was found that some old cpus (namely "Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz (family: 0x6, model: 0xf, stepping: 0x6") don't have it. Add the missing check. Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Tested-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-11x86/fpu: Track AVX-512 usage of tasksAubrey Li2-0/+14
User space tools which do automated task placement need information about AVX-512 usage of tasks, because AVX-512 usage could cause core turbo frequency drop and impact the running task on the sibling CPU. The XSAVE hardware structure has bits that indicate when valid state is present in registers unique to AVX-512 use. Use these bits to indicate when AVX-512 has been in use and add per-task AVX-512 state timestamp tracking to context switch. Well-written AVX-512 applications are expected to clear the AVX-512 state when not actively using AVX-512 registers, so the tracking mechanism is imprecise and can theoretically miss AVX-512 usage during context switch. But it has been measured to be precise enough to be useful under real-world workloads like tensorflow and linpack. If higher precision is required, suggest user space tools to use the PMU-based mechanisms in combination. Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: aubrey.li@intel.com Link: http://lkml.kernel.org/r/20190117183822.31333-1-aubrey.li@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11Merge tag 'v5.0-rc6' into x86/fpu, to pick up fixesIngo Molnar36-182/+280
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86: Add check_period PMU callbackJiri Olsa3-2/+37
Vince (and later on Ravi) reported crashes in the BTS code during fuzzing with the following backtrace: general protection fault: 0000 [#1] SMP PTI ... RIP: 0010:perf_prepare_sample+0x8f/0x510 ... Call Trace: <IRQ> ? intel_pmu_drain_bts_buffer+0x194/0x230 intel_pmu_drain_bts_buffer+0x160/0x230 ? tick_nohz_irq_exit+0x31/0x40 ? smp_call_function_single_interrupt+0x48/0xe0 ? call_function_single_interrupt+0xf/0x20 ? call_function_single_interrupt+0xa/0x20 ? x86_schedule_events+0x1a0/0x2f0 ? x86_pmu_commit_txn+0xb4/0x100 ? find_busiest_group+0x47/0x5d0 ? perf_event_set_state.part.42+0x12/0x50 ? perf_mux_hrtimer_restart+0x40/0xb0 intel_pmu_disable_event+0xae/0x100 ? intel_pmu_disable_event+0xae/0x100 x86_pmu_stop+0x7a/0xb0 x86_pmu_del+0x57/0x120 event_sched_out.isra.101+0x83/0x180 group_sched_out.part.103+0x57/0xe0 ctx_sched_out+0x188/0x240 ctx_resched+0xa8/0xd0 __perf_event_enable+0x193/0x1e0 event_function+0x8e/0xc0 remote_function+0x41/0x50 flush_smp_call_function_queue+0x68/0x100 generic_smp_call_function_single_interrupt+0x13/0x30 smp_call_function_single_interrupt+0x3e/0xe0 call_function_single_interrupt+0xf/0x20 </IRQ> The reason is that while event init code does several checks for BTS events and prevents several unwanted config bits for BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows to create BTS event without those checks being done. Following sequence will cause the crash: If we create an 'almost' BTS event with precise_ip and callchains, and it into a BTS event it will crash the perf_prepare_sample() function because precise_ip events are expected to come in with callchain data initialized, but that's not the case for intel_pmu_drain_bts_buffer() caller. Adding a check_period callback to be called before the period is changed via PERF_EVENT_IOC_PERIOD. It will deny the change if the event would become BTS. Plus adding also the limit_period check as well. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11x86/defconfig: Enable EFI stub, mixed mode and BGRTArd Biesheuvel2-0/+5
Enable various options in the defconfigs of both i386 and x86_64 that could do with some test coverage in automated testing. Note that these options are typically enabled by distros as well. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11mm/gup: Remove the 'write' parameter from gup_fast_permitted()Ira Weiny1-2/+1
The 'write' parameter is unused in gup_fast_permitted() so remove it. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20190210223424.13934-1-ira.weiny@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/intel: Add counter freezing quirk for GoldmontKan Liang1-2/+7
A microcode patch is also needed for Goldmont while counter freezing feature is enabled. Otherwise, there will be some issues, e.g. PMI lost. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-5-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/intel: Clean up counter freezing quirkKan Liang1-16/+11
Clean up counter freezing quirk to use the new facility to check for min microcode revisions. Rename the counter freezing quirk related functions. Because other platforms, e.g. Goldmont, also needs to call the quirk. Only check the boot CPU, assuming models and features are consistent over all CPUs. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-4-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/intel: Clean up SNB PEBS quirkKan Liang1-25/+10
Clean up SNB PEBS quirk to use the new facility to check for min microcode revisions. Only check the boot CPU, assuming models and features are consistent over all CPUs. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-3-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11perf/x86/kvm: Avoid unnecessary work in guest filteringAndi Kleen3-16/+75
KVM added a workaround for PEBS events leaking into guests with commit: 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.") This uses the VT entry/exit list to add an extra disable of the PEBS_ENABLE MSR. Intel also added a fix for this issue to microcode updates on Haswell/Broadwell/Skylake. It turns out using the MSR entry/exit list makes VM exits significantly slower. The list is only needed for disabling PEBS, because the GLOBAL_CTRL change gets optimized by KVM into changing the VMCS. Check for the microcode updates that have the microcode fix for leaking PEBS, and disable the extra entry/exit list entry for PEBS_ENABLE. In addition we always clear the GLOBAL_CTRL for the PEBS counter while running in the guest, which is enough to make them never fire at the wrong side of the host/guest transition. The overhead for VM exits with the filtering active with the patch is reduced from 8% to 4%. The microcode patch has already been merged into future platforms. This patch is one-off thing. The quirks is used here. For other old platforms which doesn't have microcode patch and quirks, extra disable of the PEBS_ENABLE MSR is still required. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: https://lkml.kernel.org/r/1549319013-4522-2-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11Merge branch 'x86/cpu' into perf/core, to pick up dependent commitIngo Molnar3-5/+62
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-11x86/cpufeature: Add facility to check for min microcode revisionsKan Liang2-0/+59
For bug workarounds or checks, it is useful to check for specific microcode revisions. Add a new generic function to match the CPU with stepping. Add the other function to check the min microcode revisions for the matched CPU. A new table format is introduced to facilitate the quirk to fill the related information. This does not change the existing x86_cpu_id because it's an ABI shared with modules, and also has quite different requirements, as in no wildcards, but everything has to be matched exactly. Originally-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: eranian@google.com Link: https://lkml.kernel.org/r/1549319013-4522-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-10Merge tag 'y2038-new-syscalls' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038Thomas Gleixner3-33/+53
Pull y2038 - time64 system calls from Arnd Bergmann: This series finally gets us to the point of having system calls with 64-bit time_t on all architectures, after a long time of incremental preparation patches. There was actually one conversion that I missed during the summer, i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes and review comments. The following system calls are now added on all 32-bit architectures using the same system call numbers: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 Each one of these corresponds directly to an existing system call that includes a 'struct timespec' argument, or a structure containing a timespec or (in case of clock_adjtime) timeval. Not included here are new versions of getitimer/setitimer and getrusage/waitid, which are planned for the future but only needed to make a consistent API rather than for correct operation beyond y2038. These four system calls are based on 'timeval', and it has not been finally decided what the replacement kernel interface will use instead. So far, I have done a lot of build testing across most architectures, which has found a number of bugs. Runtime testing so far included testing LTP on 32-bit ARM with the existing system calls, to ensure we do not regress for existing binaries, and a test with a 32-bit x86 build of LTP against a modified version of the musl C library that has been adapted to the new system call interface [3]. This library can be used for testing on all architectures supported by musl-1.1.21, but it is not how the support is getting integrated into the official musl release. Official musl support is planned but will require more invasive changes to the library. Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/ Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/ Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
2019-02-10Merge tag 'y2038-syscall-cleanup' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038Thomas Gleixner2-0/+13
Pull preparatory work for y2038 changes from Arnd Bergmann: System call unification and cleanup The system call tables have diverged a bit over the years, and a number of the recent additions never made it into all architectures, for one reason or another. This is an attempt to clean it up as far as we can without breaking compatibility, doing a number of steps: - Add system calls that have not yet been integrated into all architectures but that we definitely want there. This includes {,f}statfs64() and get{eg,eu,g,p,u,pp}id() on alpha, which have been missing traditionally. - The s390 compat syscall handling is cleaned up to be more like what we do on other architectures, while keeping the 31-bit pointer extension. This was merged as a shared branch by the s390 maintainers and is included here in order to base the other patches on top. - Add the separate ipc syscalls on all architectures that traditionally only had sys_ipc(). This version is done without support for IPC_OLD that is we have in sys_ipc. The new semtimedop_time64 syscall will only be added here, not in sys_ipc - Add syscall numbers for a couple of syscalls that we probably don't need everywhere, in particular pkey_* and rseq, for the purpose of symmetry: if it's in asm-generic/unistd.h, it makes sense to have it everywhere. I expect that any future system calls will get assigned on all platforms together, even when they appear to be specific to a single architecture. - Prepare for having the same system call numbers for any future calls. In combination with the generated tables, this hopefully makes it easier to add new calls across all architectures together. All of the above are technically separate from the y2038 work, but are done as preparation before we add the new 64-bit time_t system calls everywhere, providing a common baseline set of system calls. I expect that glibc and other libraries that want to use 64-bit time_t will require linux-5.1 kernel headers for building in the future, and at a much later point may also require linux-5.1 or a later version as the minimum kernel at runtime. Having a common baseline then allows the removal of many architecture or kernel version specific workarounds.
2019-02-10Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-26/+29
Pull x86 fixes from Ingo Molnar: "A handful of fixes: - Fix an MCE corner case bug/crash found via MCE injection testing - Fix 5-level paging boot crash - Fix MCE recovery cache invalidation bug - Fix regression on Xen guests caused by a recent PMD level mremap speedup optimization" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Make set_pmd_at() paravirt aware x86/mm/cpa: Fix set_mce_nospec() x86/boot/compressed/64: Do not corrupt EDX on EFER.LME=1 setting x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()
2019-02-10Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-6/+14
Pull perf fixes from Ingo Molnar: "A couple of kernel side fixes: - Fix the Intel uncore driver on certain hardware configurations - Fix a CPU hotplug related memory allocation bug - Remove a spurious WARN() ... plus also a handful of perf tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf script python: Add Python3 support to tests/attr.py perf trace: Support multiple "vfs_getname" probes perf symbols: Filter out hidden symbols from labels perf symbols: Add fallback definitions for GELF_ST_VISIBILITY() tools headers uapi: Sync linux/in.h copy from the kernel sources perf clang: Do not use 'return std::move(something)' perf mem/c2c: Fix perf_mem_events to support powerpc perf tests evsel-tp-sched: Fix bitwise operator perf/core: Don't WARN() for impossible ring-buffer sizes perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() perf/x86/intel/uncore: Add Node ID mask
2019-02-10x86/mm: Make set_pmd_at() paravirt awareJuergen Gross1-1/+1
set_pmd_at() calls native_set_pmd() unconditionally on x86. This was fine as long as only huge page entries were written via set_pmd_at(), as Xen pv guests don't support those. Commit 2c91bd4a4e2e53 ("mm: speed up mremap by 20x on large regions") introduced a usage of set_pmd_at() possible on pv guests, leading to failures like: BUG: unable to handle kernel paging request at ffff888023e26778 #PF error: [PROT] [WRITE] RIP: e030:move_page_tables+0x7c1/0xae0 move_vma.isra.3+0xd1/0x2d0 __se_sys_mremap+0x3c6/0x5b0 do_syscall_64+0x49/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Make set_pmd_at() paravirt aware by just letting it use set_pmd(). Fixes: 2c91bd4a4e2e53 ("mm: speed up mremap by 20x on large regions") Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Cc: sstabellini@kernel.org Cc: hpa@zytor.com Cc: bp@alien8.de Cc: torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/20190210074056.11842-1-jgross@suse.com
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller16-17/+69
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>