aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu/mce
AgeCommit message (Collapse)AuthorFilesLines
2020-04-14x86/mce/amd: Straighten CPU hotplug pathThomas Gleixner1-17/+15
mce_threshold_create_device() hotplug callback runs on the plugged in CPU so: - use this_cpu_read() which is faster - pass in struct threshold_bank **bp to threshold_create_bank() and instead of doing per-CPU accesses - Use rdmsr_safe() instead of rdmsr_safe_on_cpu() which avoids an IPI. No functional changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-6-bp@alien8.de
2020-04-14x86/mce/amd: Sanitize thresholding device creation hotplug pathThomas Gleixner2-41/+27
Drop the stupid threshold_init_device() initcall iterating over all online CPUs in favor of properly setting up everything on the CPU hotplug path, when each CPU's callback is invoked. [ bp: Write commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-5-bp@alien8.de
2020-04-14x86/mce/amd: Protect a not-fully initialized bank from the thresholding interruptThomas Gleixner1-2/+17
Make sure the thresholding bank descriptor is fully initialized when the thresholding interrupt fires after a hotplug event. [ bp: Write commit message and document long-forgotten bank_map. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-4-bp@alien8.de
2020-04-14x86/mce/amd: Init thresholding machinery only on relevant vendorsThomas Gleixner3-5/+17
... and not unconditionally. [ bp: Add a new vendor_flags bit for that. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-3-bp@alien8.de
2020-04-14x86/mce/amd: Do proper cleanup on error pathsThomas Gleixner1-7/+8
Drop kobject reference counts properly on error in the banks and blocks allocation functions. [ bp: Write commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200403161943.1458-2-bp@alien8.de
2020-03-30Merge tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-2/+13
Pull x86 entry code updates from Thomas Gleixner: - Convert the 32bit syscalls to be pt_regs based which removes the requirement to push all 6 potential arguments onto the stack and consolidates the interface with the 64bit variant - The first small portion of the exception and syscall related entry code consolidation which aims to address the recently discovered issues vs. RCU, int3, NMI and some other exceptions which can interrupt any context. The bulk of the changes is still work in progress and aimed for 5.8. - A few lockdep namespace cleanups which have been applied into this branch to keep the prerequisites for the ongoing work confined. * tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}() lockdep: Rename trace_softirqs_{on,off}() lockdep: Rename trace_hardirq_{enter,exit}() x86/entry: Rename ___preempt_schedule x86: Remove unneeded includes x86/entry: Drop asmlinkage from syscalls x86/entry/32: Enable pt_regs based syscalls x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments x86/entry/32: Rename 32-bit specific syscalls x86/entry/32: Clean up syscall_32.tbl x86/entry: Remove ABI prefixes from functions in syscall tables x86/entry/64: Add __SYSCALL_COMMON() x86/entry: Remove syscall qualifier support x86/entry/64: Remove ptregs qualifier from syscall table x86/entry: Move max syscall number calculation to syscallhdr.sh x86/entry/64: Split X32 syscall table into its own file x86/entry/64: Move sys_ni_syscall stub to common.c x86/entry/64: Use syscall wrappers for x32_rt_sigreturn x86/entry: Refactor SYS_NI macros ...
2020-03-30Merge tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds4-20/+50
Pull RAS updates from Borislav Petkov: - Do not report spurious MCEs on some Intel platforms caused by errata; by Prarit Bhargava. - Change dev-mcelog's hardcoded limit of 32 error records to a dynamic one, controlled by the number of logical CPUs, by Tony Luck. - Add support for the processor identification number (PPIN) on AMD, by Wei Huang. * tag 'ras_updates_for_5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Add PPIN support for AMD MCE x86/mce/dev-mcelog: Dynamically allocate space for machine check records x86/mce: Do not log spurious corrected mce errors
2020-03-22x86/mce/amd: Add PPIN support for AMD MCEWei Huang1-0/+2
Newer AMD CPUs support a feature called protected processor identification number (PPIN). This feature can be detected via CPUID_Fn80000008_EBX[23]. However, CPUID alone is not enough to read the processor identification number - MSR_AMD_PPIN_CTL also needs to be configured properly. If, for any reason, MSR_AMD_PPIN_CTL[PPIN_EN] can not be turned on, such as disabled in BIOS, the CPU capability bit X86_FEATURE_AMD_PPIN needs to be cleared. When the X86_FEATURE_AMD_PPIN capability is available, the identification number is issued together with the MCE error info in order to keep track of the source of MCE errors. [ bp: Massage. ] Co-developed-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com> Signed-off-by: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200321193800.3666964-1-wei.huang2@amd.com
2020-03-10x86/mce/dev-mcelog: Dynamically allocate space for machine check recordsTony Luck1-20/+27
We have had a hard coded limit of 32 machine check records since the dawn of time. But as numbers of cores increase, it is possible for more than 32 errors to be reported before a user process reads from /dev/mcelog. In this case the additional errors are lost. Keep 32 as the minimum. But tune the maximum value up based on the number of processors. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200218184408.GA23048@agluck-desk2.amr.corp.intel.com
2020-02-27x86/mce: Fix logic and comments around MSR_PPIN_CTLTony Luck1-4/+5
There are two implemented bits in the PPIN_CTL MSR: Bit 0: LockOut (R/WO) Set 1 to prevent further writes to MSR_PPIN_CTL. Bit 1: Enable_PPIN (R/W) If 1, enables MSR_PPIN to be accessible using RDMSR. If 0, an attempt to read MSR_PPIN will cause #GP. So there are four defined values: 0: PPIN is disabled, PPIN_CTL may be updated 1: PPIN is disabled. PPIN_CTL is locked against updates 2: PPIN is enabled. PPIN_CTL may be updated 3: PPIN is enabled. PPIN_CTL is locked against updates Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2". When it should have done so for both case "2" and case "3". Fix the final test to just check for the enable bit. Also fix some of the other comments in this function. Fixes: 3f5a7896a509 ("x86/mce: Include the PPIN in MCE records when available") Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200226011737.9958-1-tony.luck@intel.com
2020-02-27x86/entry/32: Force MCE through do_mce()Thomas Gleixner1-0/+3
Remove the pointless difference between 32 and 64 bit to make further unifications simpler. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200225220216.428188397@linutronix.de
2020-02-27x86/mce: Disable tracing and kprobes on do_machine_check()Andy Lutomirski1-2/+10
do_machine_check() can be raised in almost any context including the most fragile ones. Prevent kprobes and tracing. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200225220216.315548935@linutronix.de
2020-02-25x86/mce/therm_throt: Undo thermal polling properly on CPU offlineThomas Gleixner1-2/+7
Chris Wilson reported splats from running the thermal throttling workqueue callback on offlined CPUs. The problem is that that callback should not even run on offlined CPUs but it happens nevertheless because the offlining callback thermal_throttle_offline() does not symmetrically undo the setup work done in its onlining counterpart. IOW, 1. The thermal interrupt vector should be masked out before ... 2. ... cancelling any pending work synchronously so that no new work is enqueued anymore. Do those things and fix the issue properly. [ bp: Write commit message. ] Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/158120068234.18291.7938335950259651295@skylake-alporthouse-com
2020-02-19x86/mce: Do not log spurious corrected mce errorsPrarit Bhargava3-0/+21
A user has reported that they are seeing spurious corrected errors on their hardware. Intel Errata HSD131, HSM142, HSW131, and BDM48 report that "spurious corrected errors may be logged in the IA32_MC0_STATUS register with the valid field (bit 63) set, the uncorrected error field (bit 61) not set, a Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA Error Code (bits [15:0]) of 0x0005." The Errata PDFs are linked in the bugzilla below. Block these spurious errors from the console and logs. [ bp: Move the intel_filter_mce() header declarations into the already existing CONFIG_X86_MCE_INTEL ifdeffery. ] Co-developed-by: Alexander Krupp <centos@akr.yagii.de> Signed-off-by: Alexander Krupp <centos@akr.yagii.de> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206587 Link: https://lkml.kernel.org/r/20200219131611.36816-1-prarit@redhat.com
2020-02-14x86/mce/amd: Fix kobject lifetimeThomas Gleixner1-6/+11
Accessing the MCA thresholding controls in sysfs concurrently with CPU hotplug can lead to a couple of KASAN-reported issues: BUG: KASAN: use-after-free in sysfs_file_ops+0x155/0x180 Read of size 8 at addr ffff888367578940 by task grep/4019 and BUG: KASAN: use-after-free in show_error_count+0x15c/0x180 Read of size 2 at addr ffff888368a05514 by task grep/4454 for example. Both result from the fact that the threshold block creation/teardown code frees the descriptor memory itself instead of defining proper ->release function and leaving it to the driver core to take care of that, after all sysfs accesses have completed. Do that and get rid of the custom freeing code, fixing the above UAFs in the process. [ bp: write commit message. ] Fixes: 95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200214082801.13836-1-bp@alien8.de
2020-02-13x86/mce/amd: Publish the bank pointer only after setup has succeededBorislav Petkov1-17/+16
threshold_create_bank() creates a bank descriptor per MCA error thresholding counter which can be controlled over sysfs. It publishes the pointer to that bank in a per-CPU variable and then goes on to create additional thresholding blocks if the bank has such. However, that creation of additional blocks in allocate_threshold_blocks() can fail, leading to a use-after-free through the per-CPU pointer. Therefore, publish that pointer only after all blocks have been setup successfully. Fixes: 019f34fccfd5 ("x86, MCE, AMD: Move shared bank to node descriptor") Reported-by: Saar Amar <Saar.Amar@microsoft.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200128140846.phctkvx5btiexvbx@kili.mountain
2020-01-28Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-7/+8
Pull x86 cpu-features updates from Ingo Molnar: "The biggest change in this cycle was a large series from Sean Christopherson to clean up the handling of VMX features. This both fixes bugs/inconsistencies and makes the code more coherent and future-proof. There are also two cleanups and a minor TSX syslog messages enhancement" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/cpu: Remove redundant cpu_detect_cache_sizes() call x86/cpu: Print "VMX disabled" error message iff KVM is enabled KVM: VMX: Allow KVM_INTEL when building for Centaur and/or Zhaoxin CPUs perf/x86: Provide stubs of KVM helpers for non-Intel CPUs KVM: VMX: Use VMX_FEATURE_* flags to define VMCS control bits KVM: VMX: Check for full VMX support when verifying CPU compatibility KVM: VMX: Use VMX feature flag to query BIOS enabling KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl() x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_* x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs x86/vmx: Introduce VMX_FEATURES_* x86/cpu: Clear VMX feature flag if VMX is not fully enabled x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization x86/centaur: Use common IA32_FEAT_CTL MSR initialization x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked x86/intel: Initialize IA32_FEAT_CTL MSR at boot tools/x86: Sync msr-index.h from kernel sources selftests, kvm: Replace manual MSR defs with common msr-index.h ...
2020-01-27Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds5-45/+33
Pull RAS updates from Borislav Petkov: - Misc fixes to the MCE code all over the place, by Jan H. Schönherr. - Initial support for AMD F19h and other cleanups to amd64_edac, by Yazen Ghannam. - Other small cleanups. * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/mce_amd: Make fam_ops static global EDAC/amd64: Drop some family checks for newer systems EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh x86/amd_nb: Add Family 19h PCI IDs EDAC/mce_amd: Always load on SMCA systems x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType x86/mce: Fix use of uninitialized MCE message string x86/mce: Fix mce=nobootlog x86/mce: Take action on UCNA/Deferred errors again x86/mce: Remove mce_inject_log() in favor of mce_log() x86/mce: Pass MCE message to mce_panic() on failed kernel recovery x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused
2020-01-16x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaTypeYazen Ghannam1-0/+2
Add support for a new version of the Load Store unit bank type as indicated by its McaType value, which will be present in future SMCA systems. Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is logically the same to the user. Also, add the new error descriptions to edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com
2020-01-15x86/mce/therm_throt: Do not access uninitialized therm_workChuansheng Liu1-4/+5
It is relatively easy to trigger the following boot splat on an Ice Lake client platform. The call stack is like: kernel BUG at kernel/timer/timer.c:1152! Call Trace: __queue_delayed_work queue_delayed_work_on therm_throt_process intel_thermal_interrupt ... The reason is that a CPU's thermal interrupt is enabled prior to executing its hotplug onlining callback which will initialize the throttling workqueues. Such a race can lead to therm_throt_process() accessing an uninitialized therm_work, leading to the above BUG at a very early bootup stage. Therefore, unmask the thermal interrupt vector only after having setup the workqueues completely. [ bp: Heavily massage commit message and correct comment formatting. ] Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@intel.com
2020-01-13x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlockedSean Christopherson1-5/+6
WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU initialization unconditionally locks the MSR. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-6-sean.j.christopherson@intel.com
2020-01-13x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSRSean Christopherson1-5/+5
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL are quite a mouthful, especially the VMX bits which must differentiate between enabling VMX inside and outside SMX (TXT) operation. Rename the MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to make them a little friendlier on the eyes. Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name to match Intel's SDM, but a future patch will add a dedicated Kconfig, file and functions for the MSR. Using the full name for those assets is rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its nomenclature is consistent throughout the kernel. Opportunistically, fix a few other annoyances with the defines: - Relocate the bit defines so that they immediately follow the MSR define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL. - Add whitespace around the block of feature control defines to make it clear they're all related. - Use BIT() instead of manually encoding the bit shift. - Use "VMX" instead of "VMXON" to match the SDM. - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to be consistent with the kernel's verbiage used for all other feature control bits. Note, the SDM refers to the LMCE bit as LMCE_ON, likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN. Ignore the (literal) one-off usage of _ON, the SDM is simply "wrong". Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2020-01-13x86/mce: Fix use of uninitialized MCE message stringJan H. Schönherr1-2/+2
The function mce_severity() is not required to update its msg argument. In fact, mce_severity_amd() does not, which makes mce_no_way_out() return uninitialized data, which may be used later for printing. Assuming that implementations of mce_severity() either always or never update the msg argument (which is currently the case), it is sufficient to initialize the temporary variable in mce_no_way_out(). While at it, avoid printing a useless "Unknown". Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200103150722.20313-4-jschoenh@amazon.de
2020-01-13x86/mce: Fix mce=nobootlogJan H. Schönherr1-13/+9
Since commit 8b38937b7ab5 ("x86/mce: Do not enter deferred errors into the generic pool twice") the mce=nobootlog option has become mostly ineffective (after being only slightly ineffective before), as the code is taking actions on MCEs left over from boot when they have a usable address. Move the check for MCP_DONTLOG a bit outward to make it effective again. Also, since commit 011d82611172 ("RAS: Add a Corrected Errors Collector") the two branches of the remaining "if" at the bottom of machine_check_poll() do same. Unify them. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200103150722.20313-3-jschoenh@amazon.de
2020-01-13x86/mce: Take action on UCNA/Deferred errors againJan H. Schönherr1-15/+16
Commit fa92c5869426 ("x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll") added handling of UCNA and Deferred errors by adding them to the ring for SRAO errors. Later, commit fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors") switched storage from the SRAO ring to the unified pool that is still in use today. In order to only act on the intended errors, a filter for MCE_AO_SEVERITY is used -- effectively removing handling of UCNA/Deferred errors again. Extend the severity filter to include UCNA/Deferred errors again. Also, generalize the naming of the notifier from SRAO to UC to capture the extended scope. Note, that this change may cause a message like the following to appear, as the same address may be reported as SRAO and as UCNA: Memory failure: 0x5fe3284: already hardware poisoned Technically, this is a return to previous behavior. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de
2019-12-17x86/mce: Remove mce_inject_log() in favor of mce_log()Jan H. Schönherr3-13/+2
The mutex in mce_inject_log() became unnecessary with commit 5de97c9f6d85 ("x86/mce: Factor out and deprecate the /dev/mcelog driver"), though the original reason for its presence only vanished with commit 7298f08ea887 ("x86/mcelog: Get rid of RCU remnants"). Drop the mutex. And as that makes mce_inject_log() identical to mce_log(), get rid of the former in favor of the latter. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191210000733.17979-7-jschoenh@amazon.de
2019-12-17x86/mce: Pass MCE message to mce_panic() on failed kernel recoveryJan H. Schönherr1-1/+1
In commit b2f9d678e28c ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries") another call to mce_panic() was introduced. Pass the message of the handled MCE to that instance of mce_panic() as well, as there doesn't seem to be a reason not to. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191210000733.17979-6-jschoenh@amazon.de
2019-12-17x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unusedArnd Bergmann1-1/+1
throttle_active_work() is only called if CONFIG_SYSFS is set, otherwise we get a harmless warning: arch/x86/kernel/cpu/mce/therm_throt.c:238:13: error: 'throttle_active_work' \ defined but not used [-Werror=unused-function] Mark the function as __maybe_unused to avoid the warning. Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: bberg@redhat.com Cc: ckellner@redhat.com Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: hdegoede@redhat.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191210203925.3119091-1-arnd@arndb.de
2019-12-17x86/mce: Fix possibly incorrect severity calculation on AMDJan H. Schönherr1-1/+1
The function mce_severity_amd_smca() requires m->bank to be initialized for correct operation. Fix the one case, where mce_severity() is called without doing so. Fixes: 6bda529ec42e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems") Fixes: d28af26faa0b ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()") Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20191210000733.17979-4-jschoenh@amazon.de
2019-12-17x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]Yazen Ghannam1-1/+1
Each logical CPU in Scalable MCA systems controls a unique set of MCA banks in the system. These banks are not shared between CPUs. The bank types and ordering will be the same across CPUs on currently available systems. However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while other CPUs do not. In this case, the bank seen as Reserved on one CPU is assumed to be the same type as the bank seen as a known type on another CPU. In general, this occurs when the hardware represented by the MCA bank is disabled, e.g. disabled memory controllers on certain models, etc. The MCA bank is disabled in the hardware, so there is no possibility of getting an MCA/MCE from it even if it is assumed to have a known type. For example: Full system: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | UMC 2 | CS | CS System with hardware disabled: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | UMC | RAZ 2 | CS | CS For this reason, there is a single, global struct smca_banks[] that is initialized at boot time. This array is initialized on each CPU as it comes online. However, the array will not be updated if an entry already exists. This works as expected when the first CPU (usually CPU0) has all possible MCA banks enabled. But if the first CPU has a subset, then it will save a "Reserved" type in smca_banks[]. Successive CPUs will then not be able to update smca_banks[] even if they encounter a known bank type. This may result in unexpected behavior. Depending on the system configuration, a user may observe issues enumerating the MCA thresholding sysfs interface. The issues may be as trivial as sysfs entries not being available, or as severe as system hangs. For example: Bank | Type seen on CPU0 | Type seen on CPU1 ------------------------------------------------ 0 | LS | LS 1 | RAZ | UMC 2 | CS | CS Extend the smca_banks[] entry check to return if the entry is a non-reserved type. Otherwise, continue so that CPUs that encounter a known bank type can update smca_banks[]. Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
2019-12-17x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()Konstantin Khlebnikov1-1/+1
... because interrupts are disabled that early and sending IPIs can deadlock: BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 no locks held by swapper/1/0. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0 softirqs last enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0 softirqs last disabled at (0): [<0000000000000000>] 0x0 Preemption disabled at: [<ffffffff8104703b>] start_secondary+0x3b/0x190 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 Call Trace: dump_stack ___might_sleep.cold.92 wait_for_completion ? generic_exec_single rdmsr_safe_on_cpu ? wrmsr_on_cpus mce_amd_feature_init mcheck_cpu_init identify_cpu identify_secondary_cpu smp_store_cpu_info start_secondary secondary_startup_64 The function smca_configure() is called only on the current CPU anyway, therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid the IPI. [ bp: Update commit message. ] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
2019-11-29x86/mce/therm_throt: Mask out read-only and reserved MSR bitsSrinivas Pandruvada1-5/+12
While writing to MSR IA32_THERM_STATUS/IA32_PKG_THERM_STATUS, avoid writing 1 to read only and reserved fields because updating some fields generates exception. [ bp: Vertically align for better readability. ] Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle") Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191128150824.22413-1-srinivas.pandruvada@linux.intel.com
2019-11-12x86/mce/therm_throt: Optimize notifications of thermal throttleSrinivas Pandruvada1-24/+227
Some modern systems have very tight thermal tolerances. Because of this they may cross thermal thresholds when running normal workloads (even during boot). The CPU hardware will react by limiting power/frequency and using duty cycles to bring the temperature back into normal range. Thus users may see a "critical" message about the "temperature above threshold" which is soon followed by "temperature/speed normal". These messages are rate-limited, but still may repeat every few minutes. This issue became worse starting with the Ivy Bridge generation of CPUs because they include a TCC activation offset in the MSR IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before critical temperatures are reached. A test run on a laptop with Intel 8th Gen i5 core for two hours with a workload resulted in 20K+ thermal interrupts per CPU for core level and another 20K+ interrupts at package level. The kernel logs were full of throttling messages. The real value of these threshold interrupts, is to debug problems with the external cooling solutions and performance issues due to excessive throttling. So the solution here is the following: - In the current thermal_throttle folder, show: - the maximum time for one throttling event and, - the total amount of time the system was in throttling state. - Do not log short excursions. - Log only when, in spite of thermal throttling, the temperature is rising. On the high threshold interrupt trigger a delayed workqueue that monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When the log bit is set, this workqueue callback calculates three point moving average and logs a warning message when the temperature trend is rising. When this log bit is clear and temperature is below threshold temperature, then the workqueue callback logs a "Normal" message. Once a high threshold event is logged, the logging is rate-limited. With this patch on the same test laptop, no warnings are printed in the logs as the max time the processor could bring the temperature under control is only 280 ms. This implementation is done with the inputs from Alan Cox and Tony Luck. [ bp: Touchups. ] Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: bberg@redhat.com Cc: ckellner@redhat.com Cc: hdegoede@redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191111214312.81365-1-srinivas.pandruvada@linux.intel.com
2019-11-01x86/mce: Add Xeon Icelake to list of CPUs that support PPINTony Luck1-0/+1
New CPU model, same MSRs to control and read the inventory number. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191028163719.19708-1-tony.luck@intel.com
2019-10-17x86/mce: Lower throttling MCE messages' priority to warningBenjamin Berg1-1/+1
On modern CPUs it is quite normal that the temperature limits are reached and the CPU is throttled. In fact, often the thermal design is not sufficient to cool the CPU at full load and limits can quickly be reached when a burst in load happens. This will even happen with technologies like RAPL limitting the long term power consumption of the package. Also, these limits are "softer", as Srinivas explains: "CPU temperature doesn't have to hit max(TjMax) to get these warnings. OEMs ha[ve] an ability to program a threshold where a thermal interrupt can be generated. In some systems the offset is 20C+ (Read only value). In recent systems, there is another offset on top of it which can be programmed by OS, once some agent can adjust power limits dynamically. By default this is set to low by the firmware, which I guess the prime motivation of Benjamin to submit the patch." So these messages do not usually indicate a hardware issue (e.g. insufficient cooling). Log them as warnings to avoid confusion about their severity. [ bp: Massage commit mesage. ] Signed-off-by: Benjamin Berg <bberg@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Christian Kellner <ckellner@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191009155424.249277-1-bberg@redhat.com
2019-10-01x86/mce: Add Zhaoxin LMCE supportTony W Wang-oc3-4/+26
Newer Zhaoxin CPUs support LMCE compatible with Intel. Add support for that. [ bp: Export functions and massage. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: CooperYan@zhaoxin.com Cc: DavidWang@zhaoxin.com Cc: HerryYang@zhaoxin.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: QiyuanWang@zhaoxin.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1568787573-1297-5-git-send-email-TonyWWang-oc@zhaoxin.com
2019-10-01x86/mce: Add Zhaoxin CMCI supportTony W Wang-oc3-2/+33
All newer Zhaoxin CPUs support CMCI and are compatible with Intel's Machine-Check Architecture. Add that support for Zhaoxin CPUs. [ bp: Massage comments and export intel_init_cmci(). ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: CooperYan@zhaoxin.com Cc: DavidWang@zhaoxin.com Cc: HerryYang@zhaoxin.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: QiyuanWang@zhaoxin.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1568787573-1297-4-git-send-email-TonyWWang-oc@zhaoxin.com
2019-10-01x86/mce: Add Zhaoxin MCE supportTony W Wang-oc1-13/+31
All newer Zhaoxin CPUs are compatible with Intel's Machine-Check Architecture, so add support for them. [ bp: Reflow comment in vendor_disable_error_reporting() and massage commit message. ] Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: CooperYan@zhaoxin.com Cc: DavidWang@zhaoxin.com Cc: HerryYang@zhaoxin.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: QiyuanWang@zhaoxin.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1568787573-1297-2-git-send-email-TonyWWang-oc@zhaoxin.com
2019-10-01x86/mce/amd: Make disable_err_thresholding() staticBorislav Petkov1-1/+1
No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190928170539.2729-1-bp@alien8.de
2019-09-16Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull x86 cpu-feature updates from Ingo Molnar: - Rework the Intel model names symbols/macros, which were decades of ad-hoc extensions and added random noise. It's now a coherent, easy to follow nomenclature. - Add new Intel CPU model IDs: - "Tiger Lake" desktop and mobile models - "Elkhart Lake" model ID - and the "Lightning Mountain" variant of Airmont, plus support code - Add the new AVX512_VP2INTERSECT instruction to cpufeatures - Remove Intel MPX user-visible APIs and the self-tests, because the toolchain (gcc) is not supporting it going forward. This is the first, lowest-risk phase of MPX removal. - Remove X86_FEATURE_MFENCE_RDTSC - Various smaller cleanups and fixes * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) x86/cpu: Update init data for new Airmont CPU model x86/cpu: Add new Airmont variant to Intel family x86/cpu: Add Elkhart Lake to Intel family x86/cpu: Add Tiger Lake to Intel family x86: Correct misc typos x86/intel: Add common OPTDIFFs x86/intel: Aggregate microserver naming x86/intel: Aggregate big core graphics naming x86/intel: Aggregate big core mobile naming x86/intel: Aggregate big core client naming x86/cpufeature: Explain the macro duplication x86/ftrace: Remove mcount() declaration x86/PCI: Remove superfluous returns from void functions x86/msr-index: Move AMD MSRs where they belong x86/cpu: Use constant definitions for CPU models lib: Remove redundant ftrace flag removal x86/crash: Remove unnecessary comparison x86/bitops: Use __builtin_constant_p() directly instead of IS_IMMEDIATE() x86: Remove X86_FEATURE_MFENCE_RDTSC x86/mpx: Remove MPX APIs ...
2019-08-28x86/intel: Aggregate microserver namingPeter Zijlstra1-1/+1
Currently big microservers have _XEON_D while small microservers have _X, Make it uniformly: _D. for i in `git grep -l "\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*_\(X\|XEON_D\)"` do sed -i -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*ATOM.*\)_X/\1_D/g' \ -e 's/\(\(INTEL_FAM6_\|VULNWL_INTEL\|INTEL_CPU_FAM6\).*\)_XEON_D/\1_D/g' ${i} done Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190827195122.677152989@infradead.org
2019-08-05x86/mce: Don't check for the overflow bit on action optional machine checksTony Luck1-2/+2
We currently do not process SRAO (Software Recoverable Action Optional) machine checks if they are logged with the overflow bit set to 1 in the machine check bank status register. This is overly conservative. There are two cases where we could end up with an SRAO+OVER log based on the SDM volume 3 overwrite rules in "Table 15-8. Overwrite Rules for UC, CE, and UCR Errors" 1) First a corrected error is logged, then the SRAO error overwrites. The second error overwrites the first because uncorrected errors have a higher severity than corrected errors. 2) The SRAO error was logged first, followed by a correcetd error. In this case the first error is retained in the bank. So in either case the machine check bank will contain the address of the SRAO error. So we can process that even if the overflow bit was set. Reported-by: Yongkai Wu <yongkaiwu@tencent.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190718182920.32621-1-tony.luck@intel.com
2019-07-08Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds1-1/+1
Pull force_sig() argument change from Eric Biederman: "A source of error over the years has been that force_sig has taken a task parameter when it is only safe to use force_sig with the current task. The force_sig function is built for delivering synchronous signals such as SIGSEGV where the userspace application caused a synchronous fault (such as a page fault) and the kernel responded with a signal. Because the name force_sig does not make this clear, and because the force_sig takes a task parameter the function force_sig has been abused for sending other kinds of signals over the years. Slowly those have been fixed when the oopses have been tracked down. This set of changes fixes the remaining abusers of force_sig and carefully rips out the task parameter from force_sig and friends making this kind of error almost impossible in the future" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus signal: Remove the signal number and task parameters from force_sig_info signal: Factor force_sig_info_to_task out of force_sig_info signal: Generate the siginfo in force_sig signal: Move the computation of force into send_signal and correct it. signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal signal: Remove the task parameter from force_sig_fault signal: Use force_sig_fault_to_task for the two calls that don't deliver to current signal: Explicitly call force_sig_fault on current signal/unicore32: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from __do_user_fault signal/arm: Remove tsk parameter from ptrace_break signal/nds32: Remove tsk parameter from send_sigtrap signal/riscv: Remove tsk parameter from do_trap signal/sh: Remove tsk parameter from force_sig_info_fault signal/um: Remove task parameter from send_sigtrap signal/x86: Remove task parameter from send_sigtrap signal: Remove task parameter from force_sig_mceerr signal: Remove task parameter from force_sig signal: Remove task parameter from force_sigsegv ...
2019-07-08Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds5-154/+178
Pull RAS updates from Ingo Molnar: "Boris is on vacation so I'm sending the RAS bits this time. The main changes were: - Various RAS/CEC improvements and fixes by Borislav Petkov: - error insertion fixes - offlining latency fix - memory leak fix - additional sanity checks - cleanups - debug output improvements - More SMCA enhancements by Yazen Ghannam: - make banks truly per-CPU which they are in the hardware - don't over-cache certain registers - make the number of MCA banks per-CPU variable The long term goal with these changes is to support future heterogenous SMCA extensions. - Misc fixes and improvements" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Do not check return value of debugfs_create functions x86/MCE: Determine MCA banks' init state properly x86/MCE: Make the number of MCA banks a per-CPU variable x86/MCE/AMD: Don't cache block addresses on SMCA systems x86/MCE: Make mce_banks a per-CPU array x86/MCE: Make struct mce_banks[] static RAS/CEC: Add copyright RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there RAS/CEC: Dump the different array element sections RAS/CEC: Rename count_threshold to action_threshold RAS/CEC: Sanity-check array on every insertion RAS/CEC: Fix potential memory leak RAS/CEC: Do not set decay value on error RAS/CEC: Check count_threshold unconditionally RAS/CEC: Fix pfn insertion
2019-06-14x86/mce: Do not check return value of debugfs_create functionsGreg Kroah-Hartman3-54/+13
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. The only way this can fail is if: * debugfs superblock can not be pinned - something really went wrong with the vfs layer. * file is created with same name - the caller's fault. * new_inode() fails - happens if memory is exhausted. so failing to clean up debugfs properly is the least of the system's sproblems in uch a situation. [ bp: Extend commit message, remove unused err var in inject_init(). ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190612151531.GA16278@kroah.com
2019-06-11x86/MCE: Determine MCA banks' init state properlyYazen Ghannam1-0/+39
The OS is expected to write all bits to MCA_CTL for each bank, thus enabling error reporting in all banks. However, some banks may be unused in which case the registers for such banks are Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control bits because of quirks, etc. A bank can be considered uninitialized if the MCA_CTL register returns zero. This is because either the OS did not write anything or because the hardware is enforcing RAZ/WI for the bank. Set a bank's init value based on if the control bits are set or not in hardware. Return an error code in the sysfs interface for uninitialized banks. Do a final bank init check in a separate function which is not part of any user-controlled code flows. This is so a user may enable/disable a bank during runtime without having to restart their system. [ bp: Massage a bit. Discover bank init state at boot. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-6-Yazen.Ghannam@amd.com
2019-06-11x86/MCE: Make the number of MCA banks a per-CPU variableYazen Ghannam3-30/+36
The number of MCA banks is provided per logical CPU. Historically, this number has been the same across all CPUs, but this is not an architectural guarantee. Future AMD systems may have MCA bank counts that vary between logical CPUs in a system. This issue was partially addressed in 006c077041dc ("x86/mce: Handle varying MCA bank counts") by allocating structures using the maximum number of MCA banks and by saving the maximum MCA bank count in a system as the global count. This means that some extra structures are allocated. Also, this means that CPUs will spend more time in the #MC and other handlers checking extra MCA banks. Thus, define the number of MCA banks as a per-CPU variable. [ bp: Make mce_num_banks an unsigned int. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-5-Yazen.Ghannam@amd.com
2019-06-11x86/MCE/AMD: Don't cache block addresses on SMCA systemsYazen Ghannam1-36/+37
On legacy systems, the addresses of the MCA_MISC* registers need to be recursively discovered based on a Block Pointer field in the registers. On Scalable MCA systems, the register space is fixed, and particular addresses can be derived by regular offsets for bank and register type. This fixed address space includes the MCA_MISC* registers. MCA_MISC0 is always available for each MCA bank. MCA_MISC1 through MCA_MISC4 are considered available if MCA_MISC0[BlkPtr]=1. Cache the value of MCA_MISC0[BlkPtr] for each bank and per CPU. This needs to be done only during init. The values should be saved per CPU to accommodate heterogeneous SMCA systems. Redo smca_get_block_address() to directly return the block addresses. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-4-Yazen.Ghannam@amd.com
2019-06-11x86/MCE: Make mce_banks a per-CPU arrayYazen Ghannam1-28/+48
Current AMD systems have unique MCA banks per logical CPU even though the type of the banks may all align to the same bank number. Each CPU will have control of a set of MCA banks in the hardware and these are not shared with other CPUs. For example, bank 0 may be the Load-Store Unit on every logical CPU, but each bank 0 is a unique structure in the hardware. In other words, there isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs share. This idea extends even to non-core MCA banks. For example, CPU0 and CPU4 may see a Unified Memory Controller at bank 15, but each CPU is actually seeing a unique hardware structure that is not shared with other CPUs. Because the MCA banks are all unique hardware structures, it would be good to control them in a more granular way. For example, if there is a known issue with the Floating Point Unit on CPU5 and a user wishes to disable an error type on the Floating Point Unit, then it would be good to do this only for CPU5 rather than all CPUs. Also, future AMD systems may have heterogeneous MCA banks. Meaning the bank numbers may not necessarily represent the same types between CPUs. For example, bank 20 visible to CPU0 may be a Unified Memory Controller and bank 20 visible to CPU4 may be a Coherent Slave. So granular control will be even more necessary should the user wish to control specific MCA banks. Split the device attributes from struct mce_bank leaving only the MCA bank control fields. Make struct mce_banks[] per_cpu in order to have more granular control over individual MCA banks in the hardware. Allocate the device attributes statically based on the maximum number of MCA banks supported. The sysfs interface will use as many as needed per CPU. Currently, this is set to mca_cfg.banks, but will be changed to a per_cpu bank count in a future patch. Allocate the MCA control bits statically. This is in order to avoid locking warnings when memory is allocated during secondary CPUs' init sequences. Also, remove the now unnecessary return values from __mcheck_cpu_mce_banks_init() and __mcheck_cpu_cap_init(). Redo the sysfs store/show functions to handle the per_cpu mce_banks[]. [ bp: s/mce_banks_percpu/mce_banks_array/g ] [ Locking issue reported by ] Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-3-Yazen.Ghannam@amd.com
2019-06-11x86/MCE: Make struct mce_banks[] staticYazen Ghannam2-11/+10
The struct mce_banks[] array is only used in mce/core.c so move its definition there and make it static. Also, change the "init" field to bool type. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: "x86@kernel.org" <x86@kernel.org> Link: https://lkml.kernel.org/r/20190607201752.221446-2-Yazen.Ghannam@amd.com