aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/acpi/apei/ghes.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-05-12EDAC, ghes: Remove unused argument to ghes_edac_report_mem_error()Alexandru Gagniuc1-1/+1
The use of the @ghes argument was removed in a previous commit, but function signature was not updated to reflect this. Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2018-05-02ghes, EDAC: Fix ghes_edac registrationBorislav Petkov1-8/+6
Tony reported seeing "Internal error: Can't find EDAC structure" when injecting correctable errors due to the fact that ghes_edac would still load even if the whitelist won't hit. Drop the pr_err() in ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac depends on ghes.c. While at it, make ghes_edac_register() return an error if it doesn't hit in the whitelist as it is the only sensible thing to do in that situation. Furthermore, move the call to it to happen last in ghes_probe() so that GHES initializing properly does not depend on ghes_edac init at all as latter is only reporting errors and not required for GHES's proper functioning. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Tested-by: Sughosh Ganu <sughosh.ganu@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk
2018-01-30Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds1-1/+1
Pull siginfo cleanups from Eric Biederman: "Long ago when 2.4 was just a testing release copy_siginfo_to_user was made to copy individual fields to userspace, possibly for efficiency and to ensure initialized values were not copied to userspace. Unfortunately the design was complex, it's assumptions unstated, and humans are fallible and so while it worked much of the time that design failed to ensure unitialized memory is not copied to userspace. This set of changes is part of a new design to clean up siginfo and simplify things, and hopefully make the siginfo handling robust enough that a simple inspection of the code can be made to ensure we don't copy any unitializied fields to userspace. The design is to unify struct siginfo and struct compat_siginfo into a single definition that is shared between all architectures so that anyone adding to the set of information shared with struct siginfo can see the whole picture. Hopefully ensuring all future si_code assignments are arch independent. The design is to unify copy_siginfo_to_user32 and copy_siginfo_from_user32 so that those function are complete and cope with all of the different cases documented in signinfo_layout. I don't think there was a single implementation of either of those functions that was complete and correct before my changes unified them. The design is to introduce a series of helpers including force_siginfo_fault that take the values that are needed in struct siginfo and build the siginfo structure for their callers. Ensuring struct siginfo is built correctly. The remaining work for 4.17 (unless someone thinks it is post -rc1 material) is to push usage of those helpers down into the architectures so that architecture specific code will not need to deal with the fiddly work of intializing struct siginfo, and then when struct siginfo is guaranteed to be fully initialized change copy siginfo_to_user into a simple wrapper around copy_to_user. Further there is work in progress on the issues that have been documented requires arch specific knowledge to sort out. The changes below fix or at least document all of the issues that have been found with siginfo generation. Then proceed to unify struct siginfo the 32 bit helpers that copy siginfo to and from userspace, and generally clean up anything that is not arch specific with regards to siginfo generation. It is a lot but with the unification you can of siginfo you can already see the code reduction in the kernel" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits) signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr mm/memory_failure: Remove unused trapno from memory_failure signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal: Helpers for faults with specialized siginfo layouts signal: Add send_sig_fault and force_sig_fault signal: Replace memset(info,...) with clear_siginfo for clarity signal: Don't use structure initializers for struct siginfo signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered ptrace: Use copy_siginfo in setsiginfo and getsiginfo signal: Unify and correct copy_siginfo_to_user32 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32 signal: Unify and correct copy_siginfo_from_user32 signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h signal/powerpc: Remove redefinition of NSIGTRAP on powerpc signal: Move addr_lsb into the _sigfault union for clarity ...
2018-01-23mm/memory_failure: Remove unused trapno from memory_failureEric W. Biederman1-1/+1
Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc, powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO (alpha, metag, sparc, and tile). These two sets of architectures do not interesect so remove the trapno paramater to remove confusion. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-12-05ACPI / APEI: remove redundant variables len and node_lenColin Ian King1-3/+0
Variables len and node_len are redundant and can be removed. Cleans up clang warning: node_len = GHES_ESTATUS_NODE_LEN(len); Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-05ACPI: APEI: call into AER handling regardless of severityTyler Baicar1-5/+17
Currently the GHES code only calls into the AER driver for recoverable type errors. This is incorrect because errors of other severities do not get logged by the AER driver and do not get exposed to user space via the AER trace event. So, call into the AER driver for PCIe errors regardless of the severity Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-12-05ACPI: APEI: handle PCIe AER errors in separate functionTyler Baicar1-30/+34
Move PCIe AER error handling code into a separate function. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-11-13Merge tag 'acpi-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-95/+22
Pull ACPI updates from Rafael Wysocki: "These update ACPICA to upstream revision 20170831, fix APEI to use the fixmap instead of ioremap_page_range(), add an operation region driver for TI PMIC TPS68470, add support for PCC subspace IDs to the ACPI CPPC driver, fix a few assorted issues and clean up some code. Specifics: - Update the ACPICA code to upstream revision 20170831 including * PDTT table header support (Bob Moore). * Cleanup and extension of internal string-to-integer conversion functions (Bob Moore). * Support for 64-bit hardware accesses (Lv Zheng). * ACPI PM Timer code adjustment to deal with 64-bit return values of acpi_hw_read() (Bob Moore). * Support for deferred table verification in acpiexec (Lv Zheng). - Fix APEI to use the fixmap instead of ioremap_page_range() which cannot work correctly the way the code in there attempted to use it and drop some code that's not necessary any more after that change (James Morse). - Clean up the APEI support code and make it use 64-bit timestamps (Arnd Bergmann, Dongjiu Geng, Jan Beulich). - Add operation region driver for TI PMIC TPS68470 (Rajmohan Mani). - Add support for PCC subspace IDs to the ACPI CPPC driver (George Cherian). - Fix an ACPI EC driver regression related to the handling of EC events during the "noirq" phases of system suspend/resume (Lv Zheng). - Delay the initialization of the lid state in the ACPI button driver to fix issues appearing on some systems (Hans de Goede). - Extend the KIOX000A "device always present" quirk to cover all affected BIOS versions (Hans de Goede). - Clean up some code in the ACPI core and drivers (Colin Ian King, Gustavo Silva)" * tag 'acpi-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits) ACPI: Mark expected switch fall-throughs ACPI / LPSS: Remove redundant initialization of clk ACPI / CPPC: Make CPPC ACPI driver aware of PCC subspace IDs mailbox: PCC: Move the MAX_PCC_SUBSPACES definition to header file ACPI / sysfs: Make function param_set_trace_method_name() static ACPI / button: Delay acpi_lid_initialize_state() until first user space open ACPI / EC: Fix regression related to triggering source of EC event handling APEI / ERST: use 64-bit timestamps ACPI / APEI: Remove arch_apei_flush_tlb_one() arm64: mm: Remove arch_apei_flush_tlb_one() ACPI / APEI: Remove ghes_ioremap_area ACPI / APEI: Replace ioremap_page_range() with fixmap ACPI / APEI: remove the unused dead-code for SEA/NMI notification type ACPI / x86: Extend KIOX000A quirk to cover all affected BIOS versions ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq() ACPICA: Update version to 20170831 ACPICA: Update acpi_get_timer for 64-bit interface to acpi_hw_read ACPICA: String conversions: Update to add new behaviors ACPICA: String conversions: Cleanup/format comments. No functional changes ACPICA: Restructure/cleanup all string-to-integer conversion functions ...
2017-11-07ACPI / APEI: Remove ghes_ioremap_areaJames Morse1-37/+2
Now that nothing is using the ghes_ioremap_area pages, rip them out. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: All applicable <stable@vger.kernel.org>
2017-11-07ACPI / APEI: Replace ioremap_page_range() with fixmapJames Morse1-30/+14
Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu <fengguang.wu@intel.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Tested-by: Toshi Kani <toshi.kani@hpe.com> [ For the arm64 bits: ] Acked-by: Will Deacon <will.deacon@arm.com> [ For the x86 bits: ] Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: All applicable <stable@vger.kernel.org>
2017-11-02ACPI / APEI: Convert timers to use timer_setup()Kees Cook1-4/+3
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tyler Baicar <tbaicar@codeaurora.org> Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org> Cc: Shiju Jose <shiju.jose@huawei.com> Cc: linux-acpi@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
2017-10-23ACPI / APEI: remove the unused dead-code for SEA/NMI notification typeDongjiu Geng1-28/+5
For the SEA notification, the two functions ghes_sea_add() and ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA is defined. If not, it will return errors in the ghes_probe() and not continue. If the probe is failed, the ghes_sea_remove() also has no chance to be called. Hence, remove the unnecessary handling when CONFIG_ACPI_APEI_SEA is not defined. For the NMI notification, it has the same issue as SEA notification, so also remove the unused dead-code for it. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-10-11ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq()Jan Beulich1-1/+2
Match up with what 7edda0886b ("acpi: apei: handle SEA notification type for ARMv8") did for ghes_ioremap_pfn_nmi(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-09-27ACPI / APEI: clear error status before acknowledging the errorTyler Baicar1-7/+9
Currently we acknowledge errors before clearing the error status. This could cause a new error to be populated by firmware in-between the error acknowledgment and the error status clearing which would cause the second error's status to be cleared without being handled. So, clear the error status before acknowledging the errors. Also, make sure to acknowledge the error if the error status read fails. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-08-30ACPI / APEI: Suppress message if HEST not presentPunit Agrawal1-1/+6
According to the ACPI specification, firmware is not required to provide the Hardware Error Source Table (HEST). When HEST is not present, the following superfluous message is printed to the kernel boot log - [ 3.460067] GHES: HEST is not enabled! Extend hest_disable variable to track whether the firmware provides this table and if it is not present skip any log output. The existing behaviour is preserved in all other cases. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-31ACPI: APEI: Enable APEI multiple GHES source to share a single external IRQLoc Ho1-1/+2
X-Gene platforms describe multiple GHES error sources with the same hardware error notification type (external interrupt) and interrupt number. Change the GHES interrupt request to support sharing the same IRQ. This change includs contributions from Tuan Phan <tphan@apm.com>. Signed-off-by: Loc Ho <lho@apm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-05Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds1-30/+176
Pull arm64 updates from Will Deacon: - RAS reporting via GHES/APEI (ACPI) - Indirect ftrace trampolines for modules - Improvements to kernel fault reporting - Page poisoning - Sigframe cleanups and preparation for SVE context - Core dump fixes - Sparse fixes (mainly relating to endianness) - xgene SoC PMU v3 driver - Misc cleanups and non-critical fixes * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: fix endianness annotation for 'struct jit_ctx' and friends arm64: cpuinfo: constify attribute_group structures. arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() arm64: ptrace: Remove redundant overrun check from compat_vfp_set() arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails arm64: fix endianness annotation for __apply_alternatives()/get_alt_insn() arm64: fix endianness annotation in get_kaslr_seed() arm64: add missing conversion to __wsum in ip_fast_csum() arm64: fix endianness annotation in acpi_parking_protocol.c arm64: use readq() instead of readl() to read 64bit entry_point arm64: fix endianness annotation for reloc_insn_movw() & reloc_insn_imm() arm64: fix endianness annotation for aarch64_insn_write() arm64: fix endianness annotation in aarch64_insn_read() arm64: fix endianness annotation in call_undef_hook() arm64: fix endianness annotation for debug-monitors.c ras: mark stub functions as 'inline' arm64: pass endianness info to sparse arm64: ftrace: fix !CONFIG_ARM64_MODULE_PLTS kernels arm64: signal: Allow expansion of the signal frame acpi: apei: check for pending errors when probing GHES entries ...
2017-07-03Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-14/+25
Pull RAS updates from Thomas Gleixner: "The RAS updates for the 4.13 merge window: - Cleanup of the MCE injection facility (Borsilav Petkov) - Rework of the AMD/SMCA handling (Yazen Ghannam) - Enhancements for ACPI/APEI to handle new notitication types (Shiju Jose) - atomic_t to refcount_t conversion (Elena Reshetova) - A few fixes and enhancements all over the place" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS/CEC: Check the correct variable in the debugfs error handling x86/mce: Always save severity in machine_check_poll() x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise x86/mce: Update bootlog description to reflect behavior on AMD x86/mce: Don't disable MCA banks when offlining a CPU on AMD x86/mce/mce-inject: Preset the MCE injection struct x86/mce: Clean up include files x86/mce: Get rid of register_mce_write_callback() x86/mce: Merge mce_amd_inj into mce-inject x86/mce/AMD: Use saved threshold block info in interrupt handler x86/mce/AMD: Use msr_stat when clearing MCA_STATUS x86/mce/AMD: Carve out SMCA bank configuration x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t RAS: Make local function parse_ras_param() static ACPI/APEI: Handle GSIV and GPIO notification types
2017-06-22acpi: apei: check for pending errors when probing GHES entriesTyler Baicar1-0/+3
Check for pending errors when probing GHES entries. It is possible that a fatal error is already pending at this point, so we should handle it as soon as the driver is probed. This also avoids a potential issue if there was an interrupt that was already cleared for an error since the GHES driver wasn't present. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22arm/arm64: KVM: add guest SEA supportTyler Baicar1-6/+11
Currently external aborts are unsupported by the guest abort handling. Add handling for SEAs so that the host kernel reports SEAs which occur in the guest kernel. When an SEA occurs in the guest kernel, the guest exits and is routed to kvm_handle_guest_abort(). Prior to this patch, a print message of an unsupported FSC would be printed and nothing else would happen. With this patch, the code gets routed to the APEI handling of SEAs in the host kernel to report the SEA information. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22trace, ras: add ARM processor error trace eventTyler Baicar1-1/+5
Currently there are trace events for the various RAS errors with the exception of ARM processor type errors. Add a new trace event for such errors so that the user will know when they occur. These trace events are consistent with the ARM processor error section type defined in UEFI 2.6 spec section N.2.4.4. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22ras: acpi / apei: generate trace event for unrecognized CPER sectionTyler Baicar1-0/+18
The UEFI spec includes non-standard section type support in the Common Platform Error Record. This is defined in section N.2.3 of UEFI version 2.5. Currently if the CPER section's type (UUID) does not match any section type that the kernel knows how to parse, a trace event is not generated. Generate a trace event which contains the raw error data for non-standard section type error records. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Tested-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22acpi: apei: panic OS with fatal error status blockJonathan (Zhixiong) Zhang1-15/+21
Even if an error status block's severity is fatal, the kernel does not honor the severity level and panic. With the firmware first model, the platform could inform the OS about a fatal hardware error through the non-NMI GHES notification type. The OS should panic when a hardware error record is received with this severity. Call panic() after CPER data in error status block is printed if severity is fatal, before each error section is handled. Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22acpi: apei: handle SEA notification type for ARMv8Tyler Baicar1-6/+64
ARM APEI extension proposal added SEA (Synchronous External Abort) notification type for ARMv8. Add a new GHES error source handling function for SEA. If an error source's notification type is SEA, then this function can be registered into the SEA exception handler. That way GHES will parse and report SEA exceptions when they occur. An SEA can interrupt code that had interrupts masked and is treated as an NMI. To aid this the page of address space for mapping APEI buffers while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is changed to use the helper methods to find the prot_t to map with in the same way as ghes_ioremap_pfn_irq(). Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22ras: acpi/apei: cper: add support for generic data v3 structureTyler Baicar1-6/+5
The ACPI 6.1 spec adds a new revision of the generic error data entry structure. Add support to handle the new structure as well as properly verify and iterate through the generic data entries. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-22acpi: apei: read ack upon ghes record consumptionTyler Baicar1-3/+56
A RAS (Reliability, Availability, Serviceability) controller may be a separate processor running in parallel with OS execution, and may generate error records for consumption by the OS. If the RAS controller produces multiple error records, then they may be overwritten before the OS has consumed them. The Generic Hardware Error Source (GHES) v2 structure introduces the capability for the OS to acknowledge the consumption of the error record generated by the RAS controller. A RAS controller supporting GHESv2 shall wait for the acknowledgment before writing a new error record, thus eliminating the race condition. Add support for parsing of GHESv2 sub-tables as well. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-05ACPI / APEI: Switch to use new generic UUID APIAndy Shevchenko1-4/+4
There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-05-21ACPI/APEI: Handle GSIV and GPIO notification typesShiju Jose1-14/+25
System Controller Interrupts are received by ACPI's error device, which in turn notifies the GHES code. The same is true of APEI's GSIV and GPIO notification types. Add support for GSIV and GPIO sharing the SCI register/unregister/notifier code. Rename the list and notifier to show this is no longer just SCI, but anything from the Hardware Error Device. Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ Rewrite commit log. ] Signed-off-by: James Morse <james.morse@arm.com> [ Some small cleanups ontop. ] Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: James Morse <james.morse@arm.com> Link: http://lkml.kernel.org/r/86258A5CC0A3704780874CF6004BA8A62E695201@FRAEML521-MBX.china.huawei.com Cc: "Guohanjun (Hanjun Guo)" <guohanjun@huawei.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: "Zhengqiang (turing)" <zhengqiang10@huawei.com> Cc: "fu.wei@linaro.org" <fu.wei@linaro.org> Cc: "xuwei (O)" <xuwei5@hisilicon.com> Cc: Gabriele Paoloni <gabriele.paoloni@huawei.com> Cc: Geliang Tang <geliangtang@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: Len Brown <lenb@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: linux-acpi@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-05-01Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+2
Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - add the 'Corrected Errors Collector' kernel feature which collect and monitor correctable errors statistics and will preemptively (soft-)offline physical pages that have a suspiciously high error count. - handle MCE errors during kexec() more gracefully - factor out and deprecate the /dev/mcelog driver - ... plus misc fixes and cleanpus" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Check MCi_STATUS[MISCV] for usable addr on Intel only ACPI/APEI: Use setup_deferrable_timer() x86/mce: Update notifier priority check x86/mce: Enable PPIN for Knights Landing/Mill x86/mce: Do not register notifiers with invalid prio x86/mce: Factor out and deprecate the /dev/mcelog driver RAS: Add a Corrected Errors Collector x86/mce: Rename mce_log to mce_log_buffer x86/mce: Rename mce_log()'s argument x86/mce: Init some CPU features early x86/mce: Handle broadcasted MCE gracefully with kexec
2017-04-19ACPI/APEI: Use setup_deferrable_timer()Geliang Tang1-3/+2
Use setup_deferrable_timer() instead of init_timer_deferrable() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: linux-acpi@vger.kernel.org Link: http://lkml.kernel.org/r/3afa5498142ef68256023257dad37b9f8352e65e.1489060803.git.geliangtang@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-28ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removalJames Morse1-0/+1
When removing a GHES device notified by SCI, list_del_rcu() is used, ghes_remove() should call synchronize_rcu() before it goes on to call kfree(ghes), otherwise concurrent RCU readers may still hold this list entry after it has been freed. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Fixes: 81e88fdc432a (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>Ingo Molnar1-0/+1
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-02ACPI / APEI: Fix NMI notification handlingPrarit Bhargava1-3/+4
When removing and adding cpu 0 on a system with GHES NMI the following stack trace is seen when re-adding the cpu: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+ Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2 Call Trace: dump_stack+0x63/0x8e __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 setup_local_APIC+0x275/0x370 apic_ap_setup+0xe/0x20 start_secondary+0x48/0x180 set_init_arg+0x55/0x55 early_idt_handler_array+0x120/0x120 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x13d/0x14c During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR (0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also 0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms that something has set the IRQ_WORK_VECTOR line prior to the APIC being initialized. Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") incorrectly modified the behavior such that the handler returns NMI_HANDLED only if an error was processed, and incorrectly runs the ghes work queue for every NMI. This patch modifies the ghes_proc_irq_work() to run as it did prior to 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by properly returning NMI_HANDLED and only calling the work queue if NMI_HANDLED has been set. Fixes: 2383844d4850 (GHES: Elliminate double-loop in the NMI handler) Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-24ACPI / APEI: Fix incorrect return value of ghes_proc()Punit Agrawal1-1/+1
Although ghes_proc() tests for errors while reading the error status, it always return success (0). Fix this by propagating the return value. Fixes: d334a49113a4a33 (ACPI, APEI, Generic Hardware Error Source memory error support) Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-20ACPI / APEI: Send correct severity to calculate AER severityTyler Baicar1-1/+1
Currently the AER severity is calculated by calling cper_severity_to_aer(), but the parameter sent is actually the GHES severity. This causes the AER severity to be incorrect. Fix the parameter to be the CPER severity instead of the GHES severity. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2016-03-09drivers/acpi: make apei/ghes.c more explicitly non-modularPaul Gortmaker1-16/+7
The Kconfig currently controlling compilation of this code is: config ACPI_APEI_GHES bool "APEI Generic Hardware Error Source" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We replace module.h with moduleparam.h as we are keeping the pre-existing module_param that the file has, as currently that is the easiest way to maintain compatibility with the existing boot arg use cases. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-14Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fixIngo Molnar1-4/+0
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14acpi/apei: Use appropriate pgprot_t to map GHES memoryJonathan (Zhixiong) Zhang1-3/+7
If the ACPI APEI firmware handles hardware error first (called "firmware first handling"), the firmware updates the GHES memory region with hardware error record (called "generic hardware error record"). Essentially the firmware writes hardware error records in the GHES memory region, triggers an NMI/interrupt, then the GHES driver goes off and grabs the error record from the GHES region. The kernel currently maps the GHES memory region as cacheable (PAGE_KERNEL) for all architectures. However, on some arm64 platforms, there is a mismatch between how the kernel maps the GHES region (PAGE_KERNEL) and how the firmware maps it (EFI_MEMORY_UC, ie. uncacheable), leading to the possibility of the kernel GHES driver reading stale data from the cache when it receives the interrupt. With stale data being read, the kernel is unaware there is new hardware error to be handled when there actually is; this may lead to further damage in various scenarios, such as error propagation caused data corruption. If uncorrected error (such as double bit ECC error) happened in memory operation and if the kernel is unaware of such an event happening, errorneous data may be propagated to the disk. Instead GHES memory region should be mapped with page protection type according to what is returned from arch_apei_get_mem_attribute(). Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [ Small stylistic tweaks. ] Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1441372302-23242-3-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-08ACPI: Remove FSF mailing addressesJarkko Nikula1-4/+0
There is no need to carry potentially outdated Free Software Foundation mailing address in file headers since the COPYING file includes it. Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-27GHES: Make NMI handler have a single readerJiri Kosina1-5/+7
Since GHES sources are global, we theoretically need only a single CPU reading them per NMI instead of a thundering herd of CPUs waiting on a spinlock in NMI context for no reason at all. Do that. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-04-27GHES: Elliminate double-loop in the NMI handlerBorislav Petkov1-9/+2
There's no real need to iterate twice over the HW error sources in the NMI handler. With the previous cleanups, elliminating the second loop is almost trivial. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-04-27GHES: Panic right after detectionBorislav Petkov1-10/+6
The moment we log an error of panic severity, there's no need to noodle through the ghes_nmi list anymore. So panic instead right then and there. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-04-27GHES: Carve out the panic functionalityBorislav Petkov1-10/+14
... into another function for more clarity. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-04-27GHES: Carve out error queueing in a separate functionBorislav Petkov1-22/+29
Make the handler more readable. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-14Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-1/+0
Pull driver core update from Greg KH: "Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while" * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits) Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries" fs: debugfs: add forward declaration for struct device type firmware class: Deletion of an unnecessary check before the function call "vunmap" firmware loader: fix hung task warning dump devcoredump: provide a one-way disable function device: Add dev_<level>_once variants ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries ath: use seq_file api for ath9k debugfs files debugfs: add helper function to create device related seq_file drivers/base: cacheinfo: remove noisy error boot message Revert "core: platform: add warning if driver has no owner" drivers: base: support cpu cache information interface to userspace via sysfs drivers: base: add cpu_device_create to support per-cpu devices topology: replace custom attribute macros with standard DEVICE_ATTR* cpumask: factor out show_cpumap into separate helper function driver core: Fix unbalanced device reference in drivers_probe driver core: fix race with userland in device_add() sysfs/kernfs: make read requests on pre-alloc files use the buffer. sysfs/kernfs: allow attributes to request write buffer be pre-allocated. fs: sysfs: return EGBIG on write if offset is larger than file size ...
2014-10-21GHES: Make ghes_estatus_caches staticBorislav Petkov1-1/+1
It is used only in ghes.c. Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-21APEI, GHES: Cleanup unnecessary function for lockless listChen, Gong1-16/+2
We have a generic function to reverse a lockless list, kill homegrown copy. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1406530260-26078-2-git-send-email-gong.chen@linux.intel.com Acked-by: Tony Luck <tony.luck@intel.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> [ Boris: correct commit msg ] Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-20acpi: apei: drop owner assignment from platform_driversWolfram Sang1-1/+0
A platform_driver does not need to set an owner, it will be populated by the driver core. Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2014-08-06Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-16/+16
Pull ACPI and power management updates from Rafael Wysocki: "Again, ACPICA leads the pack (47 commits), followed by cpufreq (18 commits) and system suspend/hibernation (9 commits). From the new code perspective, the ACPICA update brings ACPI 5.1 to the table, including a new device configuration object called _DSD (Device Specific Data) that will hopefully help us to operate device properties like Device Trees do (at least to some extent) and changes related to supporting ACPI on ARM. Apart from that we have hibernation changes making it use radix trees to store memory bitmaps which should speed up some operations carried out by it quite significantly. We also have some power management changes related to suspend-to-idle (the "freeze" sleep state) support and more preliminary changes needed to support ACPI on ARM (outside of ACPICA). The rest is fixes and cleanups pretty much everywhere. Specifics: - ACPICA update to upstream version 20140724. That includes ACPI 5.1 material (support for the _CCA and _DSD predefined names, changes related to the DMAR and PCCT tables and ARM support among other things) and cleanups related to using ACPICA's header files. A major part of it is related to acpidump and the core code used by that utility. Changes from Bob Moore, David E Box, Lv Zheng, Sascha Wildner, Tomasz Nowicki, Hanjun Guo. - Radix trees for memory bitmaps used by the hibernation core from Joerg Roedel. - Support for waking up the system from suspend-to-idle (also known as the "freeze" sleep state) using ACPI-based PCI wakeup signaling (Rafael J Wysocki). - Fixes for issues related to ACPI button events (Rafael J Wysocki). - New device ID for an ACPI-enumerated device included into the Wildcat Point PCH from Jie Yang. - ACPI video updates related to backlight handling from Hans de Goede and Linus Torvalds. - Preliminary changes needed to support ACPI on ARM from Hanjun Guo and Graeme Gregory. - ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui. - Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros (Rafael J Wysocki). - ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J Wysocki. - Cleanups and improvements related to system suspend from Lan Tianyu, Randy Dunlap and Rafael J Wysocki. - ACPI battery cleanup from Wei Yongjun. - cpufreq core fixes from Viresh Kumar. - Elimination of a deadband effect from the cpufreq ondemand governor and intel_pstate driver cleanups from Stratos Karafotis. - 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas Patocka. - Fix for the imx6 cpufreq driver from Anson Huang. - cpuidle core and governor cleanups from Daniel Lezcano, Sandeep Tripathy and Mohammad Merajul Islam Molla. - Build fix for the big_little cpuidle driver from Sachin Kamat. - Configuration fix for the Operation Performance Points (OPP) framework from Mark Brown. - APM cleanup from Jean Delvare. - cpupower utility fixes and cleanups from Peter Senna Tschudin, Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger" * tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (118 commits) ACPI / LPSS: add LPSS device for Wildcat Point PCH ACPI / PNP: Replace faulty is_hex_digit() by isxdigit() ACPICA: Update version to 20140724. ACPICA: ACPI 5.1: Update for PCCT table changes. ACPICA/ARM: ACPI 5.1: Update for GTDT table changes. ACPICA/ARM: ACPI 5.1: Update for MADT changes. ACPICA/ARM: ACPI 5.1: Update for FADT changes. ACPICA: ACPI 5.1: Support for the _CCA predifined name. ACPICA: ACPI 5.1: New notify value for System Affinity Update. ACPICA: ACPI 5.1: Support for the _DSD predefined name. ACPICA: Debug object: Add current value of Timer() to debug line prefix. ACPICA: acpihelp: Add UUID support, restructure some existing files. ACPICA: Utilities: Fix local printf issue. ACPICA: Tables: Update for DMAR table changes. ACPICA: Remove some extraneous printf arguments. ACPICA: Update for comments/formatting. No functional changes. ACPICA: Disassembler: Add support for the ToUUID opererator (macro). ACPICA: Remove a redundant cast to acpi_size for ACPI_OFFSET() macro. ACPICA: Work around an ancient GCC bug. ACPI / processor: Make it possible to get local x2apic id via _MAT ...
2014-07-22acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context.Tomasz Nowicki1-7/+11
GHES currently maps two pages with atomic_ioremap. From now on, NMI is architectural depended so there is no need to allocate an NMI page for platforms without NMI support. To make it possible to not use a second page, swap the existing page order so that the IRQ context page is first, and the optional NMI context page is second. Then, use HAVE_ACPI_APEI_NMI to decide how many pages are to be allocated. Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>