diff options
author | Mauro Carvalho Chehab <mchehab+huawei@kernel.org> | 2021-09-30 11:44:51 +0200 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2021-10-05 16:24:15 +0200 |
commit | edfc8730ba45eac3cca20dba3799d6ae6c584b56 (patch) | |
tree | 1eef084e6fd0b2a3484e6459140d07c806ab0012 /Documentation/ABI | |
parent | scripts: get_abi.pl: better generate regex from what fields (diff) | |
download | linux-dev-edfc8730ba45eac3cca20dba3799d6ae6c584b56.tar.xz linux-dev-edfc8730ba45eac3cca20dba3799d6ae6c584b56.zip |
ABI: sysfs-mce: add a new ABI file
Reduce the gap of missing ABIs for Intel servers with MCE
by adding a new ABI file.
The contents of this file comes from:
Documentation/x86/x86_64/machinecheck.rst
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/801a26985e32589eb78ba4b728d3e19fdea18f04.1632994837.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'Documentation/ABI')
-rw-r--r-- | Documentation/ABI/testing/sysfs-mce | 107 |
1 files changed, 107 insertions, 0 deletions
diff --git a/Documentation/ABI/testing/sysfs-mce b/Documentation/ABI/testing/sysfs-mce new file mode 100644 index 000000000000..686fbfa02cdc --- /dev/null +++ b/Documentation/ABI/testing/sysfs-mce @@ -0,0 +1,107 @@ +What: /sys/devices/system/machinecheck/machinecheckX/ +Contact: Andi Kleen <ak@linux.intel.com> +Date: Feb, 2007 +Description: + (X = CPU number) + + Machine checks report internal hardware error conditions + detected by the CPU. Uncorrected errors typically cause a + machine check (often with panic), corrected ones cause a + machine check log entry. + + For more details about the x86 machine check architecture + see the Intel and AMD architecture manuals from their + developer websites. + + For more details about the architecture + see http://one.firstfloor.org/~andi/mce.pdf + + Each CPU has its own directory. + +What: /sys/devices/system/machinecheck/machinecheckX/bank<Y> +Contact: Andi Kleen <ak@linux.intel.com> +Date: Feb, 2007 +Description: + (Y bank number) + + 64bit Hex bitmask enabling/disabling specific subevents for + bank Y. + + When a bit in the bitmask is zero then the respective + subevent will not be reported. + + By default all events are enabled. + + Note that BIOS maintain another mask to disable specific events + per bank. This is not visible here + +What: /sys/devices/system/machinecheck/machinecheckX/check_interval +Contact: Andi Kleen <ak@linux.intel.com> +Date: Feb, 2007 +Description: + The entries appear for each CPU, but they are truly shared + between all CPUs. + + How often to poll for corrected machine check errors, in + seconds (Note output is hexadecimal). Default 5 minutes. + When the poller finds MCEs it triggers an exponential speedup + (poll more often) on the polling interval. When the poller + stops finding MCEs, it triggers an exponential backoff + (poll less often) on the polling interval. The check_interval + variable is both the initial and maximum polling interval. + 0 means no polling for corrected machine check errors + (but some corrected errors might be still reported + in other ways) + +What: /sys/devices/system/machinecheck/machinecheckX/tolerant +Contact: Andi Kleen <ak@linux.intel.com> +Date: Feb, 2007 +Description: + The entries appear for each CPU, but they are truly shared + between all CPUs. + + Tolerance level. When a machine check exception occurs for a + non corrected machine check the kernel can take different + actions. + + Since machine check exceptions can happen any time it is + sometimes risky for the kernel to kill a process because it + defies normal kernel locking rules. The tolerance level + configures how hard the kernel tries to recover even at some + risk of deadlock. Higher tolerant values trade potentially + better uptime with the risk of a crash or even corruption + (for tolerant >= 3). + + == =========================================================== + 0 always panic on uncorrected errors, log corrected errors + 1 panic or SIGBUS on uncorrected errors, log corrected errors + 2 SIGBUS or log uncorrected errors, log corrected errors + 3 never panic or SIGBUS, log all errors (for testing only) + == =========================================================== + + Default: 1 + + Note this only makes a difference if the CPU allows recovery + from a machine check exception. Current x86 CPUs generally + do not. + +What: /sys/devices/system/machinecheck/machinecheckX/trigger +Contact: Andi Kleen <ak@linux.intel.com> +Date: Feb, 2007 +Description: + The entries appear for each CPU, but they are truly shared + between all CPUs. + + Program to run when a machine check event is detected. + This is an alternative to running mcelog regularly from cron + and allows to detect events faster. + +What: /sys/devices/system/machinecheck/machinecheckX/monarch_timeout +Contact: Andi Kleen <ak@linux.intel.com> +Date: Feb, 2007 +Description: + How long to wait for the other CPUs to machine check too on a + exception. 0 to disable waiting for other CPUs. + + Unit: us + |