diff options
author | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2007-02-14 09:46:06 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@woody.linux-foundation.org> | 2007-02-14 09:46:06 -0800 |
commit | 414f827c46973ba39320cfb43feb55a0eeb9b4e8 (patch) | |
tree | 45e860974ef698e71370a0ebdddcff4f14fbdf9e /Documentation/x86_64/machinecheck | |
parent | [PATCH] sysctl: hide the sysctl proc inodes from selinux (diff) | |
parent | [PATCH] x86-64: Remove mk_pte_phys() (diff) | |
download | linux-dev-414f827c46973ba39320cfb43feb55a0eeb9b4e8.tar.xz linux-dev-414f827c46973ba39320cfb43feb55a0eeb9b4e8.zip |
Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
[PATCH] x86-64: Remove mk_pte_phys()
[PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
[PATCH] i386: fix 32-bit ioctls on x64_32
[PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
[PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
[PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
[PATCH] i386: Move mce_disabled to asm/mce.h
[PATCH] i386: paravirt unhandled fallthrough
[PATCH] x86_64: Wire up compat epoll_pwait
[PATCH] x86: Don't require the vDSO for handling a.out signals
[PATCH] i386: Fix Cyrix MediaGX detection
[PATCH] i386: Fix warning in cpu initialization
[PATCH] i386: Fix warning in microcode.c
[PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
[PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
[PATCH] i386: Remove fastcall in paravirt.[ch]
[PATCH] x86-64: Fix wrong gcc check in bitops.h
[PATCH] x86-64: survive having no irq mapping for a vector
[PATCH] i386: geode configuration fixes
[PATCH] i386: add option to show more code in oops reports
...
Diffstat (limited to 'Documentation/x86_64/machinecheck')
-rw-r--r-- | Documentation/x86_64/machinecheck | 70 |
1 files changed, 70 insertions, 0 deletions
diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86_64/machinecheck new file mode 100644 index 000000000000..068a6d9904b9 --- /dev/null +++ b/Documentation/x86_64/machinecheck @@ -0,0 +1,70 @@ + +Configurable sysfs parameters for the x86-64 machine check code. + +Machine checks report internal hardware error conditions detected +by the CPU. Uncorrected errors typically cause a machine check +(often with panic), corrected ones cause a machine check log entry. + +Machine checks are organized in banks (normally associated with +a hardware subsystem) and subevents in a bank. The exact meaning +of the banks and subevent is CPU specific. + +mcelog knows how to decode them. + +When you see the "Machine check errors logged" message in the system +log then mcelog should run to collect and decode machine check entries +from /dev/mcelog. Normally mcelog should be run regularly from a cronjob. + +Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN +(N = CPU number) + +The directory contains some configurable entries: + +Entries: + +bankNctl +(N bank number) + 64bit Hex bitmask enabling/disabling specific subevents for bank N + When a bit in the bitmask is zero then the respective + subevent will not be reported. + By default all events are enabled. + Note that BIOS maintain another mask to disable specific events + per bank. This is not visible here + +The following entries appear for each CPU, but they are truly shared +between all CPUs. + +check_interval + How often to poll for corrected machine check errors, in seconds + (Note output is hexademical). Default 5 minutes. + +tolerant + Tolerance level. When a machine check exception occurs for a non + corrected machine check the kernel can take different actions. + Since machine check exceptions can happen any time it is sometimes + risky for the kernel to kill a process because it defies + normal kernel locking rules. The tolerance level configures + how hard the kernel tries to recover even at some risk of deadlock. + + 0: always panic, + 1: panic if deadlock possible, + 2: try to avoid panic, + 3: never panic or exit (for testing only) + + Default: 1 + + Note this only makes a difference if the CPU allows recovery + from a machine check exception. Current x86 CPUs generally do not. + +trigger + Program to run when a machine check event is detected. + This is an alternative to running mcelog regularly from cron + and allows to detect events faster. + +TBD document entries for AMD threshold interrupt configuration + +For more details about the x86 machine check architecture +see the Intel and AMD architecture manuals from their developer websites. + +For more details about the architecture see +see http://one.firstfloor.org/~andi/mce.pdf |