aboutsummaryrefslogtreecommitdiffstats
path: root/arch/ia64/kernel/mca_drv.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2007-05-11[IA64] spelling fixes: arch/ia64/Simon Arlott1-2/+2
Spelling and apostrophe fixes in arch/ia64/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-05-08header cleaning: don't include smp_lock.h when not usedRandy Dunlap1-1/+0
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-08[IA64] Cache error recoveryRuss Anderson1-21/+11
Similar to memory error recovery, when a cache error is consumed by a user process terminate the user instead of crashing the system. Signed-off-by: Russ Anderson (rja@sgi.com) Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2007-03-08[IA64] Proper handling of TLB errors from duplicate itr.d dropinsRuss Anderson1-0/+33
Jack Steiner noticed that duplicate TLB DTC entries do not cause a linux panic. See discussion: http://www.gelato.unsw.edu.au/archives/linux-ia64/0307/6108.html The current TLB recovery code is recovering from the duplicate itr.d dropins, masking the underlying problem. This change modifies the MCA recovery code to look for the TLB check signature of the duplicate TLB entry and panic in that case. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-10-31[IA64] MCA recovery: Montecito supportRuss Anderson1-27/+68
The information in MCA records is filled in slightly differently on Montecito than on Madison/McKinley. Usually, the cache check and bus check target identifiers have the same address. On Montecito the cache check and bus check target identifiers can be different if a corrected error (ie SBE or unconsumed poison data) was encountered and then an uncorrected error (ie DBE) was consumed. In that case, the cache check target identifier is the physical address of the DBE (that caused the MCA to surface) while the bus check target identifier is the physical address of the SBE. This patch correctly finds the target identifier that triggered the MCA. If there are multiple valid cache target identifiers in the same error record then use the one with the lowest cache level. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-09-26[IA64] printing support for MCA/INITHidetoshi Seto1-18/+36
Printing message to console from MCA/INIT handler is useful, however doing oops_in_progress = 1 in them exactly makes something in kernel wrong. Especially it sounds ugly if system goes wrong after returning from recoverable MCA. This patch adds ia64_mca_printk() function that collects messages into temporary-not-so-large message buffer during in MCA/INIT environment and print them out later, after returning to normal context or when handlers determine to down the system. Also this print function is exported for use in extensional MCA handler. It would be useful to describe detail about recovery. NOTE: I don't think it is sane thing if temporary message buffer is enlarged enough to hold whole stack dumps from INIT, so buffering is disabled during stack dump from INIT-monarch (= default_monarch_init_process). please fix it in future. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel1-1/+0
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-27[IA64] Add mca recovery failure messagesRuss Anderson1-18/+36
When the mca recovery code encounters a condition that makes the MCA non-recoverable, print the reason it could not recover. This will make it easier to identify why the recovery code did not recover. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-24[IA64] MCA recovery: kernel context recovery tableRuss Anderson1-7/+15
Memory errors encountered by user applications may surface when the CPU is running in kernel context. The current code will not attempt recovery if the MCA surfaces in kernel context (privilage mode 0). This patch adds a check for cases where the user initiated the load that surfaces in kernel interrupt code. An example is a user process lauching a load from memory and the data in memory had bad ECC. Before the bad data gets to the CPU register, and interrupt comes in. The code jumps to the IVT interrupt entry point and begins execution in kernel context. The process of saving the user registers (SAVE_REST) causes the bad data to be loaded into a CPU register, triggering the MCA. The MCA surfaces in kernel context, even though the load was initiated from user context. As suggested by David and Tony, this patch uses an exception table like approach, puting the tagged recovery addresses in a searchable table. One difference from the exception table is that MCAs do not surface in precise places (such as with a TLB miss), so instead of tagging specific instructions, address ranges are registers. A single macro is used to do the tagging, with the input parameter being the label of the starting address and the macro being the ending address. This limits clutter in the code. This patch only tags one spot, the interrupt ivt entry. Testing showed that spot to be a "heavy hitter" with MCAs surfacing while saving user registers. Other spots can be added as needed by adding a single macro. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-07[IA64] mca recovery return value when no bus checkRuss Anderson1-2/+7
When there is no bus check, the return code should be failure, not success. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-03-07[IA64] Increase severity of MCA recovery messagesRuss Anderson1-3/+4
The MCA recovery messages are currently KERN_DEBUG, so they don't show up in /var/log/messages (by default). Increase the severity to KERN_ERR, for the initial message (and also add the physical address to this message). Leave the successful isolation message as KERN_DEBUG, but increase the severity when isolation fails to KERN_CRIT. [Russ' patch made these all KERN_CRIT] Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
2006-02-09[IA64] mca_drv: Add minstate validationHidetoshi Seto1-0/+3
MCA driver can cause panic if kernel gets a state info with no minstate. This patch adds minstate validation before handling it. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-10Pull mca-check-psp into release branchTony Luck1-4/+13
2005-11-08[IA64] MCA recovery: Bump reference count on bad pagesRuss Anderson1-0/+1
When a page has a memory uncorrectable ECC error, the recovery code wants to prevent the page from being reused. This change bumps the reference count to prevent the page from getting back on the free list. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-08[IA64] MCA recovery: pfn_valid() needs a pfnRuss Anderson1-1/+1
paddr needs to be shifted by PAGE_SHIFT to be valid input for pfn_valid(). Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-11-08[IA64] MCA recovery based on PSP bitsRuss Anderson1-4/+13
The determination of whether an MCA is recoverable or not must be based on the bits set in the PSP (Processor State Parameter). The specific bits are shown in the Intel IA-64 Architecture Software Developer's Manual, Vol 2, Table 11-6 Software Recovery Bits in Processor State Parameter. Those bits should be consistent across the entire IA-64 family of processors. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-22[IA64] MCA recovery verify pfn_validHidetoshi Seto1-6/+15
Verify the pfn is valid before calling pfn_to_page(), and cut isolation message if nothing was done. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-16[IA64] mca_drv cleanupHidetoshi Seto1-45/+69
There were some trailing white spaces, long lines, brackets in weird style etc. This patch cleans them up. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-11[PATCH] MCA/INIT: use per cpu stacksKeith Owens1-20/+17
The bulk of the change. Use per cpu MCA/INIT stacks. Change the SAL to OS state (sos) to be per process. Do all the assembler work on the MCA/INIT stacks, leaving the original stack alone. Pass per cpu state data to the C handlers for MCA and INIT, which also means changing the mca_drv interfaces slightly. Lots of verification on whether the original stack is usable before converting it to a sleeping process. Signed-off-by: Keith Owens <kaos@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-05-03[IA64] MCA recovery improvementsRuss Anderson1-2/+2
Jack Steiner uncovered some opportunities for improvement in the MCA recovery code. 1) Set bsp to save registers on the kernel stack. 2) Disable interrupts while in the MCA recovery code. 3) Change the way the user process is killed, to avoid a panic in schedule. Testing shows that these changes make the recovery code much more reliable with the 2.6.12 kernel. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-04-16Linux-2.6.12-rc2Linus Torvalds1-0/+639
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!