sparc64: sun4v TLB error power off events

We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.

For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
one("1"). Otherwise, for %tl > 1, we proceed eventually to die_if_kernel().

The logic of this patch was partially inspired by David Miller's feedback.

Power off of large sparc64 machines is painful. Plus die_if_kernel provides
more context. A reset sequence isn't a brief period on large sparc64 but
better than power-off/power-on sequence.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
author: bob picco <bpicco@meloft.net> 2014-09-16 09:26:47 -0400
committer: David S. Miller <davem@davemloft.net> 2014-09-16 17:46:44 -0700
commit: 4ccb9272892c33ef1c19a783cfa87103b30c2784 (patch)
tree: fe904676d83557eff6d1bb04127ba23541736140 /arch/sparc/mm/fault_64.c
parent: sparc32: dma_alloc_coherent must honour gfp flags (diff)
1 files changed, 3 insertions, 0 deletions
diff --git a/arch/sparc/mm/fault_64.c b/arch/sparc/mm/fault_64.c
index 587cd0565128..18fcd7167095 100644
--- a/arch/sparc/mm/fault_64.c
+++ b/arch/sparc/mm/fault_64.c
@@ -346,6 +346,9 @@ retry:
 		down_read(&mm->mmap_sem);
 	}
 
+	if (fault_code & FAULT_CODE_BAD_RA)
+		goto do_sigbus;
+
 	vma = find_vma(mm, address);
 	if (!vma)
 		goto bad_area;
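
The decision described in the commit message can be sketched in plain C as a userspace model. This is only an illustration under stated assumptions: every `sketch_` and `SKETCH_` name is hypothetical, the numeric constants are placeholders rather than the kernel's values, and the real trap-level check lives in code that is not part of the hunk shown above. The sketch also assumes that any hypervisor error other than HV_ENORADDR keeps going down the die_if_kernel() path.

```c
#include <assert.h>

/* Placeholder values, assumed for illustration only (not the kernel's). */
#define SKETCH_HV_ENORADDR        2      /* stand-in for HV_ENORADDR */
#define SKETCH_FAULT_CODE_BAD_RA  0x20u  /* stand-in for FAULT_CODE_BAD_RA */

enum sketch_action {
	SKETCH_SIGBUS,         /* deliver SIGBUS via the fault handler */
	SKETCH_DIE_IF_KERNEL   /* fall through to die_if_kernel() */
};

/*
 * Model of the trap-side choice: an HV_ENORADDR error taken at
 * trap level <= 1 is turned into a bad-RA fault; anything else
 * (deeper trap level, or a different error) ends in die_if_kernel().
 */
static enum sketch_action sketch_tlb_error_action(int hv_error, int tl)
{
	assert(tl >= 1); /* a TLB error trap is taken at %tl >= 1 */

	if (hv_error == SKETCH_HV_ENORADDR && tl <= 1)
		return SKETCH_SIGBUS;
	return SKETCH_DIE_IF_KERNEL;
}

/*
 * Model of the three added lines in the fault handler: if the
 * bad-RA bit is set in the fault code, skip the VMA lookup and
 * go straight to the SIGBUS path. Returns 1 when SIGBUS is chosen.
 */
static int sketch_fault_wants_sigbus(unsigned int fault_code)
{
	return (fault_code & SKETCH_FAULT_CODE_BAD_RA) != 0;
}
```

Under these assumptions, `sketch_tlb_error_action(SKETCH_HV_ENORADDR, 1)` selects the SIGBUS path, while the same error at trap level 2 selects die_if_kernel(). The bit test in `sketch_fault_wants_sigbus()` mirrors the `fault_code & FAULT_CODE_BAD_RA` check added by the hunk.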