path: root/drivers/edac
Age | Commit message | Author | Files | Lines
2016-04-22 | x86 EDAC, sb_edac.c: Take account of channel hashing when needed | Tony Luck | 1 | -1/+23
Haswell and Broadwell can be configured to hash the channel interleave function using bits [27:12] of the physical address. On those processor models we must check to see if hashing is enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and act accordingly. Based on a patch by patrickg <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
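A user-space sketch of the check described above, for illustration only. It assumes that the hash XORs pairs of physical-address bits from [27:12] into a 2-bit channel index (even bits into bit 0, odd bits into bit 1), and that bit 21 of HASWELL_HASYSDEFEATURE2 is the enable flag as stated in the commit. The helper names are hypothetical, not the driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bit 21 of HASWELL_HASYSDEFEATURE2 = hashing enabled. */
static bool chan_hash_enabled(uint32_t hasysdefeature2)
{
	return (hasysdefeature2 >> 21) & 1;
}

/*
 * Hypothetical helper: fold address bits [27:12] into the channel index.
 * Each step XORs two adjacent bits (i, i+1) into idx, so even bits 12..26
 * land in bit 0 of idx and odd bits 13..27 land in bit 1.
 */
static int chan_hash(int idx, uint64_t addr)
{
	for (int i = 12; i < 28; i += 2)
		idx ^= (int)((addr >> i) & 3);
	return idx;
}

/* Apply the hash only when the platform reports it as enabled. */
static int decode_channel(int idx, uint64_t addr, uint32_t hasysdefeature2)
{
	return chan_hash_enabled(hasysdefeature2) ? chan_hash(idx, addr) : idx;
}
```

With hashing disabled, decode_channel() returns the interleave index unchanged; with it enabled, flipping a single address bit in [27:12] flips one bit of the channel index.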
2016-04-22 | x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address | Tony Luck | 1 | -3/+3
In commit: eb1af3b71f9d ("Fix computation of channel address") I switched the "sck_way" variable from holding the log2 value read from the h/w to instead be the actual number. Unfortunately it is needed in log2 form when used to shift the address. Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: eb1af3b71f9d ("Fix computation of channel address") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-16 | Merge tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp | Linus Torvalds | 9 | -111/+616
Pull EDAC updates from Borislav Petkov: - Altera: L2 cache and On-Chip RAM support (Thor Thayer). - EDAC: Workqueue handling cleanups (Borislav Petkov). - Xgene: Register bus error handling (Loc Ho). - Misc small fixes. * tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: ARM: socfpga: Enable OCRAM ECC on startup ARM: socfpga: Enable L2 cache ECC on startup ARM: dts: Add Altera L2 Cache and OCRAM EDAC entries EDAC, altera: Add Altera L2 cache and OCRAM support EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() EDAC, mpc85xx: Silence unused variable warning EDAC: Cleanup/sync workqueue functions EDAC: Kill workqueue setup/teardown functions EDAC: Balance workqueue setup and teardown arm64: Update the APM X-Gene EDAC node with the RB register resource EDAC, xgene: Add missing SoC register bus error handling Documentation, EDAC: Update xgene binding for missing register bus EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()
2016-03-14 | Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -19/+342
Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call
2016-03-10 | EDAC/sb_edac: Fix computation of channel address | Luck, Tony | 1 | -16/+10
Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems:
1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3, 3=4).
2) Actual use of the socket interleave value didn't interpret it as 2^X.
3) Conversion of address to channel address was complicated, and wrong.
Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
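The two field encodings described in this commit can be captured in a pair of decode helpers. This is a hypothetical sketch (the names are not the driver's); the last helper illustrates why the later "Repair damage" commit keeps sck_way in log2 form when it is used as a shift count.

```c
#include <stdint.h>

/* Socket interleave field: log2 of the way count (0=1, 1=2, 2=4, 3=8). */
static unsigned int sck_ways(unsigned int field)
{
	return 1u << field;
}

/* Channel interleave field: way count minus one (0=1, 1=2, 2=3, 3=4). */
static unsigned int ch_ways(unsigned int field)
{
	return field + 1;
}

/*
 * Socket ways are a power of two, so dividing an address by them is a
 * right shift by the raw (log2) field value, not by the way count itself.
 */
static uint64_t div_sck_ways(uint64_t addr, unsigned int sck_field)
{
	return addr >> sck_field;
}
```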
2016-03-08 | x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors | Aravind Gopalakrishnan | 1 | -3/+332
For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
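A minimal sketch of pulling the two identifiers out of a 64-bit MCx_IPID value and testing TCC. The field positions are assumptions to verify against AMD's documentation: HWID in IPID[43:32], McaType in IPID[63:48], and TCC as MCx_STATUS bit 55. The macro, struct and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATUS_TCC	(1ULL << 55)	/* assumed TCC bit position */

struct smca_ipid {
	uint16_t hwid;		/* assumed IPID[43:32], 12 bits */
	uint16_t mcatype;	/* assumed IPID[63:48], 16 bits */
};

/* Split the upper half of IPID into the (HWID, McaType) pair. */
static struct smca_ipid smca_split_ipid(uint64_t ipid)
{
	uint32_t high = (uint32_t)(ipid >> 32);
	struct smca_ipid id = {
		.hwid = high & 0xFFF,
		.mcatype = (uint16_t)(high >> 16),
	};

	return id;
}

/* Task Context Corrupt: the interrupted context cannot be trusted. */
static bool smca_task_context_corrupt(uint64_t status)
{
	return (status & SKETCH_STATUS_TCC) != 0;
}
```

A decoder would then look up the (hwid, mcatype) pair in a table of known IP blocks to pick the right error descriptions, rather than relying on the bank number.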
2016-03-07 | EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi | Hubert Chrzaniuk | 1 | -1/+1
Correct a typo introduced by d0cdf9003140 ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support"). As a result, under some configurations DIMMs were not correctly recognized. The problem affects only the Xeon Phi architecture. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-11 | EDAC, altera: Add Altera L2 cache and OCRAM support | Thor Thayer | 3 | -8/+512
Add L2 Cache and On-Chip RAM EDAC support for the Altera SoCs. The SDRAM controller is using the Memory Controller model. Each type of ECC is individually configurable. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-doc@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1455132384-17108-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-10 | EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() | Thor Thayer | 1 | -1/+1
debugfs_remove() is used to remove a file or a directory from the debugfs filesystem on an EDAC device exit. However edac_debugfs might not be empty. This is similar to 30f84a891bf6 ("EDAC: Use edac_debugfs_remove_recursive()") which changed the EDAC MCI code to use edac_debugfs_remove_recursive(). Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1455064165-3816-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 | EDAC, mpc85xx: Silence unused variable warning | Sudip Mukherjee | 1 | -1/+1
We were getting this build warning: drivers/edac/mpc85xx_edac.c:1247:6: warning: unused variable 'pvr' pvr is only used if CONFIG_FSL_SOC_BOOKE is defined. Declare it __maybe_unused. Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1454427573-7994-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 | EDAC: Cleanup/sync workqueue functions | Borislav Petkov | 2 | -20/+18
They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 | EDAC: Kill workqueue setup/teardown functions | Borislav Petkov | 2 | -70/+8
We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02 | EDAC: Balance workqueue setup and teardown | Borislav Petkov | 2 | -13/+10
We use the ->edac_check function pointers to determine whether we need to set up a polling workqueue. However, the destroy path is not balanced and we might try to tear down an uninitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE *before* destroying anything. Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
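The balancing rule can be modeled in a few lines of user-space C: the setup and teardown paths test the same ->edac_check pointer, and op_state goes to OP_OFFLINE before anything is torn down. The struct and helpers are hypothetical; only the op_state names mirror EDAC's constants, and their values here are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_op_state { OP_ALLOC, OP_RUNNING_POLL, OP_RUNNING_INTERRUPT, OP_OFFLINE };

struct sketch_dev {
	void (*edac_check)(struct sketch_dev *dev);
	enum sketch_op_state op_state;
	bool poll_work_armed;	/* stands in for the delayed work item */
	int bogus_teardowns;	/* teardowns of work that was never set up */
};

static void sketch_setup(struct sketch_dev *dev)
{
	if (dev->edac_check) {
		dev->op_state = OP_RUNNING_POLL;
		dev->poll_work_armed = true;
	} else {
		dev->op_state = OP_RUNNING_INTERRUPT;
	}
}

static void sketch_teardown(struct sketch_dev *dev)
{
	dev->op_state = OP_OFFLINE;	/* before destroying anything */

	if (!dev->edac_check)		/* same test as sketch_setup() */
		return;

	if (!dev->poll_work_armed)
		dev->bogus_teardowns++;
	dev->poll_work_armed = false;
}

static void dummy_check(struct sketch_dev *dev)
{
	(void)dev;
}
```

Because both paths branch on the same condition, an interrupt-driven device (no ->edac_check) never reaches the work-teardown code, so bogus_teardowns stays at zero.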
2016-01-25 | EDAC, xgene: Add missing SoC register bus error handling | Loc Ho | 1 | -1/+69
Add missing register bus error handling for APM X-Gene EDAC SoC and fix a checking condition for CE error promoted to UE. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Link: http://lkml.kernel.org/r/1453495625-28006-3-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-25 | EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr() | Dan Carpenter | 1 | -1/+1
dct_sel_base_off is declared as a u64 but we're only using the lower 32 bits because of a shift wrapping bug. This can possibly truncate the upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS row. Fixes: c8e518d5673d ('amd64_edac: Sanitize f10_get_base_addr_offset') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>
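The bug class is easy to reproduce in isolation: a 32-bit operand is shifted as a 32-bit value even when the result is assigned to a u64, so bits pushed past bit 31 are lost. This sketch assumes the offset sits in register bits [31:10] and is shifted left by 16 to form address bits [47:26], consistent with the field named above; the function names are illustrative.

```c
#include <stdint.h>

/* Buggy: reg & mask is a 32-bit value, so the shift wraps at bit 31. */
static uint64_t dct_sel_base_off_buggy(uint32_t reg)
{
	return (reg & 0xFFFFFC00u) << 16;
}

/* Fixed: widen to 64 bits first so address bits [47:32] survive. */
static uint64_t dct_sel_base_off_fixed(uint32_t reg)
{
	return (uint64_t)(reg & 0xFFFFFC00u) << 16;
}
```

For an all-ones field the buggy version keeps only bits [31:26] of the result, which is exactly the loss of the upper 16 bits of DctSelBaseOffset[47:26] described in the commit. Small values that stay below bit 32 are unaffected, which is why the bug can go unnoticed.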
2016-01-01 | EDAC, i5100: Use to_delayed_work() | Geliang Tang | 1 | -3/+1
Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/58c0e319c7263a10b692100c657c06c42814aecf.1451659910.git.geliangtang@163.com Signed-off-by: Borislav Petkov <bp@suse.de>
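to_delayed_work() is a thin wrapper around container_of(): given a pointer to the embedded work_struct, it returns the enclosing delayed_work. The user-space sketch below uses a simplified container_of() (the kernel macro adds type checking) and a hypothetical private struct to show the two-step walk back that a work handler such as the i5100 scrubbing callback performs.

```c
#include <stddef.h>

/* Simplified container_of(): no type checking, unlike the kernel macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct { int pending; };
struct delayed_work { struct work_struct work; unsigned long expires; };

static struct delayed_work *to_delayed_work(struct work_struct *work)
{
	return container_of(work, struct delayed_work, work);
}

/* Hypothetical driver state with the delayed work embedded mid-struct. */
struct sketch_priv {
	int scrub_enabled;
	struct delayed_work scrubbing;
};

/* A work handler only receives the work_struct; walk back to the owner. */
static struct sketch_priv *priv_from_work(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);

	return container_of(dwork, struct sketch_priv, scrubbing);
}
```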
2015-12-11 | EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing | Hubert Chrzaniuk | 1 | -1/+7
Knights Landing does not come with a register that could be used to fetch the DIMM width. However, the value is fixed for this architecture, so it can be hardcoded. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 | EDAC: Rework workqueue handling | Borislav Petkov | 7 | -88/+71
Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with 91b99041c1d5 ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 | EDAC: Make edac_device workqueue setup/teardown functions static | Borislav Petkov | 2 | -6/+3
They're not used anywhere else. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 | EDAC: Remove edac_get_sysfs_subsys() error handling | Borislav Petkov | 3 | -24/+2
It cannot fail now. We either load EDAC core after having successfully initialized edac_subsys or we don't. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 | EDAC: Unexport and make edac_subsys static | Borislav Petkov | 1 | -2/+1
... and use the accessor instead. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 | EDAC: Rip out the edac_subsys reference counting | Borislav Petkov | 5 | -55/+43
This was really dumb - reference counting for the main EDAC sysfs object, while we could've simply registered it as the first thing in the module init path and then handed it around to whatever needs it. Do that and rip out all the code around it, thus simplifying the whole handling significantly. Move the edac_subsys node back to edac_module.c. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 | EDAC: Robustify workqueues destruction | Borislav Petkov | 3 | -23/+11
EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work *synchronously* too.
  EDAC i7core: Driver loaded.
  general protection fault: 0000 [#1] SMP
  CPU 12
  Modules linked in:
  Supported: Yes
  Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7
  RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0
  < ... regs ...>
  Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600)
  Stack:
  ...
  Call Trace:
  call_timer_fn
  run_timer_softirq
  __do_softirq
  call_softirq
  do_softirq
  irq_exit
  smp_apic_timer_interrupt
  apic_timer_interrupt
  intel_idle
  cpuidle_idle_call
  cpu_idle
  Code: ...
  RIP __queue_work
  RSP <...>
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>
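The ordering this fix relies on can be modeled in user space: the work function re-arms itself only while op_state says it is polling, so flipping op_state to offline first guarantees that an instance already in flight will not requeue, and the synchronous cancel then only has to deal with what is left. This is a single-threaded sketch with hypothetical names; in the kernel the cancel step would be cancel_delayed_work_sync(), which additionally waits for a running instance to finish.

```c
#include <stdbool.h>

enum sketch_state { SKETCH_RUNNING_POLL, SKETCH_OFFLINE };

struct sketch_poller {
	enum sketch_state op_state;
	bool queued;	/* delayed work pending */
	int runs;	/* times the check has run */
};

/* Work function: run the check, then re-arm only while still polling. */
static void sketch_poll_work(struct sketch_poller *p)
{
	p->queued = false;
	p->runs++;
	if (p->op_state == SKETCH_RUNNING_POLL)
		p->queued = true;
}

/* Teardown step 1: stop future requeues. */
static void sketch_stop_polling(struct sketch_poller *p)
{
	p->op_state = SKETCH_OFFLINE;
}

/* Teardown step 2: cancel whatever is still pending (models the _sync cancel). */
static void sketch_cancel_sync(struct sketch_poller *p)
{
	p->queued = false;
}
```

Running sketch_poll_work() between the two teardown steps, as a concurrently executing instance would, leaves nothing queued, so the workqueue can be destroyed safely afterwards.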
2015-12-11 | EDAC, mc_sysfs: Fix freeing bus' name | Borislav Petkov | 1 | -7/+14
I get the splat below when modprobing/rmmoding EDAC drivers. It happens because bus->name is invalid after bus_unregister() has run. The Code: section below corresponds to:
  .loc 1 1108 0
  movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
  .loc 1 1109 0
  popq %rbx #
  .loc 1 1108 0
  movq (%rax), %rdi # _7->name,
  jmp kfree #
and %rax has some funky stuff 2030203020312030 which looks a lot like something walked over it. Fix that by saving the name ptr before doing stuff to the string it points to.
  general protection fault: 0000 [#1] SMP
  Modules linked in: ...
  CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48
  Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
  task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
  RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
  RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292
  RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
  RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
  RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
  R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
  R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
  FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
  Stack:
  Call Trace:
  i7core_unregister_mci.isra.9
  i7core_remove
  pci_device_remove
  __device_release_driver
  driver_detach
  bus_remove_driver
  pci_unregister_driver
  i7core_exit
  SyS_delete_module
  system_call_fastpath
  0x7fc9bf426536
  Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
  RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
  RSP <ffff88030da3fe28>
Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: <stable@vger.kernel.org> # v3.6.. Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
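The fix follows a general rule: copy out any pointer you still need before calling the function that invalidates its container. Below is a hypothetical user-space model in which the stand-in for bus_unregister() scribbles over and frees the bus structure, loosely mimicking the garbage seen in RAX in the splat.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_bus { char *name; };
struct sketch_mci { struct sketch_bus *bus; };

/* Stand-in for bus_unregister(): the bus must not be touched afterwards. */
static void sketch_bus_unregister(struct sketch_bus *bus)
{
	memset(bus, 0x20, sizeof(*bus));	/* simulate the memory being reused */
	free(bus);
}

static void sketch_unregister_sysfs(struct sketch_mci *mci)
{
	/* Save the name pointer while mci->bus is still valid... */
	char *name = mci->bus->name;

	sketch_bus_unregister(mci->bus);
	mci->bus = NULL;

	/* ...so it can be freed without dereferencing the dead bus. */
	free(name);
}
```

Reading mci->bus->name after sketch_bus_unregister() instead would be a use-after-free, which a tool such as AddressSanitizer reports immediately.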
2015-12-11 | EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device | Scott Wood | 1 | -5/+33
Originally the mpc85xx-pci-edac driver bound directly to the PCI controller node. Commit 905e75c46dba ("powerpc/fsl-pci: Unify pci/pcie initialization code") turned the PCI controller code into a platform device. Since we can't have two drivers binding to the same device, the EDAC code was changed to be called into as a library-style submodule. However, this doesn't work if the EDAC driver is built as a module. Commit 8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting") exposed another problem with this approach -- mpc85xx_pci_err_probe() was being called in the same early boot phase that the PCI controller is initialized, rather than in the device_initcall phase that the EDAC layer expects. This caused a crash on boot. To fix this, the PCI controller code now creates a child platform device specifically for EDAC, which the mpc85xx-pci-edac driver binds to. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daniel Axtens <dja@axtens.net> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Jia Hongtao <B38951@freescale.com> Cc: Jiri Kosina <jkosina@suse.com> Cc: Kim Phillips <kim.phillips@freescale.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Masanari Iida <standby24x7@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-05 | EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support | Jim Snow | 1 | -45/+921
Knights Landing is the next generation architecture for the HPC market. KNL introduces the concept of a tile and of the CHA (Cache/Home Agent) for memory accesses. Some things are fixed in KNL:
- There is a single DIMM slot per channel.
- There are 2 memory controllers with 3 channels each; however, from the EDAC standpoint, they are presented as a single memory controller with 6 channels. Representing 2 MCs with 3 channels each would require a major redesign of the EDAC core driver.
Basically, two functionalities are added/extended:
- during driver initialization, the KNL topology is recognized, i.e. which channels are populated with which DIMM sizes (knl_get_dimm_capacity function)
- handling of MCE errors: channel swizzling
Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Jim Snow <jim.m.snow@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
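For illustration only: presenting 2 memory controllers with 3 channels each as one controller with 6 channels amounts to flattening a (controller, channel) pair into a single index. The controller-major order below is an assumption, not necessarily the order the driver uses, and it ignores the channel swizzling mentioned in the commit. All macro and function names are illustrative.

```c
/* Fixed KNL topology as described in the commit message. */
#define KNL_SKETCH_MCS		2
#define KNL_SKETCH_CH_PER_MC	3
#define KNL_SKETCH_CHANNELS	(KNL_SKETCH_MCS * KNL_SKETCH_CH_PER_MC)

/* Assumed flattening: controller-major order; -1 when out of range. */
static int knl_flat_channel(int mc, int ch)
{
	if (mc < 0 || mc >= KNL_SKETCH_MCS || ch < 0 || ch >= KNL_SKETCH_CH_PER_MC)
		return -1;
	return mc * KNL_SKETCH_CH_PER_MC + ch;
}

/* Inverse mapping back to (controller, channel). */
static void knl_split_channel(int flat, int *mc, int *ch)
{
	*mc = flat / KNL_SKETCH_CH_PER_MC;
	*ch = flat % KNL_SKETCH_CH_PER_MC;
}
```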
2015-12-05 | EDAC, sb_edac: Add support for duplicate device IDs | Jim Snow | 1 | -8/+32
Add options to sbridge_get_all_devices() to allow for duplicate device IDs and devices that are scattered across multiple PCI buses. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-4-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-05 | EDAC, sb_edac: Virtualize several hard-coded functions | Jim Snow | 1 | -11/+48
SAD limit, interleave mode and DRAM related functionalities are now virtualized, so that overriding them is easier. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-3-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-03 | EDAC, mv64x60: Use platform_register/unregister_drivers() | Thierry Reding | 1 | -28/+11
These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449073138-10852-2-git-send-email-thierry.reding@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
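The unwind behavior these helpers provide can be sketched with hypothetical stand-ins for platform_driver_register() and platform_driver_unregister(): drivers are registered in order, and if one fails, the ones already registered are unregistered in reverse order before the error is returned. This models the behavior described in the commit message, not the helpers' actual source.

```c
#include <stddef.h>

/* Hypothetical stand-in for a platform driver. */
struct sketch_driver {
	int fail_register;	/* force registration to fail */
	int registered;
};

static int sketch_driver_register(struct sketch_driver *drv)
{
	if (drv->fail_register)
		return -1;
	drv->registered = 1;
	return 0;
}

static void sketch_driver_unregister(struct sketch_driver *drv)
{
	drv->registered = 0;
}

static int sketch_register_drivers(struct sketch_driver *const *drivers,
				   size_t count)
{
	size_t i;
	int err = 0;

	for (i = 0; i < count; i++) {
		err = sketch_driver_register(drivers[i]);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	/* Undo drivers[0..i-1]; drivers[i] itself never registered. */
	while (i--)
		sketch_driver_unregister(drivers[i]);
	return err;
}
```

A multi-driver module then needs a single call in its init path and a single error check, instead of hand-written unwinding per driver.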
2015-12-03 | EDAC, mpc85xx: Use platform_register/unregister_drivers() | Thierry Reding | 1 | -8/+8
These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449136632-11680-1-git-send-email-thierry.reding@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-11-18 | EDAC, pci: Remove old disabled code | Borislav Petkov | 1 | -35/+0
Remove an unused edac_pci_find() function iterating over edac_pci_list. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-11-06 | Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic | Linus Torvalds | 3 | -3/+3
Pull asm-generic cleanups from Arnd Bergmann: "The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-*.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations gpio-mxc: stop including <asm-generic/bug> n_tracesink: stop including <asm-generic/bug> n_tracerouter: stop including <asm-generic/bug> mlx5: stop including <asm-generic/kmap_types.h> hifn_795x: stop including <asm-generic/kmap_types.h> drbd: stop including <asm-generic/kmap_types.h> move count_zeroes.h out of asm-generic move io-64-nonatomic*.h out of asm-generic
2015-11-03 | Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -3/+3
Pull RAS changes from Ingo Molnar: "The main system reliability related changes were from x86, but also some generic RAS changes: - AMD MCE error injection subsystem enhancements. (Aravind Gopalakrishnan) - Fix MCE and CPU hotplug interaction bug. (Ashok Raj) - kcrash bootup robustness fix. (Baoquan He) - kcrash cleanups. (Borislav Petkov) - x86 microcode driver rework: simplify it by unmodularizing it and other cleanups. (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init() x86/mce: Add a Scalable MCA vendor flags bit MAINTAINERS: Unify the microcode driver section x86/microcode/intel: Move #ifdef DEBUG inside the function x86/microcode/amd: Remove maintainers from comments x86/microcode: Remove modularization leftovers x86/microcode: Merge the early microcode loader x86/microcode: Unmodularize the microcode driver x86/mce: Fix thermal throttling reporting after kexec kexec/crash: Say which char is the unrecognized x86/setup/crash: Check memblock_reserve() retval x86/setup/crash: Cleanup some more x86/setup/crash: Remove alignment variable x86/setup: Cleanup crashkernel reservation functions x86/amd_nb, EDAC: Rename amd_get_node_id() x86/setup: Do not reserve crashkernel high memory if low reservation failed x86/microcode/amd: Do not overwrite final patch levels x86/microcode/amd: Extract current patch level read to a function x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts ...
2015-10-22 | EDAC: Fix PAGES_TO_MiB macro misuse | Tan Xiaojun | 2 | -2/+2
The PAGES_TO_MiB macro is used for unit conversion but the trace_mc_event() tracepoint expects a page address. Fix that. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>
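The mix-up is between a size and an address. Assuming 4 KiB pages (PAGE_SHIFT of 12) and a PAGES_TO_MiB() defined as a right shift by 20 - PAGE_SHIFT, converting a page frame number with it yields a count of MiB, whereas the tracepoint needs the error's physical address, rebuilt from the page frame number and the offset within the page. The helper name is hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT		12	/* assumed 4 KiB pages */
#define PAGES_TO_MiB(pages)	((pages) >> (20 - PAGE_SHIFT))

/* Physical address of the error: page frame number plus in-page offset. */
static uint64_t sketch_error_address(uint64_t page_frame_number,
				     uint64_t offset_in_page)
{
	return (page_frame_number << PAGE_SHIFT) | offset_in_page;
}
```

For page frame 0x12345 at offset 0x678, the address is 0x12345678, while PAGES_TO_MiB(0x12345) gives 0x123, a quantity in MiB that means nothing as an address.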
2015-10-21 | x86/amd_nb, EDAC: Rename amd_get_node_id() | Aravind Gopalakrishnan | 1 | -3/+3
This function doesn't give us the "Node ID" as the function name suggests. Rather, it receives a PCI device as argument, checks the available F3 PCI device IDs in the system and returns the index of the matching Bus/Device IDs. Rename it to amd_pci_dev_to_node_id(). No functional change is introduced. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-15 | EDAC, altera: SoCFPGA EDAC should not look for ECC_CORR_EN | Dinh Nguyen | 1 | -2/+1
The bootloader may or may not enable the ECC_CORR_EN bit. If ECC_CORR_EN is not enabled, it is the user's responsibility to perform a full SDRAM scrub when an error happens. Remove the check for ECC_CORR_EN. Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Thor Thayer <tthayer@opensource.altera.com> Link: http://lkml.kernel.org/r/1444864456-21778-1-git-send-email-dinguyen@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-15 | move io-64-nonatomic*.h out of asm-generic | Christoph Hellwig | 3 | -3/+3
These are not implementations of default architecture code but helpers for drivers. Move them to the place they belong to. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darren Hart <dvhart@linux.intel.com> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2015-10-14 | EDAC: Use edac_debugfs_remove_recursive() | Tan Xiaojun | 2 | -2/+2
debugfs_remove() is used to remove a file or a directory from the debugfs filesystem, but mci->debugfs might not be empty. This can be triggered by the following sequence: 1) Enable CONFIG_EDAC_DEBUG 2) insmod an EDAC module (like i3000_edac or similar) 3) rmmod this module 4) we can see files remaining under <debugfs_mountpoint>/edac/ like "fake_inject", for example. Removing edac_core then causes a NULL pointer dereference. Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-10-03 | EDAC, ppc4xx_edac: Fix module autoload for OF platform driver | Luis de Bethencourt | 1 | -0/+1
This platform driver has an OF device ID table but the OF module alias information is not created so module autoloading won't work. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20150917114619.GA13145@goodgumbo.baconseed.org Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-29 | EDAC, amd64_edac: Update copyright and remove changelog | Aravind Gopalakrishnan | 1 | -55/+1
Git provides us all the changelogs anyway. So trim the comments section here. Update the copyright info while at it. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-29 | EDAC, amd64_edac: Extend scrub rate support to F15hM60h | Aravind Gopalakrishnan | 2 | -10/+27
The scrub rate control register has moved to function 2 in PCI config space and is at a different offset on family 0x15, models 0x60 and later. The minimum recommended scrub rate has also changed. (Refer to D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG). Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate this. Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com [ Cleanup conditionals. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-28 | EDAC: Don't allow empty DIMM labels | Toshi Kani | 1 | -2/+2
Updating dimm_label to an empty string does not make much sense. Change the sysfs dimm_label store operation to fail a request when an input string is empty. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: elliott@hpe.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25 | EDAC: Fix sysfs dimm_label store operation | Toshi Kani | 1 | -10/+24
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues in their store operation:
1) A newline-terminated input string causes redundant newlines:
  # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat /sys/bus/mc0/devices/dimm0/dimm_label
  test
  # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 164 145 163 164 012 012
            t   e   s   t  \n  \n
  0000006
2) The original label string (31 characters) cannot be stored due to an improper size check:
  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat /sys/bus/mc0/devices/dimm0/dimm_label
  # od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 012 012
           \n  \n
  0000002
3) An input string longer than the buffer size results in wrong label info, as it allows a retry with the remaining string:
  # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
  # cat /sys/bus/mc0/devices/dimm0/dimm_label
  _TEST
Fix these issues by making the following changes:
1) Replace a newline character at the end with a null character. This also ensures that the string is null-terminated in the label buffer.
2) Check the label buffer size with 'sizeof(dimm->label)'.
3) Fail a request if its string exceeds the label buffer size.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Robert Elliott <elliott@hpe.com> Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>
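A user-space sketch of a store handler that applies the three fixes listed in this commit, plus the empty-label rejection from the later "Don't allow empty DIMM labels" commit. The struct and function are hypothetical stand-ins for the sysfs callback, and EDAC_MC_LABEL_LEN is taken as 31 to match the length discussed in these entries.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define EDAC_MC_LABEL_LEN	31

struct sketch_dimm {
	char label[EDAC_MC_LABEL_LEN + 1];
};

static ssize_t sketch_label_store(struct sketch_dimm *dimm,
				  const char *data, size_t count)
{
	size_t copy_count = count;

	if (count == 0)
		return -EINVAL;

	/* Drop a trailing newline (or NUL) instead of storing it. */
	if (data[count - 1] == '\n' || data[count - 1] == '\0')
		copy_count--;

	/* Reject empty labels and labels that do not fit; never truncate. */
	if (copy_count == 0 || copy_count >= sizeof(dimm->label))
		return -EINVAL;

	memcpy(dimm->label, data, copy_count);
	dimm->label[copy_count] = '\0';	/* always NUL-terminated */
	return (ssize_t)count;
}
```

Failing oversized input outright also prevents the third problem above: the caller gets an error rather than a partial write, so no retry with the leftover "_TEST" fragment occurs.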
2015-09-25EDAC: Fix sysfs dimm_label show operationToshi Kani1-2/+2
After 7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket"), sysfs "dimm_label" and "chX_dimm_label" show their label string without a newline "\n" at the end.

  [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

  [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
  CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#

The label strings now have 31 characters, the same as EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show() and dimmdev_label_show() limit the whole length to EDAC_MC_LABEL_LEN, the newline in the format "%s\n" is truncated.

  [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
  0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
            C   P   U   _   S   r   c   I   D   #   0   _   H   a   #   0
  0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
            _   C   h   a   n   #   0   _   D   I   M   M   #   0  \0
  0000037

Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the snprintf()s in channel_dimm_label_show() and dimmdev_label_show().

Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
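The truncation can be reproduced with plain snprintf(). It writes at most `limit - 1` characters plus a NUL, so a label that already fills the limit leaves no room for the "\n". This is a sketch under stated assumptions: `struct dimm_model` and `show_label()` are hypothetical, and `LABEL_LEN` is assumed to be 31 (EDAC_MC_LABEL_LEN as cited above).

```c
#include <stdio.h>
#include <string.h>

/* Assumed value: the commit text equates the label length with EDAC_MC_LABEL_LEN (31). */
#define LABEL_LEN 31

/* Hypothetical stand-in for the per-DIMM structure. */
struct dimm_model {
	char label[LABEL_LEN + 1];
};

/*
 * Model of a label show callback: format "%s\n" into out, but let the
 * caller pick the snprintf() size limit. Returns the length of what was
 * actually written (snprintf() always NUL-terminates within the limit).
 */
static size_t show_label(const struct dimm_model *d, char *out, size_t limit)
{
	snprintf(out, limit, "%s\n", d->label);
	return strlen(out);
}
```

With the 30-character label "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0", a limit of `LABEL_LEN` yields 30 characters and no newline. A limit of `sizeof(d.label) + 1` yields 31 characters ending in "\n". That bound fits the longest storable label, the newline, and the NUL.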
2015-09-25EDAC, xgene: Add SoC supportLoc Ho1-0/+498
Add support for the SoC component. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1443055261-8613-4-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25EDAC, xgene: Fix possible sprintf() overflow issueLoc Ho1-2/+2
Replace sprintf() with snprintf() to avoid possible string array overflow. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1443116287-11752-1-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
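Here is a generic sketch of the sprintf() to snprintf() pattern. It does not use the xgene driver's actual strings: the `make_name()` helper and the "prefix%d" naming are hypothetical. snprintf() bounds the write to the buffer size, and its return value (the untruncated length) lets the caller detect truncation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build an instance name into a fixed-size buffer. Unlike sprintf(),
 * snprintf() never writes more than `size` bytes; its return value is the
 * length the full string would have had, so truncation can be detected.
 */
static bool make_name(char *buf, size_t size, const char *prefix, int id)
{
	int n = snprintf(buf, size, "%s%d", prefix, id);

	/* n < 0 is an encoding error; n >= size means output was cut short. */
	return n >= 0 && (size_t)n < size;
}
```

With `char name[8]`, `make_name(name, sizeof(name), "l3c", 0)` stores "l3c0" and returns true. A longer prefix is cut to 7 characters plus the NUL, and the function returns false instead of overflowing the array.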
2015-09-25EDAC, xgene: Add L3 supportLoc Ho1-195/+474
Add EDAC support for the L3 component. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1443055261-8613-3-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-24EDAC, sb_edac: Fix TAD presence check for sbridge_mci_bind_devs()Seth Jennings1-4/+4
In commit 7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket"), NUM_CHANNELS was changed to 8 and the channel space was renumbered to handle EN, EP, and EX configurations.

The *_mci_bind_devs() functions, except for sbridge_mci_bind_devs(), got a new device presence check in the form of saw_chan_mask. However, sbridge_mci_bind_devs() still uses a for loop bounded by NUM_CHANNELS.

With the increase in NUM_CHANNELS, this loop fails at index 4 since SB only has 4 TADs. This results in the following error on SB machines:

  EDAC sbridge: Some needed devices are missing
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handle

This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as well.

After this patch:

  EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED)
  EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED)

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.2
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
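The difference between the two presence checks can be modelled as follows. This is a simplified, hypothetical sketch. In the driver, presence is discovered while matching PCI devices; here it is a `bool` array. The function names are illustrative, and the expected mask 0x0f follows from the statement that SB has only 4 TADs.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CHANNELS 8  /* value after commit 7d375bffa524, per the text above */

/*
 * Old-style check: requires a device in every one of NUM_CHANNELS slots,
 * so on a part with only 4 TADs it fails at index 4.
 */
static bool all_slots_present(const bool present[NUM_CHANNELS])
{
	for (size_t i = 0; i < NUM_CHANNELS; i++)
		if (!present[i])
			return false;
	return true;
}

/*
 * saw_chan_mask-style check: set one bit per channel whose device was
 * found, then compare with the channels this CPU model should expose.
 */
static bool expected_channels_present(const bool present[NUM_CHANNELS],
				      unsigned int expected_mask)
{
	unsigned int saw_chan_mask = 0;

	for (size_t i = 0; i < NUM_CHANNELS; i++)
		if (present[i])
			saw_chan_mask |= 1u << i;

	return saw_chan_mask == expected_mask;
}
```

For a Sandy Bridge-like part with channels 0-3 populated, `all_slots_present()` returns false, which is the failure the commit describes. `expected_channels_present(present, 0x0f)` returns true. The mask approach also still catches a genuinely missing channel inside the expected set.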
2015-09-23EDAC, ghes_edac: Remove redundant memory_type arrayAravind Gopalakrishnan1-21/+1
We already have edac_mem_types[] that enumerates the different kinds of memory. So, use that and remove the redundant memory_type[] array here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1442436811-23382-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-23EDAC, xgene: Convert to debugfs wrappersBorislav Petkov1-13/+13
Drop CONFIG_EDAC_DEBUG ifdeffery too, while at it. Tested-by: Loc Ho <lho@apm.com> Cc: linux-edac@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>