aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/edac/edac_mc.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-11-10EDAC/mc: Reduce indentation level in edac_mc_handle_error()Robert Richter1-28/+29
Reduce the indentation level in edac_mc_handle_error() a bit. No functional changes. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-7-rrichter@marvell.com
2019-11-10EDAC/mc: Remove needless zero string terminationRobert Richter1-1/+0
The e string to which this is pointing to has already been cleared earlier in the function so remove the needless zero string termination. [ bp: Correct the commit message. ] Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-6-rrichter@marvell.com
2019-11-10EDAC/mc: Do not BUG_ON() in edac_mc_alloc()Robert Richter1-1/+2
No need to crash the system in case edac_mc_alloc() is called with invalid arguments, just warn and return. This would cause a checkpatch warning when touching the code later, so just fix it. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-5-rrichter@marvell.com
2019-11-10EDAC: Introduce an mci_for_each_dimm() iteratorRobert Richter1-8/+11
Introduce an mci_for_each_dimm() iterator. It returns a pointer to a struct dimm_info. This makes the declaration and use of an index obsolete and avoids access to internal data of struct mci (direct array access etc). [ bp: push the struct dimm_info *dimm; declaration into the CONFIG_EDAC_DEBUG block. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-4-rrichter@marvell.com
2019-11-09EDAC: Remove EDAC_DIMM_OFF() macroRobert Richter1-15/+13
The EDAC_DIMM_OFF() macro takes 5 arguments to get the DIMM's index. Simplify this by storing the index in struct dimm_info to avoid its calculation and remove the EDAC_DIMM_OFF() macro. The index can be directly used then. Another advantage is that edac_mc_alloc() could be used even if the exact size of the layers is unknown. Only the number of DIMMs would be needed. Rename iterator variable to idx, while at it. The name is more handy, esp. when searching for it in the code. Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20191106093239.25517-3-rrichter@marvell.com
2019-09-03EDAC: Prefer 'unsigned int' to bare use of 'unsigned'Robert Richter1-10/+10
Use of 'unsigned int' instead of bare use of 'unsigned'. Fix this for edac_mc*, ghes and the i5100 driver as reported by checkpatch.pl. While at it, struct member dev_ch_attribute->channel is always used as unsigned int. Change type to unsigned int to avoid type casts. [ bp: Massage. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190902123216.9809-2-rrichter@marvell.com
2019-08-14EDAC/mc: Cleanup _edac_mc_free() codeRobert Richter1-13/+12
Remove needless and boilerplate variable declarations. No functional changes. [ bp: Add newlines for better readability. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624150758.6695-10-rrichter@marvell.com
2019-08-03EDAC/mc: Fix grain_bits calculationRobert Richter1-2/+6
The grain in EDAC is defined as "minimum granularity for an error report, in bytes". The following calculation of the grain_bits in edac_mc is wrong: grain_bits = fls_long(e->grain) + 1; Where grain_bits is defined as: grain = 1 << grain_bits Example: grain = 8 # 64 bit (8 bytes) grain_bits = fls_long(8) + 1 grain_bits = 4 + 1 = 5 grain = 1 << grain_bits grain = 1 << 5 = 32 Replace it with the correct calculation: grain_bits = fls_long(e->grain - 1); The example gives now: grain_bits = fls_long(8 - 1) grain_bits = fls_long(7) grain_bits = 3 grain = 1 << 3 = 8 Also, check if the hardware reports a reasonable grain != 0 and fallback with a warning to 1 byte granularity otherwise. [ bp: massage a bit. ] Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20190624150758.6695-2-rrichter@marvell.com
2019-05-14EDAC/mc: Fix edac_mc_find() in case no device is foundRobert Richter1-8/+4
The function should return NULL in case no device is found, but it always returns the last checked mc device from the list even if the index did not match. Fix that. I did some analysis why this did not raise any issues for about 3 years and the reason is that edac_mc_find() is mostly used to search for existing devices. Thus, the bug is not triggered. [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ] Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
2018-11-13EDAC: Drop per-memory controller busesBorislav Petkov1-8/+1
... and use the single edac_subsys object returned from subsys_system_register(). The idea is to have a single bus and multiple devices on it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> CC: Aristeu Rozanski Filho <arozansk@redhat.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Justin Ernst <justin.ernst@hpe.com> CC: linux-edac <linux-edac@vger.kernel.org> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Russ Anderson <rja@hpe.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
2018-08-17EDAC: Add missing MEM_LRDDR4 entry in edac_mem_types[]Takashi Iwai1-0/+1
The edac_mem_types[] array misses a MEM_LRDDR4 entry, which leads to NULL pointer dereference when accessed via sysfs or such. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20180810141426.8918-1-tiwai@suse.de Fixes: 1e8096bb2031 ("EDAC: Add LRDDR4 DRAM type") Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-14EDAC: Add new memory type for non-volatile DIMMsTony Luck1-1/+2
There are now non-volatile versions of DIMMs. Add a new entry to "enum mem_type" and a new string in edac_mem_types[]. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-3-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2018-03-14EDAC: Drop duplicated array of strings for memory type namesTony Luck1-20/+20
Somehow we ended up with two separate arrays of strings to describe the "enum mem_type" values. In edac_mc.c we have an exported list edac_mem_types[] that is used by a couple of drivers in debug messaged. In edac_mc_sysfs.c we have a private list that is used to display values in: /sys/devices/system/edac/mc/mc*/dimm*/dimm_mem_type /sys/devices/system/edac/mc/mc*/csrow*/mem_type This list was missing a value for MEM_LRDDR3. The string values in the two lists were different :-( Combining the lists, I kept the values so that the sysfs output will be unchanged as some scripts may depend on that. Reported-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-2-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-25EDAC: Add helper which returns the loaded platform driverToshi Kani1-1/+6
Only a single EDAC platform driver can be loaded. When ghes_edac is enabled, an EDAC platform driver still attempts to register itself and fails in edac_mc_add_mc(). Add edac_get_owner() so that EDAC platform drivers can check the owner first. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170823225447.15608-5-toshi.kani@hpe.com [ Massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10EDAC: Rename report status accessorsBorislav Petkov1-4/+4
Change them to have the edac_ prefix. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10EDAC: Delete edac_stub.cBorislav Petkov1-0/+61
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to a module parameter. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10EDAC: Issue tracepoint only when it is definedBorislav Petkov1-4/+7
... and this happens only when CONFIG_RAS is enabled. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10EDAC: Move edac_op_state to edac_mc.cBorislav Petkov1-0/+3
... as part of moving stuff away from edac_stub.c Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10EDAC: Remove edac_err_assertBorislav Petkov1-17/+1
... and the glue around it. It is not needed anymore. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-10EDAC: Get rid of edac_handlersBorislav Petkov1-4/+2
Use mc_devices list instead to check whether we have EDAC driver instances successfully registered with EDAC core. Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-28EDAC: Add routine to check if MC devices list is emptyYazen Ghannam1-0/+14
We need to know if any MC devices have been allocated. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com [ Prettify text. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15edac: move documentation from edac_mc.c to edac_core.hMauro Carvalho Chehab1-82/+0
Several functions are documented at edac_mc.c. As we'll be including edac_core.h at drivers-api book, move those, in order for the kernel-doc markups be part of the API documentation book. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-12-15edac: rename edac_core.h to edac_mc.hMauro Carvalho Chehab1-1/+1
Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-11-14EDAC, mc: Fix locking around mc_devices listBorislav Petkov1-18/+31
When accessing the mc_devices list of memory controller descriptors, we need to hold mem_ctls_mutex. This was not always the case, fix that. Make all external callers call a version which grabs the mutex since the last is local to edac_mc.c. Reported-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2016-06-03EDAC: Fix workqueues poll period resettingNicholas Krause1-1/+2
After the workqueue cleanup, we're registering workqueues based on the presence of an ->edac_check function. When that is the case, we're setting OP_RUNNING_POLL. But we forgot to check that in edac_mc_reset_delay_period(), leading to: BUG: unable to handle kernel paging request at 0000000000015d10 IP: [ .. ] queued_spin_lock_slowpath PGD 3ffcc8067 PUD 3ffc56067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: ... CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1 Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013 Stack: Call Trace: ? _raw_spin_lock_irqsave ? lock_timer_base.isra.34 ? del_timer ? try_to_grab_pending ? mod_delayed_work_on ? edac_mc_reset_delay_period ? edac_set_poll_msec ? param_attr_store ? module_attr_store ? kernfs_fop_write ? __vfs_write ? __vfs_read ? __alloc_fd ? vfs_write ? SyS_write ? entry_SYSCALL_64_fastpath Code: RIP [ .. ] queued_spin_lock_slowpath RSP <> CR2: 0000000000015d10 ---[ end trace 3f286bc71cca15d1 ]--- Kernel panic - not syncing: Fatal exception Fix it. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Cc: <stable@vger.kernel.org> # 4.5 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com [ Rewrite commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23EDAC: Increment correct counter in edac_inc_ue_error()Emmanouil Maroudas1-1/+1
Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of ce_noinfo_count. Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 4275be635597 ("edac: Change internal representation to work with layers") Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC: Cleanup/sync workqueue functionsBorislav Petkov1-5/+3
They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC: Kill workqueue setup/teardown functionsBorislav Petkov1-39/+4
We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC: Balance workqueue setup and teardownBorislav Petkov1-8/+7
We use the ->edac_check function pointers to determine whether we need to setup a polling workqueue. However, the destroy path is not balanced and we might try to teardown an unitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE *before* destroying anything. Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Rework workqueue handlingBorislav Petkov1-12/+7
Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with 91b99041c1d5 ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Robustify workqueues destructionBorislav Petkov1-11/+3
EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work *synchronously* too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>
2015-10-22EDAC: Fix PAGES_TO_MiB macro misuseTan Xiaojun1-1/+1
The PAGES_TO_MiB macro is used for unit conversion but the trace_mc_event() tracepoint expects a page address. Fix that. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-28EDAC: Cleanup atomic_scrub messBorislav Petkov1-2/+7
So first of all, this atomic_scrub() function's naming is bad. It looks like an atomic_t helper. Change it to edac_atomic_scrub(). The bigger problem is that this function is arch-specific and every new arch which doesn't necessarily need that functionality still needs to define it, otherwise EDAC doesn't compile. So instead of doing that and including arch-specific headers, have each arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c for ifdeffery. Much cleaner. And we already are doing this with another symbol - EDAC_SUPPORT. This is also much cleaner than having CONFIG_EDAC enumerate all the arches which need/have EDAC support and drivers. This way I can kill the useless edac.h header in tile too. Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linuxppc-dev@lists.ozlabs.org Cc: "Maciej W. Rozycki" <macro@codesourcery.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: "Steven J. Hill" <Steven.Hill@imgtec.com> Cc: x86@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Allow to pass driver-specific attribute groupsTakashi Iwai1-5/+7
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info object with the optional attribute groups. This allows drivers to pass additional sysfs entries without manual (and racy) device_create_file() and co calls. edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups() with NULL groups. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-20EDAC: Sync memory types and namesBorislav Petkov1-23/+20
Make keeping the sync between the mem_types enum and the actual string names simpler by using designated initializers. Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-20EDAC: Add DDR3 LRDIMM entries to edac_mem_typesAravind Gopalakrishnan1-0/+3
F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: improve comments. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2014-09-02EDAC: Fix mem_types strings typeBorislav Petkov1-1/+1
This one got forgotten during an earlier cleanup. Signed-off-by: Borislav Petkov <bp@suse.de>
2014-06-23trace, RAS: Add basic RAS trace eventChen, Gong1-3/+0
To avoid confuision and conflict of usage for RAS related trace event, add an unified RAS trace event stub. Start a RAS subsystem menu which will be fleshed out in time, when more features get added to it. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-05-09EDAC: Fix MC scrub mode comparsion bug for correctable errorsLoc Ho1-1/+1
The MC structure field scrub_mode is of integer type - not bit field. Use it accordingly. Signed-off-by: Loc Ho <lho@apm.com> Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-14EDAC: Correct workqueue setup pathBorislav Petkov1-4/+7
We're using edac_mc_workq_setup() both on the init path, when we load an edac driver and when we change the polling period (edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec. On that second path we don't need to init the workqueue which has been initialized already. Thanks to Tejun for workqueue insights. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: <stable@vger.kernel.org>
2014-02-14EDAC: Poll timeout cannot be zero, p2Borislav Petkov1-2/+2
Sanitize code even more to accept unsigned longs only and to not allow polling intervals below 1 second as this is unnecessary and doesn't make much sense anyway for polling errors. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: <stable@vger.kernel.org>
2013-11-04edac: Unify reporting of device info for device, mc and pciRobert Richter1-2/+4
Log messages slightly differ between edac subsystems. Unifying it. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rric@kernel.org>
2013-07-23EDAC: Fix lockdep splatBorislav Petkov1-0/+9
Fix the following: BUG: key ffff88043bdd0330 not in .data! ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0() DEBUG_LOCKS_WARN_ON(1) Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1 Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc Call Trace: dump_stack warn_slowpath_common warn_slowpath_fmt lockdep_init_map ? trace_hardirqs_on_caller ? trace_hardirqs_on debug_mutex_init __mutex_init bus_register edac_create_sysfs_mci_device edac_mc_add_mc sbridge_probe pci_device_probe driver_probe_device __driver_attach ? driver_probe_device bus_for_each_dev driver_attach bus_add_driver driver_register __pci_register_driver ? 0xffffffffa0010fff sbridge_init ? 0xffffffffa0010fff do_one_initcall load_module ? unset_module_init_ro_nx SyS_init_module tracesys ---[ end trace d24a70b0d3ddf733 ]--- EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0 EDAC sbridge: Driver loaded. What happens is that bus_register needs a statically allocated lock_key because the last is handed in to lockdep. However, struct mem_ctl_info embeds struct bus_type (the whole struct, not a pointer to it) and the whole thing gets dynamically allocated. Fix this by using a statically allocated struct bus_type for the MC bus. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: stable@kernel.org # v3.10 Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-03-16EDAC: Merge mci.mem_is_per_rank with mci.csbasedMauro Carvalho Chehab1-3/+3
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-02-21edac: add support for raw error reportsMauro Carvalho Chehab1-22/+42
That allows APEI GHES driver to report errors directly, using the EDAC error report API. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21edac: reduce stack pressure by using a pre-allocated bufferMauro Carvalho Chehab1-33/+48
The number of variables at the stack is too big. Reduces the stack usage by using a pre-allocated error buffer. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21edac: lock module owner to avoid error report conflictsMauro Carvalho Chehab1-4/+21
APEI GHES and i7core_edac/sb_edac currently can be loaded at the same time, but those are Highlander modules: "There can be only one". There are two reasons for that: 1) Each driver assumes that it is the only one registering at the EDAC core, as it is driver's responsibility to number the memory controllers, and all of them start from 0; 2) If BIOS is handling the memory errors, the OS can't also be doing it, as one will mangle with the other. So, we need to add an module owner's lock at the EDAC core, in order to avoid having two different modules handling memory errors at the same time. The best way for doing this lock seems to use the driver's name, as this is unique, and won't require changes on every driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21edac: add a new memory layer typeMauro Carvalho Chehab1-0/+1
There are some cases where the memory controller layout is completely hidden. This is the case of firmware-driven error code, like the one provided by GHES. Add a new layer to be used on such memory error report mechanisms. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20Merge tag 'v3.8-rc7' into nextMauro Carvalho Chehab1-3/+3
Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...