path: root/arch/powerpc/kernel/eeh_event.c
Age | Commit message | Author | Files | Lines
2014-06-11 | powerpc/powernv: Fix killed EEH event | Gavin Shan | 1 | -6/+15

On the PowerNV platform, EEH errors are reported either by the IO accessors or by a poller driven by an interrupt. Once a PE is isolated, no further EEH event is produced for it. The current implementation can lose an EEH event in the following way:

* The interrupt handler queues one "special" event, which drives the poller.
* The EEH thread has not picked up the special event yet.
* An IO accessor kicks in, the frozen PE is marked as "isolated", and an EEH event is queued to the list.
* The EEH thread runs because of the special event and purges all existing EEH events. However, no other EEH event is ever produced for the frozen PE.

Eventually, the PE is marked as "isolated" and there is no EEH event left to recover it. The patch fixes the issue by keeping EEH events for PEs that have been marked as "isolated", with the help of an additional "force" argument to eeh_remove_event().

Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
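The purge rule described in this commit message can be modelled in user space. The following is only a minimal sketch under stated assumptions, not the kernel implementation: `struct pe`, `struct event`, `queue_event()`, `remove_events()` and `queued_events()` are hypothetical stand-ins, a plain singly linked list replaces the kernel list, and no locking is shown. A NULL PE on an event models the "special" event, and a NULL PE passed to the purge means "purge everything".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel's PE and event types. */
struct pe {
    int id;
    bool isolated;          /* models the "isolated" PE state */
};

struct event {
    struct pe *pe;          /* NULL models the "special" event */
    struct event *next;
};

static struct event *queue_head;

/* Append an event for the given PE (NULL for the special event). */
static void queue_event(struct pe *pe)
{
    struct event *ev = malloc(sizeof(*ev));
    struct event **pp = &queue_head;

    assert(ev != NULL);
    ev->pe = pe;
    ev->next = NULL;
    while (*pp)
        pp = &(*pp)->next;
    *pp = ev;
}

/*
 * Purge queued events. A NULL pe matches every event; otherwise only
 * events for that PE match. Without force, events whose PE is already
 * isolated are kept so that recovery can still be triggered for them.
 */
static void remove_events(struct pe *pe, bool force)
{
    struct event **pp = &queue_head;

    while (*pp) {
        struct event *ev = *pp;
        bool keep = !force && ev->pe && ev->pe->isolated;
        bool match = (pe == NULL) || (ev->pe == pe);

        if (!keep && match) {
            *pp = ev->next;     /* unlink and free the event */
            free(ev);
        } else {
            pp = &ev->next;     /* leave it queued */
        }
    }
}

/* Count the events still queued. */
static size_t queued_events(void)
{
    size_t n = 0;

    for (struct event *ev = queue_head; ev; ev = ev->next)
        n++;
    return n;
}
```

In this model, a non-forced purge issued while handling the special event leaves the event of the isolated PE on the list, which is the behaviour the commit message asks for; a forced purge still empties the list.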
2013-11-21 | powerpc/eeh: More accurate log | Gavin Shan | 1 | -2/+7

This clarifies in the log whether the error is a global PHB error or an individual PE being frozen.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-25 | powerpc/eeh: Use interruptible sleep in keehd | Gavin Shan | 1 | -1/+2

Replace down() with down_interruptible() to avoid the following warning:

[c00000007ba7b710] [c000000000014410] .__switch_to+0x1b0/0x380
[c00000007ba7b7c0] [c0000000007b408c] .__schedule+0x3ec/0x970
[c00000007ba7ba50] [c0000000007b1f24] .schedule_timeout+0x1a4/0x2b0
[c00000007ba7bb30] [c0000000007b34a4] .__down+0xa4/0x104
[c00000007ba7bbf0] [c0000000000b9230] .down+0x60/0x70
[c00000007ba7bc80] [c0000000000336d0] .eeh_event_handler+0x70/0x190
[c00000007ba7bd30] [c0000000000b1a58] .kthread+0xe8/0xf0
[c00000007ba7be30] [c00000000000a05c] .ret_from_kernel_thread+0x5c/0x8

This also avoids keeping the load average up while doing nothing.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 | powerpc/eeh: Allow to purge EEH events | Gavin Shan | 1 | -0/+37

On the PowerNV platform, we might run into a situation where subsequent events are duplicates of an earlier one that is still being processed. For that case, the patch implements a function to purge EEH events accordingly.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 | powerpc/eeh: Single kthread to handle events | Gavin Shan | 1 | -47/+49

We possibly have multiple kthreads running for multiple EEH errors (events) and use one spinlock to serialize the handling of those events. That is unnecessary, so the patch creates only one kthread, which is started during EEH core initialization in eeh_init(). A new semaphore is introduced to count the EEH events in the queue, and the kthread waits on that semaphore.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
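The single-consumer design described here (one handler thread, a semaphore that counts queued events) can be illustrated with a user-space analogue. This is a hedged sketch only: it uses POSIX threads and `sem_t` in place of a kthread and a kernel semaphore, and every name in it (`send_event()`, `event_handler()`, `start_handler()`, `stop_handler()`, the negative `pe_id` stop marker) is hypothetical rather than taken from the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Hypothetical event record; a negative pe_id asks the handler to stop. */
struct event {
    int pe_id;
    struct event *next;
};

static struct event *head, *tail;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t list_sem;          /* counts the events currently queued */
static pthread_t handler;
static int handled;             /* written by the handler thread only */

/* Producer side: queue one event, then bump the semaphore. */
static int send_event(int pe_id)
{
    struct event *ev = malloc(sizeof(*ev));

    if (!ev)
        return -1;
    ev->pe_id = pe_id;
    ev->next = NULL;

    pthread_mutex_lock(&list_lock);
    if (tail)
        tail->next = ev;
    else
        head = ev;
    tail = ev;
    pthread_mutex_unlock(&list_lock);

    sem_post(&list_sem);
    return 0;
}

/* The single consumer: sleep on the semaphore, handle one event per wakeup. */
static void *event_handler(void *unused)
{
    (void)unused;
    for (;;) {
        /* Retry if the wait was interrupted by a signal. */
        while (sem_wait(&list_sem) == -1 && errno == EINTR)
            ;

        pthread_mutex_lock(&list_lock);
        struct event *ev = head;
        assert(ev != NULL);     /* semaphore count mirrors the queue length */
        head = ev->next;
        if (!head)
            tail = NULL;
        pthread_mutex_unlock(&list_lock);

        int id = ev->pe_id;
        free(ev);
        if (id < 0)
            break;              /* stop marker */
        handled++;              /* stand-in for the real recovery work */
    }
    return NULL;
}

/* Start the one handler thread, as an init routine would. */
static int start_handler(void)
{
    if (sem_init(&list_sem, 0, 0) != 0)
        return -1;
    return pthread_create(&handler, NULL, event_handler, NULL);
}

/* Queue the stop marker, wait for the handler, return the handled count. */
static int stop_handler(void)
{
    send_event(-1);
    pthread_join(handler, NULL);
    sem_destroy(&list_sem);
    return handled;
}
```

Because only one thread ever dequeues, event handling is serialized by construction; the mutex in this sketch protects only the list itself, not the handling.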
2013-06-20 | powerpc/eeh: Move common part to kernel directory | Gavin Shan | 1 | -0/+142

The patch moves the common part of the EEH core into the arch/powerpc/kernel directory so that PPC_PSERIES is not needed when compiling the POWERNV platform:

* Move the EEH common part into arch/powerpc/kernel
* Move the functions for PCI hotplug from pSeries platform to arch/powerpc/kernel/pci-hotplug.c
* Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to arch/powerpc/platforms/Kconfig
* Adjust makefile accordingly

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>