aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/PCI/pci-error-recovery.txt (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-09-26PCI/ERR: Handle fatal error recoveryKeith Busch1-25/+10
We don't need to be paranoid about the topology changing while handling an error. If the device has changed in a hotplug capable slot, we can rely on the presence detection handling to react to a changing topology. Restore the fatal error handling behavior that existed before merging DPC with AER with 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devices"). Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sinan Kaya <okaya@kernel.org>
2018-05-17PCI/AER: Handle ERR_FATAL with removal and re-enumeration of devicesOza Pawandeep1-10/+25
PCIe ERR_FATAL errors mean the Link is unreliable. Components on the Link may need to be reset to return to reliable operation (PCIe r4.0, sec 6.2.2). We previously handled these errors much differently depending on whether the platform supports Downstream Port Containment (DPC) (PCIe r4.0, sec 6.2.10) or not. The AER driver has historically logged the error details, called driver-supplied pci_error_handlers callbacks, and reset the Link. This reset downstream devices, but did not remove them from the PCI subsystem, re-enumerate them, or call their driver .remove() or .probe() methods. DPC is different because the hardware automatically disables the Link when it detects ERR_FATAL, which resets downstream devices. There's no opportunity for pci_error_handlers callbacks before resetting the Link. The DPC driver removes affected devices (which calls their driver .remove() methods), brings the Link back up, and re-enumerates (which calls driver .probe() methods). Align AER ERR_FATAL handling with DPC by resetting the Link in software, skipping the driver pci_error_handlers callbacks, removing the devices from the PCI subsystem, and re-enumerating. The idea is that drivers and devices should see the same behavior for ERR_FATAL events, regardless of whether they're handled by AER or DPC. Here are the basic ERR_FATAL recovery steps, showing the previous AER behavior, the AER behavior after this patch, and the DPC behavior: AER AER DPC previous new behavior -------- --- -------- Log error yes yes yes (minimal) drv.error_detected() yes no no Reset Link yes yes yes drv.mmio_enabled() yes no no drv.slot_reset() yes no no drv.resume() yes no no Remove PCI devices no yes yes (calls drv.remove()) Re-enumerate no yes yes (calls drv.probe()) N.B. With DPC, the Link reset happens before the driver .remove() calls, while with AER, the reset happens *after* the .remove() calls. The goal is to eventually do the reset before .remove() for AER as well. Signed-off-by: Oza Pawandeep <poza@codeaurora.org> [bhelgaas: changelog, squash doc patch into this, remove unused "result_data"] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com>
2017-03-29pci-error-recovery: doc cleanupCao jin1-6/+6
Include whitespace shooting; correction; typo fix; superfluous word dropping. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-02-09PCI/AER: Remove unused .link_reset() callbackMichael S. Tsirkin1-21/+3
No hardware seems to actually call .link_reset(), and no driver implements it as more than a nop stub. Drop mentions of the callback from everywhere. It's dropped from the documentation as well, but the doc really needs to be updated to reflect reality better (e.g., on PCIe, slot reset is the link reset). This will be done in a later patch. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-03-20doc:pci: Fix typo in Documentation/PCIMasanari Iida1-1/+1
This patch fix spelling typo in Documentation/PCI. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2010-04-23Documentation/: it's -> its where appropriateFrancis Galiegue1-2/+2
Fix obvious cases of "it's" being used when "its" was meant. Signed-off-by: Francis Galiegue <fgaliegue@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-09PCI: document PCIe fundamental reset interfacesMike Mason1-42/+77
The attached patch updates the Documentation/PCI/pci-error-recovery.txt file with changes related to this new bit field, as well a few unrelated updates. Signed-off-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Signed-off-by: Richard Lary <rlary@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-04-20PCI: doc/pci: create Documentation/PCI/ and move files into itRandy Dunlap1-0/+396
Create Documentation/PCI/ and move PCI-related files to it. Fix a few instances of trailing whitespace. Update references to the new file locations. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>