aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/pci/pcie/aer.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-07-13PCI/AER: Iterate over error counters instead of error stringsMohamed Khalfella1-1/+6
Previously we iterated over AER stat *names*, e.g., aer_correctable_error_string[32], but the actual stat *counters* may not be that large, e.g., pdev->aer_stats->dev_cor_errs[16], which means that we printed junk in the sysfs stats files. Iterate over the stat counter arrays instead of the names to avoid this junk. Also, added a build time check to make sure all counters have entries in strings array. Fixes: 0678e3109a3c ("PCI/AER: Simplify __aer_print_error()") Link: https://lore.kernel.org/r/20220509181441.31884-1-mkhalfella@purestorage.com Reported-by: Meeta Saggi <msaggi@purestorage.com> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Meeta Saggi <msaggi@purestorage.com> Reviewed-by: Eric Badger <ebadger@purestorage.com> Cc: stable@vger.kernel.org
2022-07-13PCI/AER: Enable error reporting when AER is nativeStefan Roese1-0/+3
If we have native control of AER, set the following error reporting enable bits: - Correctable Error Reporting Enable - Non-Fatal Error Reporting Enable - Fatal Error Reporting Enable - Unsupported Request Reporting Enable Note that these bits are all in the Device Control register and are not AER-specific. This affects all devices with an AER capability, including hot-added devices. Please note that this change is quite invasive, as error reporting now will be enabled for all available PCIe Endpoints, which was previously not the case. When "pci=noaer" is selected, error reporting stays disabled of course. [bhelgaas: commit log, note error reporting is not AER-specific] Link: https://lore.kernel.org/r/20220125071820.2247260-4-sr@denx.de Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Pali Rohár <pali@kernel.org> Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yao Hongbo <yaohongbo@linux.alibaba.com> Cc: Naveen Naidu <naveennaidu479@gmail.com>
2022-07-11PCI/AER: Configure ECRC for every deviceStefan Roese1-3/+2
Move pcie_set_ecrc_checking() to pci_aer_init() to make sure that pcie_set_ecrc_checking() is called for each PCIe device, including hot-added devices. Link: https://lore.kernel.org/r/20220125071820.2247260-2-sr@denx.de Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Pali Rohár <pali@kernel.org> Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yao Hongbo <yaohongbo@linux.alibaba.com> Cc: Naveen Naidu <naveennaidu479@gmail.com>
2022-05-17PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bitsKuppuswamy Sathyanarayanan1-1/+6
When a Root Port or Root Complex Event Collector receives an error Message e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status register and logs the Requester ID in the Error Source Identification register. If it receives a second ERR_COR Message before software clears PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the Requester ID is lost. In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared: - hardware receives ERR_COR message - hardware sets PCI_ERR_ROOT_COR_RCV - aer_irq() entered - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_COR_RCV - hardware receives second ERR_COR message - hardware sets PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status) - PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set - aer_irq() entered again - aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS) - aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV - aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set - PCI_ERR_ROOT_MULTI_COR_RCV is still set The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV. Fix the problem by queueing an AER event and clearing the Root Error Status bits when any of these bits are set: PCI_ERR_ROOT_COR_RCV PCI_ERR_ROOT_UNCOR_RCV PCI_ERR_ROOT_MULTI_COR_RCV PCI_ERR_ROOT_MULTI_UNCOR_RCV See the bugzilla link for details from Eric about how to reproduce this problem. [bhelgaas: commit log, move repro details to bugzilla] Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992 Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com Reported-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
2021-10-08PCI: Correct misspelled and remove duplicated wordsKrzysztof Wilczyński1-1/+1
Correct a number of misspelled words and remove any words that were duplicated in the PCI tree. No change to functionality intended. Link: https://lore.kernel.org/r/20211006233827.147328-1-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-18PCI: Change the type of probe argument in reset functionsAmey Narkhede1-1/+1
Change the type of probe argument in functions which implement reset methods from int to bool to make the context and intent clear. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20210817180500.1253-10-ameynarkhede03@gmail.com Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-08-17PCI: Add pcie_reset_flr() with 'probe' argumentAmey Narkhede1-7/+5
Most reset methods are of the form "pci_*_reset(dev, probe)". pcie_flr() was an exception because it relied on a separate pcie_has_flr() function instead of taking a "probe" argument. Add "pcie_reset_flr(dev, probe)" to follow the convention. Remove pcie_has_flr(). Some pcie_flr() callers that did not use pcie_has_flr() remain. [bhelgaas: commit log, rework pcie_reset_flr() to use dev->devcap directly] Link: https://lore.kernel.org/r/20210817180500.1253-3-ameynarkhede03@gmail.com Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
2021-07-06Merge branch 'pci/sysfs'Bjorn Helgaas1-9/+11
- Fix dsm_label_utf16s_to_utf8s() buffer overrun (Krzysztof Wilczyński) - Use sysfs_emit() and sysfs_emit_at() in "show" functions (Krzysztof Wilczyński) - Fix 'resource_alignment' newline issues (Krzysztof Wilczyński) - Add newline to 'devspec' sysfs file (Krzysztof Wilczyński) * pci/sysfs: PCI/sysfs: Add 'devspec' newline PCI/sysfs: Fix 'resource_alignment' newline issues PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions PCI/sysfs: Rely on lengths from scnprintf(), dsm_label_utf16s_to_utf8s() PCI/sysfs: Fix dsm_label_utf16s_to_utf8s() buffer overrun # Conflicts: # drivers/pci/p2pdma.c
2021-06-03PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functionsKrzysztof Wilczyński1-9/+11
The sysfs_emit() and sysfs_emit_at() functions were introduced to make it less ambiguous which function is preferred when writing to the output buffer in a device attribute's "show" callback [1]. Convert the PCI sysfs object "show" functions from sprintf(), snprintf() and scnprintf() to sysfs_emit() and sysfs_emit_at() accordingly, as the latter is aware of the PAGE_SIZE buffer and correctly returns the number of bytes written into the buffer. No functional change intended. [1] Documentation/filesystems/sysfs.rst Related commit: ad025f8e46f3 ("PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions"). Link: https://lore.kernel.org/r/20210603000112.703037-2-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
2021-05-27PCI/AER: Use consistent format when printing PCI deviceYicong Yang1-2/+2
We use format domain:bus:slot.function when printing PCI device. Use consistent format in AER messages. [bhelgaas: also drop "AER recover:" prefix since we already have an "AER:" prefix from pr_fmt()] Link: https://lore.kernel.org/r/1617015721-51701-1-git-send-email-yangyicong@hisilicon.com Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
2021-03-11PCI: Fix kernel-doc errorsKrzysztof Wilczyński1-3/+3
Fix kernel-doc formatting errors, function names that don't match the doc, and some missing parameter documentation. These are reported by: make W=1 drivers/pci/ No functional change intended. [bhelgaas: squashed into one patch since this only changes comments] Link: https://lore.kernel.org/r/20210311001724.423356-1-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-2-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-3-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-4-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-5-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-6-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-7-kw@linux.com Link: https://lore.kernel.org/r/20210311001724.423356-8-kw@linux.com Signed-off-by: Krzysztof Wilczyński <kw@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2021-02-23PCI/AER: Specify the type of Port that was resetKeith Busch1-1/+2
The AER driver may be called upon to reset either a Downstream or a Root Port. Check which type it is to properly identify it when logging that the reset occurred. Link: https://lore.kernel.org/r/20210104230300.1277180-5-kbusch@kernel.org Tested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Hedi Berriche <hedi.berriche@hpe.com>
2021-02-23PCI/AER: Clear AER status from Root Port when resetting Downstream PortKeith Busch1-1/+1
The pci_dev parameter given to aer_root_reset() may be a Downstream Port rather than the Root Port. Get the Root Port from the provided device in order to clear the root's AER status. Link: https://lore.kernel.org/r/20210104230300.1277180-3-kbusch@kernel.org Tested-by: Hedi Berriche <hedi.berriche@hpe.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Sean V Kelley <sean.v.kelley@intel.com> Acked-by: Hedi Berriche <hedi.berriche@hpe.com>
2020-12-05PCI/AER: Add pcie_walk_rcec() to RCEC AER handlingSean V Kelley1-4/+11
Root Complex Event Collectors (RCEC) appear as peers to Root Ports and also have the AER capability. In addition, actions need to be taken for associated RCiEPs. In such cases the RCECs will need to be walked in order to find and act upon their respective RCiEPs. Extend the existing ability to link the RCECs with a walking function pcie_walk_rcec(). Add RCEC support to the current AER service driver and attach the AER service driver to the RCEC device. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20201121001036.8560-14-sean.v.kelley@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2020-12-05PCI/ERR: Recover from RCiEP AER errorsQiuxu Zhuo1-5/+19
Add support for handling AER errors detected by Root Complex Integrated Endpoints (RCiEPs). These errors are signaled to software natively via a Root Complex Event Collector (RCEC) or non-natively via ACPI APEI if the platform retains control of AER or uses a non-standard RCEC-like device. When recovering from RCiEP errors, the Root Error Command and Status registers are in the AER Capability of an associated RCEC (if any), not in a Root Port. In the non-native case, the platform is responsible for those registers and we can't touch them. [bhelgaas: commit log, etc] Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com> Link: https://lore.kernel.org/r/20201121001036.8560-13-sean.v.kelley@intel.com Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-12-05PCI/ERR: Recover from RCEC AER errorsSean V Kelley1-17/+41
A Root Complex Event Collector (RCEC) collects and signals AER errors that were detected by Root Complex Integrated Endpoints (RCiEPs), but it may also signal errors it detects itself. This is analogous to errors detected and signaled by a Root Port. Update the AER service driver to claim RCECs in addition to Root Ports. Add support for handling RCEC-detected AER errors. This does not include handling RCiEP-detected errors that are signaled by the RCEC. Note that we expect these errors only from the native AER and APEI paths, not from DPC or EDR. [bhelgaas: split from combined RCEC/RCiEP patch, commit log] Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-12-04PCI/ERR: Simplify by computing pci_pcie_type() onceSean V Kelley1-2/+3
Instead of calling pci_pcie_type(dev) twice, call it once and save the result. No functional change intended. Link: https://lore.kernel.org/r/20201121001036.8560-7-sean.v.kelley@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2020-12-01PCI/AER: Write AER Capability only when we control itSean V Kelley1-13/+16
If an OS has not been granted AER control via _OSC, it should not make changes to PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS related registers. Per section 4.5.1 of the System Firmware Intermediary (SFI) _OSC and DPC Updates ECN [1], this bit also covers these aspects of the PCI Express Advanced Error Reporting. Based on the above and earlier discussion [2], make the following changes: Add a check for the native case (i.e., AER control via _OSC) Note that the previous "clear, reset, enable" order suggests that the reset might cause errors that we should ignore. After this commit, those errors (if any) will remain logged in the PCI_ERR_ROOT_STATUS register. [1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24, 2020, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/14076 [2] https://lore.kernel.org/linux-pci/20201020162820.GA370938@bjorn-Precision-5520/ Link: https://lore.kernel.org/r/20201121001036.8560-2-sean.v.kelley@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-08-07Merge tag 'pci-v5.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-32/+55
Pull PCI updates from Bjorn Helgaas: "Enumeration: - Fix pci_cfg_wait queue locking problem (Bjorn Helgaas) - Convert PCIe capability PCIBIOS errors to errno (Bolarinwa Olayemi Saheed) - Align PCIe capability and PCI accessor return values (Bolarinwa Olayemi Saheed) - Fix pci_create_slot() reference count leak (Qiushi Wu) - Announce device after early fixups (Tiezhu Yang) PCI device hotplug: - Make rpadlpar functions static (Wei Yongjun) Driver binding: - Add device even if driver attach failed (Rajat Jain) Virtualization: - xen: Remove redundant initialization of irq (Colin Ian King) IOMMU: - Add pci_pri_supported() to check device or associated PF (Ashok Raj) - Release IVRS table in AMD ACS quirk (Hanjun Guo) - Mark AMD Navi10 GPU rev 0x00 ATS as broken (Kai-Heng Feng) - Treat "external-facing" devices themselves as internal (Rajat Jain) MSI: - Forward MSI-X error code in pci_alloc_irq_vectors_affinity() (Piotr Stankiewicz) Error handling: - Clear PCIe Device Status errors only if OS owns AER (Jonathan Cameron) - Log correctable errors as warning, not error (Matt Jolly) - Use 'pci_channel_state_t' instead of 'enum pci_channel_state' (Luc Van Oostenryck) Peer-to-peer DMA: - Allow P2PDMA on AMD Zen and newer CPUs (Logan Gunthorpe) ASPM: - Add missing newline in sysfs 'policy' (Xiongfeng Wang) Native PCIe controllers: - Convert to devm_platform_ioremap_resource_byname() (Dejin Zheng) - Convert to devm_platform_ioremap_resource() (Dejin Zheng) - Remove duplicate error message from devm_pci_remap_cfg_resource() callers (Dejin Zheng) - Fix runtime PM imbalance on error (Dinghao Liu) - Remove dev_err() when handing an error from platform_get_irq() (Krzysztof Wilczyński) - Use pci_host_bridge.windows list directly instead of splicing in a temporary list for cadence, mvebu, host-common (Rob Herring) - Use pci_host_probe() instead of open-coding all the pieces for altera, brcmstb, iproc, mobiveil, rcar, rockchip, tegra, v3, versatile, xgene, xilinx, xilinx-nwl (Rob Herring) - Default host bridge parent device to the platform device (Rob Herring) - Use pci_is_root_bus() instead of tracking root bus number separately in aardvark, designware (imx6, keystone, designware-host), mobiveil, xilinx-nwl, xilinx, rockchip, rcar (Rob Herring) - Set host bridge bus number in pci_scan_root_bus_bridge() instead of each driver for aardvark, designware-host, host-common, mediatek, rcar, tegra, v3-semi (Rob Herring) - Move DT resource setup into devm_pci_alloc_host_bridge() (Rob Herring) - Set bridge map_irq and swizzle_irq to default functions; drivers that don't support legacy IRQs (iproc) need to undo this (Rob Herring) ARM Versatile PCIe controller driver: - Drop flag PCI_ENABLE_PROC_DOMAINS (Rob Herring) Cadence PCIe controller driver: - Use "dma-ranges" instead of "cdns,no-bar-match-nbits" property (Kishon Vijay Abraham I) - Remove "mem" from reg binding (Kishon Vijay Abraham I) - Fix cdns_pcie_{host|ep}_setup() error path (Kishon Vijay Abraham I) - Convert all r/w accessors to perform only 32-bit accesses (Kishon Vijay Abraham I) - Add support to start link and verify link status (Kishon Vijay Abraham I) - Allow pci_host_bridge to have custom pci_ops (Kishon Vijay Abraham I) - Add new *ops* for CPU addr fixup (Kishon Vijay Abraham I) - Fix updating Vendor ID and Subsystem Vendor ID register (Kishon Vijay Abraham I) - Use bridge resources for outbound window setup (Rob Herring) - Remove private bus number and range storage (Rob Herring) Cadence PCIe endpoint driver: - Add MSI-X support (Alan Douglas) HiSilicon PCIe controller driver: - Remove non-ECAM HiSilicon hip05/hip06 driver (Rob Herring) Intel VMD host bridge driver: - Use Shadow MEMBAR registers for QEMU/KVM guests (Jon Derrick) Loongson PCIe controller driver: - Use DECLARE_PCI_FIXUP_EARLY for bridge_class_quirk() (Tiezhu Yang) Marvell Aardvark PCIe controller driver: - Indicate error in 'val' when config read fails (Pali Rohár) - Don't touch PCIe registers if no card connected (Pali Rohár) Marvell MVEBU PCIe controller driver: - Setup BAR0 in order to fix MSI (Shmuel Hazan) Microsoft Hyper-V host bridge driver: - Fix a timing issue which causes kdump to fail occasionally (Wei Hu) - Make some functions static (Wei Yongjun) NVIDIA Tegra PCIe controller driver: - Revert tegra124 raw_violation_fixup (Nicolas Chauvet) - Remove PLL power supplies (Thierry Reding) Qualcomm PCIe controller driver: - Change duplicate PCI reset to phy reset (Abhishek Sahu) - Add missing ipq806x clocks in PCIe driver (Ansuel Smith) - Add missing reset for ipq806x (Ansuel Smith) - Add ext reset (Ansuel Smith) - Use bulk clk API and assert on error (Ansuel Smith) - Add support for tx term offset for rev 2.1.0 (Ansuel Smith) - Define some PARF params needed for ipq8064 SoC (Ansuel Smith) - Add ipq8064 rev2 variant (Ansuel Smith) - Support PCI speed set for ipq806x (Sham Muthayyan) Renesas R-Car PCIe controller driver: - Use devm_pci_alloc_host_bridge() (Rob Herring) - Use struct pci_host_bridge.windows list directly (Rob Herring) - Convert rcar-gen2 to use modern host bridge probe functions (Rob Herring) TI J721E PCIe driver: - Add TI J721E PCIe host and endpoint driver (Kishon Vijay Abraham I) Xilinx Versal CPM PCIe controller driver: - Add Versal CPM Root Port driver and YAML schema (Bharat Kumar Gogada) MicroSemi Switchtec management driver: - Add missing __iomem and __user tags to fix sparse warnings (Logan Gunthorpe) Miscellaneous: - Replace http:// links with https:// (Alexander A. Klimov) - Replace lkml.org, spinics, gmane with lore.kernel.org (Bjorn Helgaas) - Remove unused pci_lost_interrupt() (Heiner Kallweit) - Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h (Huacai Chen) - Fix kerneldoc warnings (Krzysztof Kozlowski)" * tag 'pci-v5.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (113 commits) PCI: Fix kerneldoc warnings PCI: xilinx-cpm: Add Versal CPM Root Port driver PCI: xilinx-cpm: Add YAML schemas for Versal CPM Root Port PCI: Set bridge map_irq and swizzle_irq to default functions PCI: Move DT resource setup into devm_pci_alloc_host_bridge() PCI: rcar-gen2: Convert to use modern host bridge probe functions PCI: Remove dev_err() when handing an error from platform_get_irq() MAINTAINERS: Add Kishon Vijay Abraham I for TI J721E SoC PCIe misc: pci_endpoint_test: Add J721E in pci_device_id table PCI: j721e: Add TI J721E PCIe driver PCI: switchtec: Add missing __iomem tag to fix sparse warnings PCI: switchtec: Add missing __iomem and __user tags to fix sparse warnings PCI: rpadlpar: Make functions static PCI/P2PDMA: Allow P2PDMA on AMD Zen and newer CPUs PCI: Release IVRS table in AMD ACS quirk PCI: Announce device after early fixups PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken PCI: Remove unused pci_lost_interrupt() dt-bindings: PCI: Add EP mode dt-bindings for TI's J721E SoC dt-bindings: PCI: Add host mode dt-bindings for TI's J721E SoC ...
2020-08-05Merge branch 'pci/misc'Bjorn Helgaas1-3/+8
- Convert PCIe capability PCIBIOS errors to errno (Bolarinwa Olayemi Saheed) - Align PCIe capability and PCI accessor return values (Bolarinwa Olayemi Saheed) - Replace http:// links with https:// (Alexander A. Klimov) - Replace lkml.org, spinics, gmane with lore.kernel.org (Bjorn Helgaas) - Update panic message to mention kzalloc(), not kmalloc() (Liao Pingfang) - Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h (Huacai Chen) - Remove unused pci_lost_interrupt() (Heiner Kallweit) * pci/misc: PCI: Remove unused pci_lost_interrupt() PCI: Move PCI_VENDOR_ID_REDHAT definition to pci_ids.h PCI: Fix error in panic message PCI: Replace lkml.org, spinics, gmane with lore.kernel.org PCI: Replace http:// links with https:// PCI: Align PCIe capability and PCI accessor return values PCI: Convert PCIe capability PCIBIOS errors to errno
2020-07-22PCI/ERR: Clear PCIe Device Status errors only if OS owns AERJonathan Cameron1-1/+2
pcie_clear_device_status() resets the error bits in the PCIe Device Status Register (PCI_EXP_DEVSTA). Previously we did this unconditionally, but on ACPI systems, the _OSC AER bit negotiates control of the AER capability. Per sec 4.5.1 of the System Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers other error enable/status bits including the following: Correctable Error Reporting Enable Non-Fatal Error Reporting Enable Fatal Error Reporting Enable Unsupported Request Reporting Enable These bits are all in the PCIe Device Control register (the ECN omitted "Reporting", but I think that's a typo), so by implication the _OSC AER bit also applies to the error status bits in the PCIe Device Status register: Correctable Error Detected Non-Fatal Error Detected Fatal Error Detected Unsupported Request Detected Clear the PCIe Device Status error bits only when the OS controls the AER capability and related error enable/status bits. If platform firmware controls the AER capability, firmware is responsible for clearing these bits. One call path leading here is: ghes_do_proc ghes_handle_aer aer_recover_queue schedule_work(&aer_recover_work) ... aer_recover_work_func pcie_do_recovery pcie_clear_device_status [1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24, 2020, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/14076 [bhelgaas: commit log, move test from pcie_clear_device_status() to callers] Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-07-22PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status()Bjorn Helgaas1-9/+1
pci_aer_clear_device_status() clears the error bits in the PCIe Device Status Register (PCI_EXP_DEVSTA). Every PCIe device has this register, regardless of whether it supports AER. Rename pci_aer_clear_device_status() to pcie_clear_device_status() to make clear that it is PCIe-specific but not AER-specific. Move it to drivers/pci/pci.c, again since it's not AER-specific. No functional change intended. Link: https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-07-16treewide: Remove uninitialized_var() usageKees Cook1-1/+1
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-07PCI/AER: Log correctable errors as warning, not errorMatt Jolly1-10/+15
PCIe correctable errors are recovered by hardware with no need for software intervention (PCIe r5.0, sec 6.2.2.1). Reduce the log level of correctable errors from KERN_ERR to KERN_WARNING. The bug reports below are for correctable error logging. This doesn't fix the cause of those reports, but it may make the messages less alarming. [bhelgaas: commit log, use pci_printk() to avoid code duplication] Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517 Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183 Link: https://lore.kernel.org/r/20200618155511.16009-1-Kangie@footclan.ninja Signed-off-by: Matt Jolly <Kangie@footclan.ninja> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-07-07PCI/AER: Simplify __aer_print_error()Bjorn Helgaas1-14/+34
aer_correctable_error_string[] and aer_uncorrectable_error_string[] have descriptions of AER error status bits. Add NULL entries to these tables so all entries for bits 0-31 are defined. Then we don't have to check for ARRAY_SIZE() when decoding a status word, which simplifies __aer_print_error(). Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-06-26PCI: Convert PCIe capability PCIBIOS errors to errnoBolarinwa Olayemi Saheed1-3/+8
The PCI config accessors (pci_read_config_word(), et al) return PCIBIOS_SUCCESSFUL (zero) or positive error values like PCIBIOS_FUNC_NOT_SUPPORTED. The PCIe capability accessors (pcie_capability_read_word(), et al) similarly return PCIBIOS errors, but some callers assume they return generic errno values like -EINVAL. For example, the Myri-10G probe function returns a positive PCIBIOS error if the pcie_capability_clear_and_set_word() in pcie_set_readrq() fails: myri10ge_probe status = pcie_set_readrq return pcie_capability_clear_and_set_word if (status) return status A positive return from a PCI driver probe function would cause a "Driver probe function unexpectedly returned" warning from local_pci_probe() instead of the desired probe failure. Convert PCIBIOS errors to generic errno for all callers of: pcie_capability_read_word pcie_capability_read_dword pcie_capability_write_word pcie_capability_write_dword pcie_capability_set_word pcie_capability_set_dword pcie_capability_clear_word pcie_capability_clear_dword pcie_capability_clear_and_set_word pcie_capability_clear_and_set_dword that check the return code for anything other than zero. [bhelgaas: commit log, squash together] Suggested-by: Bjorn Helgaas <bjorn@helgaas.com> Link: https://lore.kernel.org/r/20200615073225.24061-1-refactormyself@gmail.com Signed-off-by: Bolarinwa Olayemi Saheed <refactormyself@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-06-01PCI/AER: Use "aer" variable for capability offsetBjorn Helgaas1-95/+84
Previously we used "pos" or "aer_pos" for the offset of the AER Capability. Use "aer" consistently and initialize it the same way everywhere. No functional change intended. Link: https://lore.kernel.org/r/20200529230915.GA479883@bjorn-Precision-5520 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2020-06-01PCI/AER: Remove redundant dev->aer_cap checksKuppuswamy Sathyanarayanan1-10/+2
pcie_aer_get_firmware_first() checks dev->aer_cap, so we can remove redundant dev->aer_cap checks in the callers. Link: https://lore.kernel.org/r/d5ccc7a060ec9cdc234bdae7df8a0a4410f13f42.1590534843.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-06-01PCI/AER: Remove redundant pci_is_pcie() checksKuppuswamy Sathyanarayanan1-9/+0
AER is a PCIe Extended Capability, so dev->aer_cap will only be set for PCIe devices. Remove redundant pci_is_pcie() checks. Link: https://lore.kernel.org/r/361c622eabe5b845b8092e0bec04a3a2c262cb38.1590534843.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-06-01PCI/AER: Remove HEST/FIRMWARE_FIRST parsing for AER ownershipKuppuswamy Sathyanarayanan1-107/+11
Commit c100beb9ccfb ("PCI/AER: Use only _OSC to determine AER ownership") removed the use of HEST in determining AER ownership, but the AER driver still used HEST to verify AER ownership in some of its APIs. Per the ACPI spec v6.3, sec 18.3.2.4, some HEST table entries contain a FIRMWARE_FIRST bit, but that bit does not tell us anything about ownership of the AER capability. Remove parsing of HEST to look for FIRMWARE_FIRST. Add pcie_aer_is_native() for the places that need to know whether the OS owns the AER capability. [bhelgaas: commit log, reorder patch, remove unused __aer_firmware_first] Link: https://lore.kernel.org/r/9a37f53a4e6ff4942ff8e18dbb20b00e16c47341.1590534843.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-04-30PCI/AER: Use only _OSC to determine AER ownershipAlexandru Gagniuc1-25/+1
Per the PCI Firmware spec, r3.2, sec 4.5.1, the OS can request control of AER via bit 3 of the _OSC Control Field. In the returned value of the Control Field: The firmware sets [bit 3] to 1 to grant control over PCI Express Advanced Error Reporting. ... after control is transferred to the operating system, firmware must not modify the Advanced Error Reporting Capability. If control of this feature was requested and denied or was not requested, firmware returns this bit set to 0. Previously the pci_root driver looked at the HEST FIRMWARE_FIRST bit to determine whether to request ownership of the AER Capability. This was based on ACPI spec v6.3, sec 18.3.2.4, and similar sections, which say things like: Bit [0] - FIRMWARE_FIRST: If set, indicates that system firmware will handle errors from this source first. Bit [1] - GLOBAL: If set, indicates that the settings contained in this structure apply globally to all PCI Express Devices. These ACPI references don't say anything about ownership of the AER Capability. Remove use of the FIRMWARE_FIRST bit and rely only on the _OSC bit to determine whether we have control of the AER Capability. Link: https://lore.kernel.org/r/20181115231605.24352-1-mr.nuke.me@gmail.com/ v1 Link: https://lore.kernel.org/r/20190326172343.28946-1-mr.nuke.me@gmail.com/ v2 Link: https://lore.kernel.org/r/67af2931705bed9a588b5a39d369cb70b9942190.1587925636.git.sathyanarayanan.kuppuswamy@linux.intel.com [bhelgaas: commit log, note: Alex posted this identical patch 18 months ago, and I failed to apply it then, so I made him the author, added links to his postings, and added his Signed-off-by] Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
2020-03-28PCI/AER: Rationalize error status register clearingKuppuswamy Sathyanarayanan1-4/+4
The AER interfaces to clear error status registers were a confusing mess: - pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors from the Uncorrectable Error Status register. - pci_aer_clear_fatal_status() cleared fatal errors from the Uncorrectable Error Status register. - pci_cleanup_aer_error_status_regs() cleared the Root Error Status register (for Root Ports), the Uncorrectable Error Status register, and the Correctable Error Status register. Rename them to make them consistent: From To ---------------------------------------- ------------------------------- pci_cleanup_aer_uncorrect_error_status() pci_aer_clear_nonfatal_status() pci_aer_clear_fatal_status() pci_aer_clear_fatal_status() pci_cleanup_aer_error_status_regs() pci_aer_clear_status() Since pci_cleanup_aer_error_status_regs() (renamed to pci_aer_clear_status()) is only used within drivers/pci/, move the declaration from <linux/aer.h> to drivers/pci/pci.h. [bhelgaas: commit log, add renames] Link: https://lore.kernel.org/r/d1310a75dc3d28f7e8da4e99c45fbd3e60fe238e.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error StatusKuppuswamy Sathyanarayanan1-4/+18
Per the SFI _OSC and DPC Updates ECN [1] implementation note flowchart, the OS seems to be expected to clear AER status even if it doesn't have ownership of the AER capability. Unlike the DPC capability, where a DPC ECN [2] specifies a window when the OS is allowed to access DPC registers even if it doesn't have ownership, there is no clear model for AER. Add pci_aer_raw_clear_status() to clear the AER error status registers unconditionally. This is intended for use only by the EDR path (see [2]). [1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24, 2020, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/14076 [2] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2 https://members.pcisig.com/wg/PCI-SIG/document/12888 [bhelgaas: changelog] Link: https://lore.kernel.org/r/c19ad28f3633cce67448609e89a75635da0da07d.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-03-28PCI/ERR: Remove service dependency in pcie_do_recovery()Kuppuswamy Sathyanarayanan1-7/+5
Previously we passed the PCIe service type parameter to pcie_do_recovery(), where reset_link() looked up the underlying pci_port_service_driver and its .reset_link() function pointer. Instead of using this roundabout way, we can just pass the driver-specific .reset_link() callback function when calling pcie_do_recovery() function. This allows us to call pcie_do_recovery() from code that is not a PCIe port service driver, e.g., Error Disconnect Recover (EDR) support. Remove pcie_port_find_service() and pcie_port_service_driver.reset_link since they are now unused. Link: https://lore.kernel.org/r/60e02b87b526cdf2930400059d98704bf0a147d1.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2020-01-23PCI/AER: Initialize aer_fifoDongdong Liu1-0/+1
Previously we did not call INIT_KFIFO() for aer_fifo. This leads to kfifo_put() sometimes returning 0 (queue full) when in fact it is not. It is easy to reproduce the problem by using aer-inject: $ aer-inject -s :82:00.0 multiple-corr-nonfatal The content of the multiple-corr-nonfatal file is as below: AER COR RCVR HL 0 1 2 3 AER UNCOR POISON_TLP HL 4 5 6 7 Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it") Link: https://lore.kernel.org/r/1579767991-103898-1-git-send-email-liudongdong3@huawei.com Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-18PCI/AER: Fix kernel-doc warningsAndy Shevchenko1-1/+3
Kernel-doc validator complains: aer.c:207: warning: Function parameter or member 'str' not described in 'pcie_ecrc_get_policy' aer.c:1209: warning: Function parameter or member 'irq' not described in 'aer_isr' aer.c:1209: warning: Function parameter or member 'context' not described in 'aer_isr' aer.c:1209: warning: Excess function parameter 'work' description in 'aer_isr' Fix the above accordingly. Link: https://lore.kernel.org/r/20190827151823.75312-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-10-18PCI/AER: Use for_each_set_bit() to simplify codeAndy Shevchenko1-11/+8
Simplify error counting code by using for_each_set_bit() library function. Link: https://lore.kernel.org/r/20190827151823.75312-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2019-10-18PCI/AER: Add PoisonTLPBlocked to Uncorrectable error countersRajat Jain1-1/+2
The elements in the aer_uncorrectable_error_string[] refer to the bit names in Uncorrectable Error Status Register. Add PoisonTLPBlocked, which was added in PCIe r3.1, sec 7.10.2. Link: https://lore.kernel.org/r/20190827222145.32642-1-rajatja@google.com Signed-off-by: Rajat Jain <rajatja@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-10-18PCI/AER: Save AER Capability for suspend/resumePatel, Mayurkumar1-2/+60
Previously we did not save and restore the AER configuration on suspend/resume, so the configuration may be lost after resume. Save the AER configuration during suspend and restore it during resume. [bhelgaas: commit log] Link: https://lore.kernel.org/r/92EBB4272BF81E4089A7126EC1E7B28492C3B007@IRSMSX101.ger.corp.intel.com Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-05-09PCI/AER: Log messages with pci_dev, not pcie_deviceFrederick Lawler1-7/+12
Log messages with pci_dev, not pcie_device. Factor out common message prefixes with dev_fmt(). Example output change: - aer 0000:00:00.0:pci002: AER enabled with IRQ ... + pcieport 0000:00:00.0: AER: enabled with IRQ ... Link: https://lore.kernel.org/lkml/20190509141456.223614-5-helgaas@kernel.org Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2019-05-09PCI/AER: Replace dev_printk(KERN_DEBUG) with dev_info()Frederick Lawler1-8/+5
Replace dev_printk(KERN_DEBUG) with dev_info() or dev_err() to be more consistent with other logging. These could be converted to dev_dbg(), but that depends on CONFIG_DYNAMIC_DEBUG and DEBUG, and we want most of these messages to *always* be in the dmesg log. Also remove a redundant kzalloc() failure message. Link: https://lore.kernel.org/lkml/20190509141456.223614-2-helgaas@kernel.org Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
2019-01-29PCI/AER: Use match_string() helper to simplify the codeAndy Shevchenko1-6/+3
match_string() returns the array index of a matching string. Use it instead of the open-coded implementation. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-12-14PCI/AER: Queue one GHES event, not several uninitialized onesYanjiang Jin1-1/+1
ecae65e133f2 ("PCI/AER: Use kfifo_in_spinlocked() to insert locked elements") replaced kfifo_put() with kfifo_in_spinlocked(), but passed the *size* of the queue entry, where kfifo_in_spinlocked() expects the *number* of entries to be copied. We want to insert only one element into kfifo, not "sizeof(entry) = 16". Without this patch, we would get 15 uninitialized elements. Fixes: ecae65e133f2 ("PCI/AER: Use kfifo_in_spinlocked() to insert locked elements") Signed-off-by: Yanjiang Jin <yanjiang.jin@hxt-semitech.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com>
2018-10-18PCI/AER: Abstract AER interrupt handlingKeith Busch1-2/+1
The aer_inject module was directly calling aer_irq(). This required the AER driver export its private IRQ handler for no other reason than to support error injection. A driver should not have to expose its private interfaces, so use the IRQ subsystem to route injection to the AER driver, and make aer_irq() a private interface. This provides additional benefits: First, directly calling the IRQ handler bypassed the IRQ subsytem so the injection wasn't really synthesizing what happens if a shared AER interrupt occurs. The error injection had to provide the callback data directly, which may be racing with a removal that is freeing that structure. The IRQ subsystem can handle that race. Finally, using the IRQ subsystem automatically reacts to threaded IRQs, keeping the error injection abstracted from that implementation detail. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-18PCI/AER: Use managed resource allocationsKeith Busch1-12/+5
Use the managed device resource allocations for the service data so the AER driver doesn't need to manage it, further simplifying this driver. Link: https://lore.kernel.org/linux-pci/20180918235848.26694-12-keith.busch@intel.com Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-08PCI/AER: Use threaded IRQ for bottom halfKeith Busch1-47/+13
The threaded IRQ is naturally single threaded as desired, so use that to simplify the AER bottom half handler. Since the root port structure has much less to do now, remove the rpc construction helper routine. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-08PCI/AER: Use kfifo_in_spinlocked() to insert locked elementsKeith Busch1-4/+2
Use the recommended kernel API for writing to a concurrently-accessed kfifo. No functional change here. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-08PCI/AER: Use kfifo for tracking events instead of reimplementing itKeith Busch1-77/+11
The kernel provides a generic FIFO implementation, so no need to reinvent that capability in a driver. Replace the AER-specific implementation with the kernel-provided kfifo. Since the interrupt handler producer and work queue consumer run single threaded, there is no need for additional locking, so remove that lock, too. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-08PCI/AER: Remove error source from AER struct aer_rpcKeith Busch1-17/+16
The AER struct aer_rpc was carrying a copy of the error source simply as a temperary variable. Remove that from the structure and use a stack variable for the purpose. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-10-08PCI/AER: Remove unused aer_error_resume()Keith Busch1-13/+0
The error recovery callbacks are only run on child devices. A Root Port is never a child device, so this error resume callback was never invoked. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>