linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-09-01	vfio/pci: Simplify the is_intx/msi/msix/etc defines	Jason Gunthorpe	1	-1/+1
	Only three of these are actually used, simplify to three inline functions, and open code the if statement in vfio_pci_config.c. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/3-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01	vfio/pci: Split linux/vfio_pci_core.h	Jason Gunthorpe	1	-1/+1
	The header in include/linux should have only the exported interface for other vfio_pci modules to use. Internal definitions for vfio_pci.ko should be in a "priv" header along side the .c files. Move the internal declarations out of vfio_pci_core.h. They either move to vfio_pci_priv.h or to the C file that is the only user. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/1-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-08-01	vfio/pci: fix the wrong word	Bo Liu	1	-1/+1
	This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220801013918.2520-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-06	vfio/pci: fix the wrong word	Bo Liu	1	-1/+1
	This patch fixes a wrong word in comment. Signed-off-by: Bo Liu <liubo03@inspur.com> Link: https://lore.kernel.org/r/20220704023649.3913-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18	vfio/pci: Virtualize PME related registers bits and initialize to zero	Abhishek Sahu	1	-1/+32
	If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18	vfio/pci: Invalidate mmaps and block the access in D3hot power state	Abhishek Sahu	1	-2/+21
	According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error will be returned for MMIO access. The IO access generally does not make the system unresponsive so the IO access can still happen in D3hot state. The default value should be returned in this case without bringing down the complete system. Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. This protection is mainly needed when user changes the PCI power state by writing into PCI_PM_CTRL register. vfio_lock_and_set_power_state() wrapper function will take the required locks and then it will invoke the vfio_pci_set_power_state(). Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Introduce vfio_pci_core.ko	Max Gurtovoy	1	-1/+1
	Now that vfio_pci has been split into two source modules, one focusing on the "struct pci_driver" (vfio_pci.c) and a toolbox library of code (vfio_pci_core.c), complete the split and move them into two different kernel modules. As before vfio_pci.ko continues to present the same interface under sysfs and this change will have no functional impact. Splitting into another module and adding exports allows creating new HW specific VFIO PCI drivers that can implement device specific functionality, such as VFIO migration interfaces or specialized device requirements. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-14-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci_device to vfio_pci_core_device	Max Gurtovoy	1	-34/+34
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. The new vfio_pci_core_device structure will be the main structure of the core driver and later on vfio_pci_device structure will be the main structure of the generic vfio_pci driver. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-08-26	vfio/pci: Rename vfio_pci_private.h to vfio_pci_core.h	Max Gurtovoy	1	-1/+1
	This is a preparation patch for separating the vfio_pci driver to a subsystem driver and a generic pci driver. This patch doesn't change any logic. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20210826103912.128972-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-05-24	vfio/pci: Fix error return code in vfio_ecap_init()	Zhen Lei	1	-1/+1
	The error code returned from vfio_ext_cap_len() is stored in 'len', not in 'ret'. Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Message-Id: <20210515020458.6771-1-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06	vfio/pci: fix a couple of spelling mistakes	Zhen Lei	1	-1/+1
	There are several spelling mistakes, as follows: thru ==> through presense ==> presence Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210326083528.1329-4-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-22	Merge branches 'v5.10/vfio/bardirty', 'v5.10/vfio/dma_avail', 'v5.10/vfio/misc', 'v5.10/vfio/no-cmd-mem' and 'v5.10/vfio/yan_zhao_fixes' into v5.10/vfio/next	Alex Williamson	1	-10/+14

2020-09-22	vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn	Matthew Rosato	1	-10/+14
	While it is true that devices with is_virtfn=1 will have a Memory Space Enable bit that is hard-wired to 0, this is not the only case where we see this behavior -- For example some bare-metal hypervisors lack Memory Space Enable bit emulation for devices not setting is_virtfn (s390). Fix this by instead checking for the newly-added no_command_memory bit which directly denotes the need for PCI_COMMAND_MEMORY emulation in vfio. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-09-21	vfio/pci: Don't regenerate vconfig for all BARs if !bardirty	Zenghui Yu	1	-0/+3
	Now we regenerate vconfig for all the BARs via vfio_bar_fixup(), every time any offset of any of them are read. Though BARs aren't re-read regularly, the regeneration can be avoided if no BARs had been written since they were last read, in which case vdev->bardirty is false. Let's return immediately in vfio_bar_fixup() if bardirty is false. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-25	vfio/pci: Fix SR-IOV VF handling with MMIO blocking	Alex Williamson	1	-1/+16
	SR-IOV VFs do not implement the memory enable bit of the command register, therefore this bit is not set in config space after pci_enable_device(). This leads to an unintended difference between PF and VF in hand-off state to the user. We can correct this by setting the initial value of the memory enable bit in our virtualized config space. There's really no need however to ever fault a user on a VF though as this would only indicate an error in the user's management of the enable bit, versus a PF where the same access could trigger hardware faults. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-26	Merge branches 'v5.8/vfio/alex-block-mmio-v3', 'v5.8/vfio/alex-zero-cap-v2' and 'v5.8/vfio/qian-leak-fixes' into v5.8/vfio/next	Alex Williamson	1	-9/+41

2020-05-18	vfio/pci: fix memory leaks in alloc_perm_bits()	Qian Cai	1	-2/+5
	vfio_pci_disable() calls vfio_config_free() but forgets to call free_perm_bits() resulting in memory leaks, unreferenced object 0xc000000c4db2dee0 (size 16): comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s) hex dump (first 16 bytes): 00 00 ff 00 ff ff ff ff ff ff ff ff ff ff 00 00 ................ backtrace: [<00000000a6a4552d>] alloc_perm_bits+0x58/0xe0 [vfio_pci] [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci] init_pci_cap_msi_perm at drivers/vfio/pci/vfio_pci_config.c:1125 (inlined by) vfio_msi_cap_len at drivers/vfio/pci/vfio_pci_config.c:1180 (inlined by) vfio_cap_len at drivers/vfio/pci/vfio_pci_config.c:1241 (inlined by) vfio_cap_init at drivers/vfio/pci/vfio_pci_config.c:1468 (inlined by) vfio_config_init at drivers/vfio/pci/vfio_pci_config.c:1707 [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci] [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio] [<000000009e34c54f>] ksys_ioctl+0xd8/0x130 [<000000006577923d>] sys_ioctl+0x28/0x40 [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0 [<0000000008ea7dd5>] system_call_common+0xf0/0x278 unreferenced object 0xc000000c4db2e330 (size 16): comm "qemu-kvm", pid 4305, jiffies 4295020272 (age 3463.780s) hex dump (first 16 bytes): 00 ff ff 00 ff ff ff ff ff ff ff ff ff ff 00 00 ................ backtrace: [<000000004c71914f>] alloc_perm_bits+0x44/0xe0 [vfio_pci] [<00000000ac990549>] vfio_config_init+0xdf0/0x11b0 [vfio_pci] [<000000006db873a1>] vfio_pci_open+0x234/0x700 [vfio_pci] [<00000000630e1906>] vfio_group_fops_unl_ioctl+0x8e0/0xb84 [vfio] [<000000009e34c54f>] ksys_ioctl+0xd8/0x130 [<000000006577923d>] sys_ioctl+0x28/0x40 [<000000006d7b1cf2>] system_call_exception+0x114/0x1e0 [<0000000008ea7dd5>] system_call_common+0xf0/0x278 Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver") Signed-off-by: Qian Cai <cai@lca.pw> [aw: rolled in follow-up patch] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18	vfio-pci: Mask cap zero	Alex Williamson	1	-1/+6
	The PCI Code and ID Assignment Specification changed capability ID 0 from reserved to a NULL capability in the v1.1 revision. The NULL capability is defined to include only the 16-bit capability header, ie. only the ID and next pointer. Unfortunately vfio-pci creates a map of config space, where ID 0 is used to reserve the standard type 0 header. Finding an actual capability with this ID therefore results in a bogus range marked in that map and conflicts with subsequent capabilities. As this seems to be a dummy capability anyway and we already support dropping capabilities, let's hide this one rather than delving into the potentially subtle dependencies within our map. Seen on an NVIDIA Tesla T4. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-18	vfio-pci: Invalidate mmaps and block MMIO access on disabled memory	Alex Williamson	1	-6/+30
	Accessing the disabled memory space of a PCI device would typically result in a master abort response on conventional PCI, or an unsupported request on PCI express. The user would generally see these as a -1 response for the read return data and the write would be silently discarded, possibly with an uncorrected, non-fatal AER error triggered on the host. Some systems however take it upon themselves to bring down the entire system when they see something that might indicate a loss of data, such as this discarded write to a disabled memory space. To avoid this, we want to try to block the user from accessing memory spaces while they're disabled. We start with a semaphore around the memory enable bit, where writers modify the memory enable state and must be serialized, while readers make use of the memory region and can access in parallel. Writers include both direct manipulation via the command register, as well as any reset path where the internal mechanics of the reset may both explicitly and implicitly disable memory access, and manipulation of the MSI-X configuration, where the MSI-X vector table resides in MMIO space of the device. Readers include the read and write file ops to access the vfio device fd offsets as well as memory mapped access. In the latter case, we make use of our new vma list support to zap, or invalidate, those memory mappings in order to force them to be faulted back in on access. Our semaphore usage will stall user access to MMIO spaces across internal operations like reset, but the user might experience new behavior when trying to access the MMIO space while disabled via the PCI command register. Access via read or write while disabled will return -EIO and access via memory maps will result in a SIGBUS. This is expected to be compatible with known use cases and potentially provides better error handling capabilities than present in the hardware, while avoiding the more readily accessible and severe platform error responses that might otherwise occur. Fixes: CVE-2020-12888 Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-10-14	PCI: Add PCI_STD_NUM_BARS for the number of standard BARs	Denis Efremov	1	-15/+17
	Code that iterates over all standard PCI BARs typically uses PCI_STD_RESOURCE_END. However, that requires the unusual test "i <= PCI_STD_RESOURCE_END" rather than something the typical "i < PCI_STD_NUM_BARS". Add a definition for PCI_STD_NUM_BARS and change loops to use the more idiomatic C style to help avoid fencepost errors. Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/ Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/ Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/ Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/ Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/ Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/
2019-06-19	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	Thomas Gleixner	1	-4/+1
	Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-22	vfio: Use dev_printk() when possible	Bjorn Helgaas	1	-16/+13
	Use dev_printk() when possible to make messages consistent with other device-related messages. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-18	vfio/pci: Restore device state on PM transition	Alex Williamson	1	-1/+1
	PCI core handles save and restore of device state around reset, but when using pci_set_power_state() we can unintentionally trigger a soft reset of the device, where PCI core only restores the BAR state. If we're using vfio-pci's idle D3 support to try to put devices into low power when unused, this might trigger a reset when the device is woken for use. Also power state management by the user, or within a guest, can put the device into D3 power state with potentially limited ability to restore the device if it should undergo a reset. The PCI spec does not define the extent of a soft reset and many devices reporting soft reset on D3->D0 transition do not undergo a PCI config space reset. It's therefore assumed safe to unconditionally restore the remainder of the state if the device indicates soft reset support, even on a user initiated wakeup. Implement a wrapper in vfio-pci to tag devices reporting PM reset support, save their state on transitions into D3 and restore on transitions back to D0. Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-09-25	vfio/pci: Mask buggy SR-IOV VF INTx support	Alex Williamson	1	-0/+27
	The SR-IOV spec requires that VFs must report zero for the INTx pin register as VFs are precluded from INTx support. It's much easier for the host kernel to understand whether a device is a VF and therefore whether a non-zero pin register value is bogus than it is to do the same in userspace. Override the INTx count for such devices and virtualize the pin register to provide a consistent view of the device to the user. As this is clearly a spec violation, warn about it to support hardware validation, but also provide a known whitelist as it doesn't do much good to continue complaining if the hardware vendor doesn't plan to fix it. Known devices with this issue: 8086:270c Tested-by: Gage Eads <gage.eads@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-09-25	vfio/pci: Fix potential memory leak in vfio_msi_cap_len	Li Qiang	1	-1/+3
	Free allocated vdev->msi_perm in error path. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-02	vfio/pci: Virtualize Maximum Read Request Size	Alex Williamson	1	-3/+26
	MRRS defines the maximum read request size a device is allowed to make. Drivers will often increase this to allow more data transfer with a single request. Completions to this request are bound by the MPS setting for the bus. Aside from device quirks (none known), it doesn't seem to make sense to set an MRRS value less than MPS, yet this is a likely scenario given that user drivers do not have a system-wide view of the PCI topology. Virtualize MRRS such that the user can set MRRS >= MPS, but use MPS as the floor value that we'll write to hardware. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-02	vfio/pci: Virtualize Maximum Payload Size	Alex Williamson	1	-2/+4
	With virtual PCI-Express chipsets, we now see userspace/guest drivers trying to match the physical MPS setting to a virtual downstream port. Of course a lone physical device surrounded by virtual interconnects cannot make a correct decision for a proper MPS setting. Instead, let's virtualize the MPS control register so that writes through to hardware are disallowed. Userspace drivers like QEMU assume they can write anything to the device and we'll filter out anything dangerous. Since mismatched MPS can lead to AER and other faults, let's add it to the kernel side rather than relying on userspace virtualization to handle it. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2017-07-27	vfio/pci: Fix handling of RC integrated endpoint PCIe capability size	Alex Williamson	1	-4/+9
	Root complex integrated endpoints do not have a link and therefore may use a smaller PCIe capability in config space than we expect when building our config map. Add a case for these to avoid reporting an erroneous overlap. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-15	Merge tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci	Linus Torvalds	1	-2/+0
	Pull PCI updates from Bjorn Helgaas: "PCI changes: - add support for PCI on ARM64 boxes with ACPI. We already had this for theoretical spec-compliant hardware; now we're adding quirks for the actual hardware (Cavium, HiSilicon, Qualcomm, X-Gene) - add runtime PM support for hotplug ports - enable runtime suspend for Intel UHCI that uses platform-specific wakeup signaling - add yet another host bridge registration interface. We hope this is extensible enough to subsume the others - expose device revision in sysfs for DRM - to avoid device conflicts, make sure any VF BAR updates are done before enabling the VF - avoid unnecessary link retrains for ASPM - allow INTx masking on Mellanox devices that support it - allow access to non-standard VPD for Chelsio devices - update Broadcom iProc support for PAXB v2, PAXC v2, inbound DMA, etc - update Rockchip support for max-link-speed - add NVIDIA Tegra210 support - add Layerscape LS1046a support - update R-Car compatibility strings - add Qualcomm MSM8996 support - remove some uninformative bootup messages" * tag 'pci-v4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (115 commits) PCI: Enable access to non-standard VPD for Chelsio devices (cxgb3) PCI: Expand "VPD access disabled" quirk message PCI: pciehp: Remove loading message PCI: hotplug: Remove hotplug core message PCI: Remove service driver load/unload messages PCI/AER: Log AER IRQ when claiming Root Port PCI/AER: Log errors with PCI device, not PCIe service device PCI/AER: Remove unused version macros PCI/PME: Log PME IRQ when claiming Root Port PCI/PME: Drop unused support for PMEs from Root Complex Event Collectors PCI: Move config space size macros to pci_regs.h x86/platform/intel-mid: Constify mid_pci_platform_pm PCI/ASPM: Don't retrain link if ASPM not possible PCI: iproc: Skip check for legacy IRQ on PAXC buses PCI: pciehp: Leave power indicator on when enabling already-enabled slot PCI: pciehp: Prioritize data-link event over presence detect PCI: rcar: Add gen3 fallback compatibility string for pcie-rcar PCI: rcar: Use gen2 fallback compatibility last PCI: rcar-gen2: Use gen2 fallback compatibility last PCI: rockchip: Move the deassert of pm/aclk/pclk after phy_init() ..
2016-12-12	PCI: Move config space size macros to pci_regs.h	Wang Sheng-Hui	1	-2/+0
	Move PCI configuration space size macros (PCI_CFG_SPACE_SIZE and PCI_CFG_SPACE_EXP_SIZE) from drivers/pci/pci.h to include/uapi/linux/pci_regs.h so they can be used by more drivers and eliminate duplicate definitions. [bhelgaas: Expand comment to include PCI-X details] Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-11-18	vfio/pci: Drop unnecessary pcibios_err_to_errno()	Cao jin	1	-5/+5
	As of commit d97ffe236894 ("PCI: Fix return value from pci_user_{read,write}_config_*()") it's unnecessary to call pcibios_err_to_errno() to fixup the return value from these functions. pcibios_err_to_errno() already does simple passthrough of -errno values, therefore no functional change is expected. [aw: changelog] Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-09-26	vfio-pci: Virtualize PCIe & AF FLR	Alex Williamson	1	-5/+77
	We use a BAR restore trick to try to detect when a user has performed a device reset, possibly through FLR or other backdoors, to put things back into a working state. This is important for backdoor resets, but we can actually just virtualize the "front door" resets provided via PCIe and AF FLR. Set these bits as virtualized + writable, allowing the default write to set them in vconfig, then we can simply check the bit, perform an FLR of our own, and clear the bit. We don't actually have the granularity in PCI to specify the type of reset we want to do, but generally devices don't implement both PCIe and AF FLR and we'll favor these over other types of reset, so we should generally lineup. We do test whether the device provides the requested FLR type to stay consistent with hardware capabilities though. This seems to fix several instance of devices getting into bad states with userspace drivers, like dpdk, running inside a VM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Rose <grose@lightfleet.com>
2016-08-29	vfio/pci: Fix typos in comments	Wei Jiangang	1	-4/+4
	Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-31	vfio/pci: Allow VPD short read	Alex Williamson	1	-1/+2
	The size of the VPD area is not necessarily 4-byte aligned, so a pci_vpd_read() might return less than 4 bytes. Zero our buffer and accept anything other than an error. Intel X710 NICs exercise this. Fixes: 4e1a635552d3 ("vfio/pci: Use kernel VPD access functions") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-05-19	vfio_pci: Test for extended capabilities if config space > 256 bytes	Alexey Kardashevskiy	1	-6/+11
	PCI-Express spec says that reading 4 bytes at offset 100h should return zero if there is no extended capability so VFIO reads this dword to know if there are extended capabilities. However it is not always possible to access the extended space so generic PCI code in pci_cfg_space_size_ext() checks if pci_read_config_dword() can read beyond 100h and if the check fails, it sets the config space size to 100h. VFIO does its own extended capabilities check by reading at offset 100h which may produce 0xffffffff which VFIO treats as the extended config space presense and calls vfio_ecap_init() which fails to parse capabilities (which is expected) but right before the exit, it writes zero at offset 100h which is beyond the buffer allocated for vdev->vconfig (which is 256 bytes) which leads to random memory corruption. This makes VFIO only check for the extended capabilities if the discovered config size is more than 256 bytes. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-04-28	vfio/pci: Add test for BAR restore	Alex Williamson	1	-1/+19
	If a device is reset without the memory or i/o bits enabled in the command register we may not detect it, potentially leaving the device without valid BAR programming. Add an additional test to check the BARs on each write to the command register. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-04-28	vfio/pci: Hide broken INTx support from user	Alex Williamson	1	-1/+8
	INTx masking has two components, the first is that we need the ability to prevent the device from continuing to assert INTx. This is provided via the DisINTx bit in the command register and is the only thing we can really probe for when testing if INTx masking is supported. The second component is that the device needs to indicate if INTx is asserted via the interrupt status bit in the device status register. With these two features we can generically determine if one of the devices we own is asserting INTx, signal the user, and mask the interrupt while the user services the device. Generally if one or both of these components is broken we resort to APIC level interrupt masking, which requires an exclusive interrupt since we have no way to determine the source of the interrupt in a shared configuration. This often makes it difficult or impossible to configure the system for userspace use of the device, for an interrupt mode that the user may not need. One possible configuration of broken INTx masking is that the DisINTx support is fully functional, but the interrupt status bit never signals interrupt assertion. In this case we do have the ability to prevent the device from asserting INTx, but lack the ability to identify the interrupt source. For this case we can simply pretend that the device lacks INTx support entirely, keeping DisINTx set on the physical device, virtualizing this bit for the user, and virtualizing the interrupt pin register to indicate no INTx support. We already support virtualization of the DisINTx bit and already virtualize the interrupt pin for platforms without INTx support. By tying these components together, setting DisINTx on open and reset, and identifying devices broken in this particular way, we can provide support for them w/o the handicap of APIC level INTx masking. Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being broken in this specific way. We leave the vfio-pci.nointxmask option as a mechanism to bypass this support, enabling INTx on the device with all the requirements of APIC level masking. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
2016-02-22	vfio/pci: Expose shadow ROM as PCI option ROM	Alex Williamson	1	-3/+8
	Integrated graphics may have their ROM shadowed at 0xc0000 rather than implement a PCI option ROM. Make this ROM appear to the user using the ROM BAR. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-02-22	vfio/pci: Enable virtual register in PCI config space	Alex Williamson	1	-4/+30
	Typically config space for a device is mapped out into capability specific handlers and unassigned space. The latter allows direct read/write access to config space. Sometimes we know about registers living in this void space and would like an easy way to virtualize them, similar to how BAR registers are managed. To do this, create one more pseudo (fake) PCI capability to be handled as purely virtual space. Reads and writes are serviced entirely from virtual config space. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-11-09	vfio/pci: make an array larger	Dan Carpenter	1	-2/+2
	Smatch complains about a possible out of bounds error: drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init() error: buffer overflow 'pci_cap_length' 20 <= 20 The problem is that pci_cap_length[] was defined as large enough to hold "PCI_CAP_ID_AF + 1" elements. The code in vfio_cap_init() assumes it has PCI_CAP_ID_MAX + 1 elements. Originally, PCI_CAP_ID_AF and PCI_CAP_ID_MAX were the same but then we introduced PCI_CAP_ID_EA in commit f80b0ba95964 ("PCI: Add Enhanced Allocation register entries") so now the array is too small. Let's fix this by making the array size PCI_CAP_ID_MAX + 1. And let's make a similar change to pci_ext_cap_length[] for consistency. Also both these arrays can be made const. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-10-27	vfio/pci: Use kernel VPD access functions	Alex Williamson	1	-1/+69
	The PCI VPD capability operates on a set of window registers in PCI config space. Writing to the address register triggers either a read or write, depending on the setting of the PCI_VPD_ADDR_F bit within the address register. The data register provides either the source for writes or the target for reads. This model is susceptible to being broken by concurrent access, for which the kernel has adopted a set of access functions to serialize these registers. Additionally, commits like 932c435caba8 ("PCI: Add dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate that VPD registers can be shared between functions on multifunction devices creating dependencies between otherwise independent devices. Fortunately it's quite easy to emulate the VPD registers, simply storing copies of the address and data registers in memory and triggering a VPD read or write on writes to the address register. This allows vfio users to avoid seeing spurious register changes from accesses on other devices and enables the use of shared quirks in the host kernel. We can theoretically still race with access through sysfs, but the window of opportunity is much smaller. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Mark Rustad <mark.d.rustad@intel.com>
2014-11-07	vfio: make vfio run on s390	Frank Blaschka	1	-0/+7
	add Kconfig switch to hide INTx add Kconfig switch to let vfio announce PCI BARs are not mapable Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2014-09-25	PCI/AER: Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND	Chen, Gong	1	-1/+1
	In PCIe r1.0, sec 5.10.2, bit 0 of the Uncorrectable Error Status, Mask, and Severity Registers was for "Training Error." In PCIe r1.1, sec 7.10.2, bit 0 was redefined to be "Undefined." Rename PCI_ERR_UNC_TRAIN to PCI_ERR_UNC_UND to reflect this change. No functional change. [bhelgaas: changelog] Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-05-30	vfio/pci: Fix sizing of DPA and THP express capabilities	Alex Williamson	1	-4/+3
	When sizing the TPH capability we store the register containing the table size into the 'dword' variable, but then use the uninitialized 'byte' variable to analyze the size. The table size is also actually reported as an N-1 value, so correct sizing to account for this. The round_up() for both TPH and DPA is unnecessary, remove it. Detected by Coverity: CID 714665 & 715156 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-12-17	PCI: Rename PCI_VC_PORT_REG1/2 to PCI_VC_PORT_CAP1/2	Alex Williamson	1	-6/+6
	These are set of two capability registers, it's pretty much given that they're registers, so reflect their purpose in the name. Suggested-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-09-04	vfio-pci: Test for extended config space	Alex Williamson	1	-3/+8
	Having PCIe/PCI-X capability isn't enough to assume that there are extended capabilities. Both specs define that the first capability header is all zero if there are no extended capabilities. Testing for this avoids an erroneous message about hiding capability 0x0 at offset 0x100. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-05-02	Merge tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio	Linus Torvalds	1	-78/+94
	Pull vfio updates from Alex Williamson: "Changes include extension to support PCI AER notification to userspace, byte granularity of PCI config space and access to unarchitected PCI config space, better protection around IOMMU driver accesses, default file mode fix, and a few misc cleanups." * tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio: vfio: Set container device mode vfio: Use down_reads to protect iommu disconnects vfio: Convert container->group_lock to rwsem PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register vfio-pci: Enable raw access to unassigned config space vfio-pci: Use byte granularity in config map vfio: make local function vfio_pci_intx_unmask_handler() static VFIO-AER: Vfio-pci driver changes for supporting AER VFIO: Wrapper for getting reference to vfio_device
2013-04-15	PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register	Yijing Wang	1	-5/+1
	Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register, because PCI-E Capabilities Register bits are almost read-only. This patch use pcie_caps_reg() instead of another access PCI-E Capabilities Register. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2013-04-01	vfio-pci: Enable raw access to unassigned config space	Alex Williamson	1	-32/+46
	Devices like be2net hide registers between the gaps in capabilities and architected regions of PCI config space. Our choices to support such devices is to either build an ever growing and unmanageable white list or rely on hardware isolation to protect us. These registers are really no different than MMIO or I/O port space registers, which we don't attempt to regulate, so treat PCI config space in the same way. Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
2013-04-01	vfio-pci: Use byte granularity in config map	Alex Williamson	1	-41/+47
	The config map previously used a byte per dword to map regions of config space to capabilities. Modulo a bug where we round the length of capabilities down instead of up, this theoretically works well and saves space so long as devices don't try to hide registers in the gaps between capabilities. Unfortunately they do exactly that so we need byte granularity on our config space map. Increase the allocation of the config map and split accesses at capability region boundaries. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>