aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2022-05-18vfio/pci: Virtualize PME related registers bits and initialize to zeroAbhishek Sahu1-1/+32
If any PME event will be generated by PCI, then it will be mostly handled in the host by the root port PME code. For example, in the case of PCIe, the PME event will be sent to the root port and then the PME interrupt will be generated. This will be handled in drivers/pci/pcie/pme.c at the host side. Inside this, the pci_check_pme_status() will be called where PME_Status and PME_En bits will be cleared. So, the guest OS which is using vfio-pci device will not come to know about this PME event. To handle these PME events inside guests, we need some framework so that if any PME events will happen, then it needs to be forwarded to virtual machine monitor. We can virtualize PME related registers bits and initialize these bits to zero so vfio-pci device user will assume that it is not capable of asserting the PME# signal from any power state. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Change the PF power state to D0 before enabling VFsAbhishek Sahu1-0/+16
According to [PCIe v5 9.6.2] for PF Device Power Management States "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state." From the vfio driver side, user can enable SR-IOV when the PF is in D3hot state. If VF does not implement the Power Management Capability, then the VF will be actually in D3hot state and then the VF BAR access will fail. If VF implements the Power Management Capability, then VF will assume that its current power state is D0 when the PF is D3hot and in this case, the behavior is undefined. To support PF power management, we need to create power management dependency between PF and its VF's. The runtime power management support may help with this where power management dependencies are supported through device links. But till we have such support in place, we can disallow the PF to go into low power state, if PF has VF enabled. There can be a case, where user first enables the VF's and then disables the VF's. If there is no user of PF, then the PF can put into D3hot state again. But with this patch, the PF will still be in D0 state after disabling VF's since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. But the subsequent patches related to runtime PM will handle this case since runtime PM maintains its own usage count. Also, vfio_pci_core_sriov_configure() can be called at any time (with and without vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-18vfio/pci: Invalidate mmaps and block the access in D3hot power stateAbhishek Sahu1-2/+21
According to [PCIe v5 5.3.1.4.1] for D3hot state "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions." Currently, if the vfio PCI device has been put into D3hot state and if user makes non-config related read/write request in D3hot state, these requests will be forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit 'abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")' to generate page fault on mmap access and return error for the direct read/write. If the device is D3hot state, then the error will be returned for MMIO access. The IO access generally does not make the system unresponsive so the IO access can still happen in D3hot state. The default value should be returned in this case without bringing down the complete system. Also, the power related structure fields need to be protected so we can use the same 'memory_lock' to protect these fields also. This protection is mainly needed when user changes the PCI power state by writing into PCI_PM_CTRL register. vfio_lock_and_set_power_state() wrapper function will take the required locks and then it will invoke the vfio_pci_set_power_state(). Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Change struct vfio_group::container_users to a non-atomic intJason Gunthorpe1-15/+13
Now that everything is fully locked there is no need for container_users to remain as an atomic, change it to an unsigned int. Use 'if (group->container)' as the test to determine if the container is present or not instead of using container_users. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Simplify the life cycle of the group FDJason Gunthorpe1-28/+24
Once userspace opens a group FD it is prevented from opening another instance of that same group FD until all the prior group FDs and users of the container are done. The first is done trivially by checking the group->opened during group FD open. However, things get a little weird if userspace creates a device FD and then closes the group FD. The group FD still cannot be re-opened, but this time it is because the group->container is still set and container_users is elevated by the device FD. Due to this mismatched lifecycle we have the vfio_group_try_dissolve_container() which tries to auto-free a container after the group FD is closed but the device FD remains open. Instead have the device FD hold onto a reference to the single group FD. This directly prevents vfio_group_fops_release() from being called when any device FD exists and makes the lifecycle model more understandable. vfio_group_try_dissolve_container() is removed as the only place a container is auto-deleted is during vfio_group_fops_release(). At this point the container_users is either 1 or 0 since all device FDs must be closed. Change group->opened to group->opened_file which points to the single struct file * that is open for the group. If the group->open_file is NULL then group->container == NULL. If all device FDs have closed then the group's notifier list must be empty. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Fully lock struct vfio_group::containerJason Gunthorpe1-26/+40
This is necessary to avoid various user triggerable races, for instance racing SET_CONTAINER/UNSET_CONTAINER: ioctl(VFIO_GROUP_SET_CONTAINER) ioctl(VFIO_GROUP_UNSET_CONTAINER) vfio_group_unset_container int users = atomic_cmpxchg(&group->container_users, 1, 0); // users == 1 container_users == 0 __vfio_group_unset_container(group); container = group->container; vfio_group_set_container() if (!atomic_read(&group->container_users)) down_write(&container->group_lock); group->container = container; up_write(&container->group_lock); down_write(&container->group_lock); group->container = NULL; up_write(&container->group_lock); vfio_container_put(container); /* woops we lost/leaked the new container */ This can then go on to NULL pointer deref since container == 0 and container_users == 1. Wrap all touches of container, except those on a performance path with a known open device, with the group_rwsem. The only user of vfio_group_add_container_user() holds the user count for a simple operation, change it to just hold the group_lock over the operation and delete vfio_group_add_container_user(). Containers now only gain a user when a device FD is opened. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Split up vfio_group_get_device_fd()Jason Gunthorpe1-23/+56
The split follows the pairing with the destroy functions: - vfio_group_get_device_fd() destroyed by close() - vfio_device_open() destroyed by vfio_device_fops_release() - vfio_device_assign_container() destroyed by vfio_group_try_dissolve_container() The next patch will put a lock around vfio_device_assign_container(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Change struct vfio_group::opened from an atomic to boolJason Gunthorpe1-19/+27
This is not a performance path, just use the group_rwsem to protect the value. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17vfio: Add missing locking for struct vfio_group::kvmJason Gunthorpe1-4/+15
Without locking userspace can trigger a UAF by racing KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD: CPU1 CPU2 ioctl(KVM_DEV_VFIO_GROUP_DEL) ioctl(VFIO_GROUP_GET_DEVICE_FD) vfio_group_get_device_fd open_device() intel_vgpu_open_device() vfio_register_notifier() vfio_register_group_notifier() blocking_notifier_call_chain(&group->notifier, VFIO_GROUP_NOTIFY_SET_KVM, group->kvm); set_kvm() group->kvm = NULL close() kfree(kvm) intel_vgpu_group_notifier() vdev->kvm = data [..] kvm_get_kvm(vgpu->kvm); // UAF! Add a simple rwsem in the group to protect the kvm while the notifier is using it. Note this doesn't fix the race internal to i915 where userspace can trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of vgpu->kvm and trigger this same UAF, it just makes the notifier self-consistent. Fixes: ccd46dbae77d ("vfio: support notifier chain in vfio_group") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-17kvm/vfio: Fix potential deadlock problem in vfioWan Jiabing1-1/+1
Fix following coccicheck warning: ./virt/kvm/vfio.c:258:1-7: preceding lock on line 236 If kvm_vfio_file_iommu_group() failed, code would goto err_fdput with mutex_lock acquired and then return ret. It might cause potential deadlock. Move mutex_unlock bellow err_fdput tag to fix it. Fixes: d55d9e7a45721 ("kvm/vfio: Store the struct file in the kvm_vfio_group") Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220517023441.4258-1-wanjiabing@vivo.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-16include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR insteadThomas Huth1-2/+2
There is no macro called _IORW, so use _IOWR in the comment instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20220516101202.88373-1-thuth@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio/pci: Use the struct file as the handle not the vfio_groupJason Gunthorpe3-75/+40
VFIO PCI does a security check as part of hot reset to prove that the user has permission to manipulate all the devices that will be impacted by the reset. Use a new API vfio_file_has_dev() to perform this security check against the struct file directly and remove the vfio_group from VFIO PCI. Since VFIO PCI was the last user of vfio_group_get_external_user() and vfio_group_put_external_user() remove it as well. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13kvm/vfio: Remove vfio_group from kvmJason Gunthorpe1-43/+8
None of the VFIO APIs take in the vfio_group anymore, so we can remove it completely. This has a subtle side effect on the enforced coherency tracking. The vfio_group_get_external_user() was holding on to the container_users which would prevent the iommu_domain and thus the enforced coherency value from changing while the group is registered with kvm. It changes the security proof slightly into 'user must hold a group FD that has a device that cannot enforce DMA coherence'. As opening the group FD, not attaching the container, is the privileged operation this doesn't change the security properties much. On the flip side it paves the way to changing the iommu_domain/container attached to a group at runtime which is something that will be required to support nested translation. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de>i Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm()Jason Gunthorpe3-18/+32
Just change the argument from struct vfio_group to struct file *. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent()Jason Gunthorpe3-13/+36
Instead of a general extension check change the function into a limited test if the iommu_domain has enforced coherency, which is the only thing kvm needs to query. Make the new op self contained by properly refcounting the container before touching it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Remove vfio_external_group_match_file()Jason Gunthorpe3-29/+1
vfio_group_fops_open() ensures there is only ever one struct file open for any struct vfio_group at any time: /* Do we need multiple instances of the group open? Seems not. */ opened = atomic_cmpxchg(&group->opened, 0, 1); if (opened) { vfio_group_put(group); return -EBUSY; Therefor the struct file * can be used directly to search the list of VFIO groups that KVM keeps instead of using the vfio_external_group_match_file() callback to try to figure out if the passed in FD matches the list or not. Delete vfio_external_group_match_file(). Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Change vfio_external_user_iommu_id() to vfio_file_iommu_group()Jason Gunthorpe3-33/+27
The only caller wants to get a pointer to the struct iommu_group associated with the VFIO group file. Instead of returning the group ID then searching sysfs for that string to get the struct iommu_group just directly return the iommu_group pointer already held by the vfio_group struct. It already has a safe lifetime due to the struct file kref, the vfio_group and thus the iommu_group cannot be destroyed while the group file is open. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13kvm/vfio: Store the struct file in the kvm_vfio_groupJason Gunthorpe1-30/+29
Following patches will change the APIs to use the struct file as the handle instead of the vfio_group, so hang on to a reference to it with the same duration of as the vfio_group. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13kvm/vfio: Move KVM_DEV_VFIO_GROUP_* ioctls into functionsJason Gunthorpe1-101/+124
To make it easier to read and change in following patches. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13vfio: Delete container_qJason Gunthorpe1-20/+0
Now that the iommu core takes care of isolation there is no race between driver attach and container unset. Once iommu_group_release_dma_owner() returns the device can immediately be re-used. Remove this mechanism. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v1-a1e8791d795b+6b-vfio_container_q_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-13iommu: iommu_group_claim_dma_owner() must always assign a domainJason Gunthorpe via iommu1-36/+91
Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain then the core must provide an explicit DMA blocking domain that has no DMA map. Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned then detach will set the group back to the blocking domain. Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domains back a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL. Add comments clarifying how the NULL vs detach_dev vs default_domain works based on Robin's remarks. This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now the iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference. Fixes: 1ea2a07a532b ("iommu: Add DMA ownership management interfaces") Reported-by: Qian Cai <quic_qiancai@quicinc.com> Tested-by: Baolu Lu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Co-developed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> -- Just minor polishing as discussed v3: - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain() - Clarify comments - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group - Update comments and spelling - Fix missed change to new_domain in iommu_group_do_detach_device() v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-05-11vfio/pci: Remove vfio_device_get_from_dev()Jason Gunthorpe5-70/+12
The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching use the vfio_device available trivially through the drvdata. When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store). Further, since the drvdata holds a positive refcount on the vfio_device any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here, VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdataJason Gunthorpe4-9/+27
Having a consistent pointer in the drvdata will allow the next patch to make use of the drvdata from some of the core code helpers. Use a WARN_ON inside vfio_pci_core_register_device() to detect drivers that miss this. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio: Remove calls to vfio_group_add_container_user()Jason Gunthorpe1-63/+17
When the open_device() op is called the container_users is incremented and held incremented until close_device(). Thus, so long as drivers call functions within their open_device()/close_device() region they do not need to worry about the container_users. These functions can all only be called between open_device() and close_device(): vfio_pin_pages() vfio_unpin_pages() vfio_dma_rw() vfio_register_notifier() vfio_unregister_notifier() Eliminate the calls to vfio_group_add_container_user() and add vfio_assert_device_open() to detect driver mis-use. This causes the close_device() op to check device->open_count so always leave it elevated while calling the op. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio: Remove dead codeJason Gunthorpe2-162/+0
Now that callers have been updated to use the vfio_device APIs the driver facing group interface is no longer used, delete it: - vfio_group_get_external_user_from_dev() - vfio_group_pin_pages() - vfio_group_unpin_pages() - vfio_group_iommu_domain() -- Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11drm/i915/gvt: Change from vfio_group_(un)pin_pages to vfio_(un)pin_pagesJason Gunthorpe2-22/+6
Use the existing vfio_device versions of vfio_(un)pin_pages(). There is no reason to use a group interface here, kvmgt has easy access to a vfio_device. Delete kvmgt_vdev::vfio_group since these calls were the last users. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Zhi Wang <zhi.a.wang@intel.com> Link: https://lore.kernel.org/r/5-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/mdev: Pass in a struct vfio_device * to vfio_dma_rw()Jason Gunthorpe3-16/+14
Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. Change vfio_dma_rw() to take in the struct vfio_device and move the container users that would have been held by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like vfio_pin/unpin_pages(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages()Jason Gunthorpe5-42/+27
Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need so this avoids complexity, extra refcountings, and a confusing lifecycle model. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/ccw: Remove mdev from struct channel_programJason Gunthorpe3-24/+30
The next patch wants the vfio_device instead. There is no reason to store a pointer here since we can container_of back to the vfio_device. Reviewed-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio: Make vfio_(un)register_notifier accept a vfio_deviceJason Gunthorpe5-44/+33
All callers have a struct vfio_device trivially available, pass it in directly and avoid calling the expensive vfio_group_get_from_dev(). Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio: Stop using iommu_present()Robin Murphy1-3/+3
IOMMU groups have been mandatory for some time now, so a device without one is necessarily a device without any usable IOMMU, therefore the iommu_present() check is redundant (or at best unhelpful). Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/537103bbd7246574f37f2c88704d7824a3a889f2.1649160714.git.robin.murphy@arm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-11vfio/mlx5: Run the SAVE state command in an async modeYishai Hadas3-9/+131
Use the PF asynchronous command mode for the SAVE state command. This enables returning earlier to user space upon issuing successfully the command and improve latency by let things run in parallel. Link: https://lore.kernel.org/r/20220510090206.90374-5-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11vfio/mlx5: Refactor to enable VFs migration in parallelYishai Hadas3-99/+59
Refactor to enable different VFs to run their commands over the PF command interface in parallel and to not block one each other. This is done by not using the global PF lock that was used before but relying on the VF attach/detach mechanism to sync. Link: https://lore.kernel.org/r/20220510090206.90374-4-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11vfio/mlx5: Manage the VF attach/detach callback from the PFYishai Hadas3-34/+92
Manage the VF attach/detach callback from the PF. This lets the driver to enable parallel VFs migration as will be introduced in the next patch. As part of this, reorganize the VF is migratable code to be in a separate function and rename it to be set_migratable() to match its functionality. Link: https://lore.kernel.org/r/20220510090206.90374-3-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-10net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIsYishai Hadas2-1/+76
Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a VF register to be notified for its enablement / disablement by the PF. Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with its notifier block and upon VF remove it will call mlx5_sriov_blocking_notifier_unregister() to drop its registration. This can give a VF the ability to clean some resources upon disable before that the command interface goes down and on the other hand sets some stuff before that it's enabled. This may be used by a VF which is migration capable in few cases.(e.g. PF load/unload upon an health recovery). Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-08Linux 5.18-rc6Linus Torvalds1-1/+1
2022-05-08Revert "parisc: Increase parisc_cache_flush_threshold setting"Helge Deller1-15/+3
This reverts commit a58e9d0984e8dad53f17ec73ae3c1cc7f8d88151. Triggers segfaults with 32-bit kernels on PA8500 machines. Signed-off-by: Helge Deller <deller@gmx.de>
2022-05-08parisc: Mark cr16 clock unstable on all SMP machinesHelge Deller1-23/+4
The cr16 interval timers are not synchronized across CPUs, even with just one dual-core CPU. This becomes visible if the machines have a longer uptime. Signed-off-by: Helge Deller <deller@gmx.de>
2022-05-08parisc: Fix typos in commentsJulia Lawall6-6/+6
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Helge Deller <deller@gmx.de>
2022-05-08parisc: Change MAX_ADDRESS to become unsigned long longHelge Deller1-1/+1
Dave noticed that for the 32-bit kernel MAX_ADDRESS should be a ULL, otherwise this define would become 0: MAX_ADDRESS (1UL << MAX_ADDRBITS) It has no real effect on the kernel. Signed-off-by: Helge Deller <deller@gmx.de> Noticed-by: John David Anglin <dave.anglin@bell.net>
2022-05-08parisc: Merge model and model name into one line in /proc/cpuinfoHelge Deller1-2/+1
The Linux tool "lscpu" shows the double amount of CPUs if we have "model" and "model name" in two different lines in /proc/cpuinfo. This change combines the model and the model name into one line. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org
2022-05-08parisc: Re-enable GENERIC_CPU_DEVICES for !SMPHelge Deller1-0/+1
In commit 62773112acc5 ("parisc: Switch from GENERIC_CPU_DEVICES to GENERIC_ARCH_TOPOLOGY") GENERIC_CPU_DEVICES was unconditionally turned off, but this triggers a warning in topology_add_dev(). Turning it back on for the !SMP case avoids this warning. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Fixes: 62773112acc5 ("parisc: Switch from GENERIC_CPU_DEVICES to GENERIC_ARCH_TOPOLOGY") Signed-off-by: Helge Deller <deller@gmx.de>
2022-05-08parisc: Update 32- and 64-bit defconfigsHelge Deller2-2/+5
Enable CONFIG_CGROUPS=y on 32-bit defconfig for systemd-support, and enable CONFIG_NAMESPACES and CONFIG_USER_NS. Signed-off-by: Helge Deller <deller@gmx.de>
2022-05-08parisc: Only list existing CPUs in cpu_possible_maskHelge Deller1-0/+8
The inventory knows which CPUs are in the system, so this bitmask should be in cpu_possible_mask instead of the bitmask based on CONFIG_NR_CPUS. Reset the cpu_possible_mask before scanning the system for CPUs, and mark each existing CPU as possible during initialization of that CPU. This avoids those warnings later on too: register_cpu_capacity_sysctl: too early to get CPU4 device! Signed-off-by: Helge Deller <deller@gmx.de> Noticed-by: John David Anglin <dave.anglin@bell.net>
2022-05-08Revert "parisc: Fix patch code locking and flushing"Helge Deller1-11/+14
This reverts commit a9fe7fa7d874a536e0540469f314772c054a0323. Leads to segfaults on 32bit kernel. Signed-off-by: Helge Deller <deller@gmx.de>
2022-05-08Revert "parisc: Mark sched_clock unstable only if clocks are not syncronized"Helge Deller2-3/+6
This reverts commit d97180ad68bdb7ee10f327205a649bc2f558741d. It triggers RCU stalls at boot with a 32-bit kernel. Signed-off-by: Helge Deller <deller@gmx.de> Noticed-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # v5.15+
2022-05-08Revert "parisc: Mark cr16 CPU clocksource unstable on all SMP machines"Helge Deller1-8/+22
This reverts commit afdb4a5b1d340e4afffc65daa21cc71890d7d589. It triggers RCU stalls at boot with a 32-bit kernel. Signed-off-by: Helge Deller <deller@gmx.de> Noticed-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # v5.16+
2022-05-08blk-mq: remove the error_count from struct requestWilly Tarreau1-1/+0
The last two users were floppy.c and ataflop.c respectively, it was verified that no other drivers makes use of this, so let's remove it. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Minh Yuan <yuanmingbuaa@gmail.com> Cc: Denis Efremov <efremov@linux.com>, Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-08ataflop: use a statically allocated error countersWilly Tarreau1-4/+6
This is the last driver making use of fd_request->error_count, which is easy to get wrong as was shown in floppy.c. We don't need to keep it there, it can be moved to the atari_floppy_struct instead, so let's do this. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Minh Yuan <yuanmingbuaa@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-08floppy: use a statically allocated error counterWilly Tarreau1-10/+8
Interrupt handler bad_flp_intr() may cause a UAF on the recently freed request just to increment the error count. There's no point keeping that one in the request anyway, and since the interrupt handler uses a static pointer to the error which cannot be kept in sync with the pending request, better make it use a static error counter that's reset for each new request. This reset now happens when entering redo_fd_request() for a new request via set_next_request(). One initial concern about a single error counter was that errors on one floppy drive could be reported on another one, but this problem is not real given that the driver uses a single drive at a time, as that PC-compatible controllers also have this limitation by using shared signals. As such the error count is always for the "current" drive. Reported-by: Minh Yuan <yuanmingbuaa@gmail.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Tested-by: Denis Efremov <efremov@linux.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>