aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/vfio (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-05-01Merge tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds1-18/+13
Pull iommu updates from Joerg Roedel: - Big cleanup of almost unsused parts of the IOMMU API by Christoph Hellwig. This mostly affects the Freescale PAMU driver. - New IOMMU driver for Unisoc SOCs - ARM SMMU Updates from Will: - Drop vestigial PREFETCH_ADDR support (SMMUv3) - Elide TLB sync logic for empty gather (SMMUv3) - Fix "Service Failure Mode" handling (SMMUv3) - New Qualcomm compatible string (SMMUv2) - Removal of the AMD IOMMU performance counter writeable check on AMD. It caused long boot delays on some machines and is only needed to work around an errata on some older (possibly pre-production) chips. If someone is still hit by this hardware issue anyway the performance counters will just return 0. - Support for targeted invalidations in the AMD IOMMU driver. Before that the driver only invalidated a single 4k page or the whole IO/TLB for an address space. This has been extended now and is mostly useful for emulated AMD IOMMUs. - Several fixes for the Shared Virtual Memory support in the Intel VT-d driver - Mediatek drivers can now be built as modules - Re-introduction of the forcedac boot option which got lost when converting the Intel VT-d driver to the common dma-iommu implementation. - Extension of the IOMMU device registration interface and support iommu_ops to be const again when drivers are built as modules. * tag 'iommu-updates-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (84 commits) iommu: Streamline registration interface iommu: Statically set module owner iommu/mediatek-v1: Add error handle for mtk_iommu_probe iommu/mediatek-v1: Avoid build fail when build as module iommu/mediatek: Always enable the clk on resume iommu/fsl-pamu: Fix uninitialized variable warning iommu/vt-d: Force to flush iotlb before creating superpage iommu/amd: Put newline after closing bracket in warning iommu/vt-d: Fix an error handling path in 'intel_prepare_irq_remapping()' iommu/vt-d: Fix build error of pasid_enable_wpe() with !X86 iommu/amd: Remove performance counter pre-initialization test Revert "iommu/amd: Fix performance counter initialization" iommu/amd: Remove duplicate check of devid iommu/exynos: Remove unneeded local variable initialization iommu/amd: Page-specific invalidations for more than one page iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command iommu/vt-d: Avoid unnecessary cache flush in pasid entry teardown iommu/vt-d: Invalidate PASID cache when root/context entry changed iommu/vt-d: Remove WO permissions on second-level paging entries iommu/vt-d: Report the right page fault address ...
2021-04-28Merge tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds21-1109/+572
Pull VFIO updates from Alex Williamson: - Embed struct vfio_device into vfio driver structures (Jason Gunthorpe) - Make vfio_mdev type safe (Jason Gunthorpe) - Remove vfio-pci NVLink2 extensions for POWER9 (Christoph Hellwig) - Update vfio-pci IGD extensions for OpRegion 2.1+ (Fred Gao) - Various spelling/blank line fixes (Zhen Lei, Zhou Wang, Bhaskar Chowdhury) - Simplify unpin_pages error handling (Shenming Lu) - Fix i915 mdev Kconfig dependency (Arnd Bergmann) - Remove unused structure member (Keqian Zhu) * tag 'vfio-v5.13-rc1' of git://github.com/awilliam/linux-vfio: (43 commits) vfio/gvt: fix DRM_I915_GVT dependency on VFIO_MDEV vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommu vfio/mdev: Correct the function signatures for the mdev_type_attributes vfio/mdev: Remove kobj from mdev_parent_ops->create() vfio/gvt: Use mdev_get_type_group_id() vfio/gvt: Make DRM_I915_GVT depend on VFIO_MDEV vfio/mbochs: Use mdev_get_type_group_id() vfio/mdpy: Use mdev_get_type_group_id() vfio/mtty: Use mdev_get_type_group_id() vfio/mdev: Add mdev/mtype_get_type_group_id() vfio/mdev: Remove duplicate storage of parent in mdev_device vfio/mdev: Add missing error handling to dev_set_name() vfio/mdev: Reorganize mdev_device_create() vfio/mdev: Add missing reference counting to mdev_type vfio/mdev: Expose mdev_get/put_parent to mdev_private.h vfio/mdev: Use struct mdev_type in struct mdev_device vfio/mdev: Simplify driver registration vfio/mdev: Add missing typesafety around mdev_device vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer vfio/mdev: Fix missing static's on MDEV_TYPE_ATTR's ...
2021-04-16Merge branches 'iommu/fixes', 'arm/mediatek', 'arm/smmu', 'arm/exynos', 'unisoc', 'x86/vt-d', 'x86/amd' and 'core' into nextJoerg Roedel1-18/+13
2021-04-14vfio/iommu_type1: Remove unused pinned_page_dirty_scope in vfio_iommuKeqian Zhu1-1/+0
pinned_page_dirty_scope is optimized out by commit 010321565a7d ("vfio/iommu_type1: Mantain a counter for non_pinned_groups"), but appears again due to some issues during merging branches. We can safely remove it here. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210412024415.30676-1-zhukeqian1@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-13vfio/pci: Add missing range check in vfio_pci_mmapChristian A. Ehrhardt1-1/+3
When mmaping an extra device region verify that the region index derived from the mmap offset is valid. Fixes: a15b1883fee1 ("vfio_pci: Allow mapping extra regions") Cc: stable@vger.kernel.org Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Message-Id: <20210412214124.GA241759@lisa.in-ulm.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-12vfio/mdev: Correct the function signatures for the mdev_type_attributesJason Gunthorpe2-7/+18
The driver core standard is to pass in the properly typed object, the properly typed attribute and the buffer data. It stems from the root kobject method: ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,..) Each subclass of kobject should provide their own function with the same signature but more specific types, eg struct device uses: ssize_t (*show)(struct device *dev, struct device_attribute *attr,..) In this case the existing signature is: ssize_t (*show)(struct kobject *kobj, struct device *dev,..) Where kobj is a 'struct mdev_type *' and dev is 'mdev_type->parent->dev'. Change the mdev_type related sysfs attribute functions to: ssize_t (*show)(struct mdev_type *mtype, struct mdev_type_attribute *attr,..) In order to restore type safety and match the driver core standard There are no current users of 'attr', but if it is ever needed it would be hard to add in retroactively, so do it now. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <18-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-12vfio/mdev: Remove kobj from mdev_parent_ops->create()Jason Gunthorpe1-1/+1
The kobj here is a type-erased version of mdev_type, which is already stored in the struct mdev_device being passed in. It was only ever used to compute the type_group_id, which is now extracted directly from the mdev. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <17-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Add mdev/mtype_get_type_group_id()Jason Gunthorpe3-7/+30
This returns the index in the supported_type_groups array that is associated with the mdev_type attached to the struct mdev_device or its containing struct kobject. Each mdev_device can be spawned from exactly one mdev_type, which in turn originates from exactly one supported_type_group. Drivers are using weird string calculations to try and get back to this index, providing a direct access to the index removes a bunch of wonky driver code. mdev_type->group can be deleted as the group is obtained using the type_group_id. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <11-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Remove duplicate storage of parent in mdev_deviceJason Gunthorpe2-15/+12
mdev_device->type->parent is the same thing. The struct mdev_device was relying on the kref on the mdev_parent to also indirectly hold a kref on the mdev_type pointer. Now that the type holds a kref on the parent we can directly kref the mdev_type and remove this implicit relationship. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <10-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Add missing error handling to dev_set_name()Jason Gunthorpe1-1/+3
This can fail, and seems to be a popular target for syzkaller error injection. Check the error return and unwind with put_device(). Fixes: 7b96953bc640 ("vfio: Mediated device Core driver") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <9-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Reorganize mdev_device_create()Jason Gunthorpe1-33/+27
Once the memory for the struct mdev_device is allocated it should immediately be device_initialize()'d and filled in so that put_device() can always be used to undo the allocation. Place the mdev_get/put_parent() so that they are clearly protecting the mdev->parent pointer. Move the final put to the release function so that the lifetime rules are trivial to understand. Update the goto labels to follow the normal convention. Remove mdev_device_free() as the release function via device_put() is now usable in all cases. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <8-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Add missing reference counting to mdev_typeJason Gunthorpe1-0/+4
struct mdev_type holds a pointer to the kref'd object struct mdev_parent, but doesn't hold the kref. The lifetime of the parent becomes implicit because parent_remove_sysfs_files() is supposed to remove all the access before the parent can be freed, but this is very hard to reason about. Make it obviously correct by adding the missing get. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <7-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Expose mdev_get/put_parent to mdev_private.hJason Gunthorpe2-20/+15
The next patch will use these in mdev_sysfs.c While here remove the now dead code checks for NULL, a mdev_type can never have a NULL parent. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <6-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Use struct mdev_type in struct mdev_deviceJason Gunthorpe3-19/+15
The kobj pointer in mdev_device is actually pointing at a struct mdev_type. Use the proper type so things are understandable. There are a number of places that are confused and passing both the mdev and the mtype as function arguments, fix these to derive the mtype directly from the mdev to remove the redundancy. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <5-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Simplify driver registrationJason Gunthorpe2-10/+13
This is only done once, we don't need to generate code to initialize a structure stored in the ELF .data segment. Fill in the three required .driver members directly instead of copying data into them during mdev_register_driver(). Further the to_mdev_driver() function doesn't belong in a public header, just inline it into the two places that need it. Finally, we can now clearly see that 'drv' derived from dev->driver cannot be NULL, firstly because the driver core forbids it, and secondly because NULL won't pass through the container_of(). Remove the dead code. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <4-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Add missing typesafety around mdev_deviceJason Gunthorpe6-114/+35
The mdev API should accept and pass a 'struct mdev_device *' in all places, not pass a 'struct device *' and cast it internally with to_mdev_device(). Particularly in its struct mdev_driver functions, the whole point of a bus's struct device_driver wrapper is to provide type safety compared to the default struct device_driver. Further, the driver core standard is for bus drivers to expose their device structure in their public headers that can be used with container_of() inlines and '&foo->dev' to go between the class levels, and '&foo->dev' to be used with dev_err/etc driver core helper functions. Move 'struct mdev_device' to mdev.h Once done this allows moving some one instruction exported functions to static inlines, which in turns allows removing one of the two grotesque symbol_get()'s related to mdev in the core code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Message-Id: <3-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07vfio/mdev: Do not allow a mdev_type to have a NULL parent pointerJason Gunthorpe1-1/+1
There is a small race where the parent is NULL even though the kobj has already been made visible in sysfs. For instance the attribute_group is made visible in sysfs_create_files() and the mdev_type_attr_show() does: ret = attr->show(kobj, type->parent->dev, buf); Which will crash on NULL parent. Move the parent setup to before the type pointer leaves the stack frame. Fixes: 7b96953bc640 ("vfio: Mediated device Core driver") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <2-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-07iommu: remove DOMAIN_ATTR_NESTINGChristoph Hellwig1-4/+1
Use an explicit enable_nesting method instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Li Yang <leoyang.li@nxp.com> Link: https://lore.kernel.org/r/20210401155256.298656-17-hch@lst.de Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-07iommu: remove DOMAIN_ATTR_GEOMETRYChristoph Hellwig1-14/+12
The geometry information can be trivially queried from the iommu_domain struture. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Li Yang <leoyang.li@nxp.com> Link: https://lore.kernel.org/r/20210401155256.298656-16-hch@lst.de Signed-off-by: Joerg Roedel <jroedel@suse.de>
2021-04-06Merge branches 'v5.13/vfio/embed-vfio_device', 'v5.13/vfio/misc' and 'v5.13/vfio/nvlink' into v5.13/vfio/nextAlex Williamson10-541/+64
Spelling fixes merged with file deletion. Conflicts: drivers/vfio/pci/vfio_pci_nvlink2.c Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Remove device_data from the vfio bus driver APIJason Gunthorpe5-16/+7
There are no longer any users, so it can go away. Everything is using container_of now. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <14-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: Replace uses of vfio_device_data() with container_ofJason Gunthorpe1-43/+24
This tidies a few confused places that think they can have a refcount on the vfio_device but the device_data could be NULL, that isn't possible by design. Most of the change falls out when struct vfio_devices is updated to just store the struct vfio_pci_device itself. This wasn't possible before because there was no easy way to get from the 'struct vfio_pci_device' to the 'struct vfio_device' to put back the refcount. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <13-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Make vfio_device_ops pass a 'struct vfio_device *' instead of 'void *'Jason Gunthorpe5-70/+99
This is the standard kernel pattern, the ops associated with a struct get the struct pointer in for typesafety. The expected design is to use container_of to cleanly go from the subsystem level type to the driver level type without having any type erasure in a void *. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <12-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/mdev: Make to_mdev_device() into a static inlineJason Gunthorpe1-1/+4
The macro wrongly uses 'dev' as both the macro argument and the member name, which means it fails compilation if any caller uses a word other than 'dev' as the single argument. Fix this defect by making it into proper static inline, which is more clear and typesafe anyhow. Fixes: 99e3123e3d72 ("vfio-mdev: Make mdev_device private and abstract interfaces") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <11-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/mdev: Use vfio_init/register/unregister_group_devJason Gunthorpe2-39/+20
mdev gets little benefit because it doesn't actually do anything, however it is the last user, so move the vfio_init/register/unregister_group_dev() code here for now. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <10-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: Use vfio_init/register/unregister_group_devJason Gunthorpe2-5/+6
pci already allocates a struct vfio_pci_device with exactly the same lifetime as vfio_device, switch to the new API and embed vfio_device in vfio_pci_device. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <9-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: Re-order vfio_pci_probe()Jason Gunthorpe1-8/+9
vfio_add_group_dev() must be called only after all of the private data in vdev is fully setup and ready, otherwise there could be races with user space instantiating a device file descriptor and starting to call ops. For instance vfio_pci_reflck_attach() sets vdev->reflck and vfio_pci_open(), called by fops open, unconditionally derefs it, which will crash if things get out of order. Fixes: cc20d7999000 ("vfio/pci: Introduce VF token") Fixes: e309df5b0c9e ("vfio/pci: Parallelize device open and release") Fixes: 6eb7018705de ("vfio-pci: Move idle devices to D3hot power state") Fixes: ecaa1f6a0154 ("vfio-pci: Add VGA arbiter client") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <8-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: Move VGA and VF initialization to functionsJason Gunthorpe1-42/+74
vfio_pci_probe() is quite complicated, with optional VF and VGA sub components. Move these into clear init/uninit functions and have a linear flow in probe/remove. This fixes a few little buglets: - vfio_pci_remove() is in the wrong order, vga_client_register() removes a notifier and is after kfree(vdev), but the notifier refers to vdev, so it can use after free in a race. - vga_client_register() can fail but was ignored Organize things so destruction order is the reverse of creation order. Fixes: ecaa1f6a0154 ("vfio-pci: Add VGA arbiter client") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <7-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/fsl-mc: Use vfio_init/register/unregister_group_devJason Gunthorpe2-9/+12
fsl-mc already allocates a struct vfio_fsl_mc_device with exactly the same lifetime as vfio_device, switch to the new API and embed vfio_device in vfio_fsl_mc_device. While here remove the devm usage for the vdev, this code is clean and doesn't need devm. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <6-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/fsl-mc: Re-order vfio_fsl_mc_probe()Jason Gunthorpe1-27/+47
vfio_add_group_dev() must be called only after all of the private data in vdev is fully setup and ready, otherwise there could be races with user space instantiating a device file descriptor and starting to call ops. For instance vfio_fsl_mc_reflck_attach() sets vdev->reflck and vfio_fsl_mc_open(), called by fops open, unconditionally derefs it, which will crash if things get out of order. This driver started life with the right sequence, but two commits added stuff after vfio_add_group_dev(). Fixes: 2e0d29561f59 ("vfio/fsl-mc: Add irq infrastructure for fsl-mc devices") Fixes: f2ba7e8c947b ("vfio/fsl-mc: Added lock support in preparation for interrupt handling") Co-developed-by: Diana Craciun OSS <diana.craciun@oss.nxp.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <5-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/platform: Use vfio_init/register/unregister_group_devJason Gunthorpe4-31/+25
platform already allocates a struct vfio_platform_device with exactly the same lifetime as vfio_device, switch to the new API and embed vfio_device in vfio_platform_device. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <4-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Split creation of a vfio_device into init and register opsJason Gunthorpe1-60/+65
This makes the struct vfio_device part of the public interface so it can be used with container_of and so forth, as is typical for a Linux subystem. This is the first step to bring some type-safety to the vfio interface by allowing the replacement of 'void *' and 'struct device *' inputs with a simple and clear 'struct vfio_device *' For now the self-allocating vfio_add_group_dev() interface is kept so each user can be updated as a separate patch. The expected usage pattern is driver core probe() function: my_device = kzalloc(sizeof(*mydevice)); vfio_init_group_dev(&my_device->vdev, dev, ops, mydevice); /* other driver specific prep */ vfio_register_group_dev(&my_device->vdev); dev_set_drvdata(dev, my_device); driver core remove() function: my_device = dev_get_drvdata(dev); vfio_unregister_group_dev(&my_device->vdev); /* other driver specific tear down */ kfree(my_device); Allowing the driver to be able to use the drvdata and vfio_device to go to/from its own data. The pattern also makes it clear that vfio_register_group_dev() must be last in the sequence, as once it is called the core code can immediately start calling ops. The init/register gap is provided to allow for the driver to do setup before ops can be called and thus avoid races. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Liu Yi L <yi.l.liu@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <3-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Simplify the lifetime logic for vfio_deviceJason Gunthorpe1-54/+25
The vfio_device is using a 'sleep until all refs go to zero' pattern for its lifetime, but it is indirectly coded by repeatedly scanning the group list waiting for the device to be removed on its own. Switch this around to be a direct representation, use a refcount to count the number of places that are blocking destruction and sleep directly on a completion until that counter goes to zero. kfree the device after other accesses have been excluded in vfio_del_group_dev(). This is a fairly common Linux idiom. Due to this we can now remove kref_put_mutex(), which is very rarely used in the kernel. Here it is being used to prevent a zero ref device from being seen in the group list. Instead allow the zero ref device to continue to exist in the device_list and use refcount_inc_not_zero() to exclude it once refs go to zero. This patch is organized so the next patch will be able to alter the API to allow drivers to provide the kfree. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <2-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: Remove extra put/gets around vfio_device->groupJason Gunthorpe1-19/+2
The vfio_device->group value has a get obtained during vfio_add_group_dev() which gets moved from the stack to vfio_device->group in vfio_group_create_device(). The reference remains until we reach the end of vfio_del_group_dev() when it is put back. Thus anything that already has a kref on the vfio_device is guaranteed a valid group pointer. Remove all the extra reference traffic. It is tricky to see, but the get at the start of vfio_del_group_dev() is actually pairing with the put hidden inside vfio_device_put() a few lines below. A later patch merges vfio_group_create_device() into vfio_add_group_dev() which makes the ownership and error flow on the create side easier to follow. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <1-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: remove vfio_pci_nvlink2Christoph Hellwig5-529/+0
This driver never had any open userspace (which for VFIO would include VM kernel drivers) that use it, and thus should never have been added by our normal userspace ABI rules. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <20210326061311.1497642-2-hch@lst.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/type1: Remove the almost unused check in vfio_iommu_type1_unpin_pagesShenming Lu1-4/+4
The check i > npage at the end of vfio_iommu_type1_unpin_pages is unused unless npage < 0, but if npage < 0, this function will return npage, which should return -EINVAL instead. So let's just check the parameter npage at the start of the function. By the way, replace unpin_exit with break. Signed-off-by: Shenming Lu <lushenming@huawei.com> Message-Id: <20210406135009.1707-1-lushenming@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/platform: Fix spelling mistake "registe" -> "register"Zhen Lei1-1/+1
There is a spelling mistake in a comment, fix it. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Acked-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210326083528.1329-5-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: fix a couple of spelling mistakesZhen Lei2-3/+3
There are several spelling mistakes, as follows: thru ==> through presense ==> presence Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210326083528.1329-4-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/mdev: Fix spelling mistake "interal" -> "internal"Zhen Lei1-1/+1
There is a spelling mistake in a comment, fix it. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210326083528.1329-3-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/type1: fix a couple of spelling mistakesZhen Lei1-3/+3
There are several spelling mistakes, as follows: userpsace ==> userspace Accouting ==> Accounting exlude ==> exclude Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20210326083528.1329-2-thunder.leizhen@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: Add support for opregion v2.1+Fred Gao1-0/+53
Before opregion version 2.0 VBT data is stored in opregion mailbox #4, but when VBT data exceeds 6KB size and cannot be within mailbox #4 then from opregion v2.0+, Extended VBT region, next to opregion is used to hold the VBT data, so the total size will be opregion size plus extended VBT region size. Since opregion v2.0 with physical host VBT address would not be practically available for end user and guest can not directly access host physical address, so it is not supported. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Swee Yee Fonn <swee.yee.fonn@intel.com> Signed-off-by: Fred Gao <fred.gao@intel.com> Message-Id: <20210325170953.24549-1-fred.gao@intel.com> Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio/pci: Remove an unnecessary blank line in vfio_pci_enableZhou Wang1-1/+0
This blank line is unnecessary, so remove it. Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Message-Id: <1615808073-178604-1-git-send-email-wangzhou1@hisilicon.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-04-06vfio: pci: Spello fix in the file vfio_pci.cBhaskar Chowdhury1-1/+1
s/permision/permission/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Message-Id: <20210314052925.3560-1-unixbhaskar@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-29vfio/nvlink: Add missing SPAPR_TCE_IOMMU dependsJason Gunthorpe1-1/+1
Compiling the nvlink stuff relies on the SPAPR_TCE_IOMMU otherwise there are compile errors: drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration] ret = mm_iommu_put(data->mm, data->mem); As PPC only defines these functions when the config is set. Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only IOMMU that could have satisfied IOMMU_API on POWERNV. Fixes: 179209fa1270 ("vfio: IOMMU_API should be selected") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-25vfio/type1: Empty batch for pfnmap pagesDaniel Jordan1-0/+6
When vfio_pin_pages_remote() returns with a partial batch consisting of a single VM_PFNMAP pfn, a subsequent call will unfortunately try restoring it from batch->pages, resulting in vfio mapping the wrong page and unbalancing the page refcount. Prevent the function from returning with this kind of partial batch to avoid the issue. There's no explicit check for a VM_PFNMAP pfn because it's awkward to do so, so infer it from characteristics of the batch instead. This may result in occasional false positives but keeps the code simpler. Fixes: 4d83de6da265 ("vfio/type1: Batch page pinning") Link: https://lkml.kernel.org/r/20210323133254.33ed9161@omen.home.shazbot.org/ Reported-by: Alex Williamson <alex.williamson@redhat.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Message-Id: <20210325010552.185481-1-daniel.m.jordan@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16vfio/type1: fix vaddr_get_pfns() return in vfio_pin_page_external()Daniel Jordan1-1/+7
vaddr_get_pfns() now returns the positive number of pfns successfully gotten instead of zero. vfio_pin_page_external() might return 1 to vfio_iommu_type1_pin_pages(), which will treat it as an error, if vaddr_get_pfns() is successful but vfio_pin_page_external() doesn't reach vfio_lock_acct(). Fix it up in vfio_pin_page_external(). Found by inspection. Fixes: be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Message-Id: <20210308172452.38864-1-daniel.m.jordan@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16vfio: Depend on MMUJason Gunthorpe1-1/+1
VFIO_IOMMU_TYPE1 does not compile with !MMU: ../drivers/vfio/vfio_iommu_type1.c: In function 'follow_fault_pfn': ../drivers/vfio/vfio_iommu_type1.c:536:22: error: implicit declaration of function 'pte_write'; did you mean 'vfs_write'? [-Werror=implicit-function-declaration] So require it. Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <0-v1-02cb5500df6e+78-vfio_no_mmu_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16ARM: amba: Allow some ARM_AMBA users to compile with COMPILE_TESTJason Gunthorpe1-1/+1
CONFIG_VFIO_AMBA has a light use of AMBA, adding some inline fallbacks when AMBA is disabled will allow it to be compiled under COMPILE_TEST and make VFIO easier to maintain. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <3-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16vfio-platform: Add COMPILE_TEST to VFIO_PLATFORMJason Gunthorpe1-1/+1
x86 can build platform bus code too, so vfio-platform and all the platform reset implementations compile successfully on x86. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <2-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-03-16vfio: IOMMU_API should be selectedJason Gunthorpe1-1/+1
As IOMMU_API is a kconfig without a description (eg does not show in the menu) the correct operator is select not 'depends on'. Using 'depends on' for this kind of symbol means VFIO is not selectable unless some other random kconfig has already enabled IOMMU_API for it. Fixes: cba3345cc494 ("vfio: VFIO core") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Message-Id: <1-v1-df057e0f92c3+91-vfio_arm_compile_test_jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>