aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/iommu (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-27Merge tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linuxLinus Torvalds2-92/+474
Pull dma-mapping updates from Marek Szyprowski: "New two step DMA mapping API, which is is a first step to a long path to provide alternatives to scatterlist and to remove hacks, abuses and design mistakes related to scatterlists. This new approach optimizes some calls to DMA-IOMMU layer and cache maintenance by batching them, reduces memory usage as it is no need to store mapped DMA addresses to unmap them, and reduces some function call overhead. It is a combination effort of many people, lead and developed by Christoph Hellwig and Leon Romanovsky" * tag 'dma-mapping-6.16-2025-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: docs: core-api: document the IOVA-based API dma-mapping: add a dma_need_unmap helper dma-mapping: Implement link/unlink ranges API iommu/dma: Factor out a iommu_dma_map_swiotlb helper dma-mapping: Provide an interface to allow allocate IOVA iommu: add kernel-doc for iommu_unmap_fast iommu: generalize the batched sync after map interface dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.h PCI/P2PDMA: Refactor the p2pdma mapping helpers
2025-05-22iommu: Skip PASID validation for devices without PASID capabilityTushar Dave1-15/+28
Generally PASID support requires ACS settings that usually create single device groups, but there are some niche cases where we can get multi-device groups and still have working PASID support. The primary issue is that PCI switches are not required to treat PASID tagged TLPs specially so appropriate ACS settings are required to route all TLPs to the host bridge if PASID is going to work properly. pci_enable_pasid() does check that each device that will use PASID has the proper ACS settings to achieve this routing. However, no-PASID devices can be combined with PASID capable devices within the same topology using non-uniform ACS settings. In this case the no-PASID devices may not have strict route to host ACS flags and end up being grouped with the PASID devices. This configuration fails to allow use of the PASID within the iommu core code which wrongly checks if the no-PASID device supports PASID. Fix this by ignoring no-PASID devices during the PASID validation. They will never issue a PASID TLP anyhow so they can be ignored. Fixes: c404f55c26fc ("iommu: Validate the PASID in iommu_attach_device_pasid()") Cc: stable@vger.kernel.org Signed-off-by: Tushar Dave <tdave@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250520011937.3230557-1-tdave@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-05-06dma-mapping: Implement link/unlink ranges APILeon Romanovsky1-1/+274
Introduce new DMA APIs to perform DMA linkage of buffers in layers higher than DMA. In proposed API, the callers will perform the following steps. In map path: if (dma_can_use_iova(...)) dma_iova_alloc() for (page in range) dma_iova_link_next(...) dma_iova_sync(...) else /* Fallback to legacy map pages */ for (all pages) dma_map_page(...) In unmap path: if (dma_can_use_iova(...)) dma_iova_destroy() else for (all pages) dma_unmap_page(...) Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06iommu/dma: Factor out a iommu_dma_map_swiotlb helperChristoph Hellwig1-32/+41
Split the iommu logic from iommu_dma_map_page into a separate helper. This not only keeps the code neatly separated, but will also allow for reuse in another caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06dma-mapping: Provide an interface to allow allocate IOVALeon Romanovsky1-0/+86
The existing .map_pages() callback provides both allocating of IOVA and linking DMA pages. That combination works great for most of the callers who use it in control paths, but is less effective in fast paths where there may be multiple calls to map_page(). These advanced callers already manage their data in some sort of database and can perform IOVA allocation in advance, leaving range linkage operation to be in fast path. Provide an interface to allocate/deallocate IOVA and next patch link/unlink DMA ranges to that specific IOVA. In the new API a DMA mapping transaction is identified by a struct dma_iova_state, which holds some recomputed information for the transaction which does not change for each page being mapped, so add a check if IOVA can be used for the specific transaction. The API is exported from dma-iommu as it is the only implementation supported, the namespace is clearly different from iommu_* functions which are not allowed to be used. This code layout allows us to save function call per API call used in datapath as well as a lot of boilerplate code. Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06iommu: add kernel-doc for iommu_unmap_fastLeon Romanovsky1-0/+19
Add kernel-doc section for iommu_unmap_fast to document existing limitation of underlying functions which can't split individual ranges. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06iommu: generalize the batched sync after map interfaceChristoph Hellwig1-36/+29
For the upcoming IOVA-based DMA API we want to batch the ops->iotlb_sync_map() call after mapping multiple IOVAs from dma-iommu without having a scatterlist. Improve the API. Add a wrapper for the map_sync as iommu_sync_map() so that callers don't need to poke into the methods directly. Formalize __iommu_map() into iommu_map_nosync() which requires the caller to call iommu_sync_map() after all maps are completed. Refactor the existing sanity checks from all the different layers into iommu_map_nosync(). Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Will Deacon <will@kernel.org> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06dma-mapping: move the PCI P2PDMA mapping helpers to pci-p2pdma.hChristoph Hellwig1-0/+1
To support the upcoming non-scatterlist mapping helpers, we need to go back to have them called outside of the DMA API. Thus move them out of dma-map-ops.h, which is only for DMA API implementations to pci-p2pdma.h, which is for driver use. Note that the core helper is still not exported as the mapping is expected to be done only by very highlevel subsystem code at least for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-06PCI/P2PDMA: Refactor the p2pdma mapping helpersChristoph Hellwig1-23/+24
The current scheme with a single helper to determine the P2P status and map a scatterlist segment force users to always use the map_sg helper to DMA map, which we're trying to get away from because they are very cache inefficient. Refactor the code so that there is a single helper that checks the P2P state for a page, including the result that it is not a P2P page to simplify the callers, and a second one to perform the address translation for a bus mapped P2P transfer that does not depend on the scatterlist structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2025-05-02Merge tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linuxLinus Torvalds4-18/+54
Pull iommu fixes from Joerg Roedel: "ARM-SMMU fixes: - Fix broken detection of the S2FWB feature - Ensure page-size bitmap is initialised for SVA domains - Fix handling of SMMU client devices with duplicate Stream IDs - Don't fail SMMU probe if Stream IDs are aliased across clients Intel VT-d fixes: - Add quirk for IGFX device - Revert an ATS change to fix a boot failure AMD IOMMU: - Fix potential buffer overflow Core: - Fix for iommu_copy_struct_from_user()" * tag 'iommu-fixes-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57) iommu/vt-d: Revert ATS timing change to fix boot failure iommu: Fix two issues in iommu_copy_struct_from_user() iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefully iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids iommu/arm-smmu-v3: Fix pgsize_bit for sva domains iommu/arm-smmu-v3: Add missing S2FWB feature detection
2025-04-28iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57)Mingcong Bai1-1/+3
On the Lenovo ThinkPad X201, when Intel VT-d is enabled in the BIOS, the kernel boots with errors related to DMAR, the graphical interface appeared quite choppy, and the system resets erratically within a minute after it booted: DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xb97ff000 [fault reason 0x05] PTE Write access is not set Upon comparing boot logs with VT-d on/off, I found that the Intel Calpella quirk (`quirk_calpella_no_shadow_gtt()') correctly applied the igfx IOMMU disable/quirk correctly: pci 0000:00:00.0: DMAR: BIOS has allocated no shadow GTT; disabling IOMMU for graphics Whereas with VT-d on, it went into the "else" branch, which then triggered the DMAR handling fault above: ... else if (!disable_igfx_iommu) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); iommu_set_dma_strict(); } Now, this is not exactly scientific, but moving 0x0044 to quirk_iommu_igfx seems to have fixed the aforementioned issue. Running a few `git blame' runs on the function, I have found that the quirk was originally introduced as a fix specific to ThinkPad X201: commit 9eecabcb9a92 ("intel-iommu: Abort IOMMU setup for igfx if BIOS gave no shadow GTT space") Which was later revised twice to the "else" branch we saw above: - 2011: commit 6fbcfb3e467a ("intel-iommu: Workaround IOTLB hang on Ironlake GPU") - 2024: commit ba00196ca41c ("iommu/vt-d: Decouple igfx_off from graphic identity mapping") I'm uncertain whether further testings on this particular laptops were done in 2011 and (honestly I'm not sure) 2024, but I would be happy to do some distro-specific testing if that's what would be required to verify this patch. P.S., I also see IDs 0x0040, 0x0062, and 0x006a listed under the same `quirk_calpella_no_shadow_gtt()' quirk, but I'm not sure how similar these chipsets are (if they share the same issue with VT-d or even, indeed, if this issue is specific to a bug in the Lenovo BIOS). With regards to 0x0062, it seems to be a Centrino wireless card, but not a chipset? I have also listed a couple (distro and kernel) bug reports below as references (some of them are from 7-8 years ago!), as they seem to be similar issue found on different Westmere/Ironlake, Haswell, and Broadwell hardware setups. Cc: stable@vger.kernel.org Fixes: 6fbcfb3e467a ("intel-iommu: Workaround IOTLB hang on Ironlake GPU") Fixes: ba00196ca41c ("iommu/vt-d: Decouple igfx_off from graphic identity mapping") Link: https://groups.google.com/g/qubes-users/c/4NP4goUds2c?pli=1 Link: https://bugs.archlinux.org/task/65362 Link: https://bbs.archlinux.org/viewtopic.php?id=230323 Reported-by: Wenhao Sun <weiguangtwk@outlook.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=197029 Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Link: https://lore.kernel.org/r/20250415133330.12528-1-jeffbai@aosc.io Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-24iommu/amd: WARN if KVM attempts to set vCPU affinity without posted intrruptsSean Christopherson1-10/+3
WARN if KVM attempts to set vCPU affinity when posted interrupts aren't enabled, as KVM shouldn't try to enable posting when they're unsupported, and the IOMMU driver darn well should only advertise posting support when AMD_IOMMU_GUEST_IR_VAPIC() is true. Note, KVM consumes is_guest_mode only on success. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-24iommu/amd: Return an error if vCPU affinity is set for non-vCPU IRTESean Christopherson1-1/+1
Return -EINVAL instead of success if amd_ir_set_vcpu_affinity() is invoked without use_vapic; lying to KVM about whether or not the IRTE was configured to post IRQs is all kinds of bad. Fixes: d98de49a53e4 ("iommu/amd: Enable vAPIC interrupt remapping mode by default") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250404193923.1413163-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-04-17iommu/vt-d: Revert ATS timing change to fix boot failureLu Baolu1-12/+19
Commit <5518f239aff1> ("iommu/vt-d: Move scalable mode ATS enablement to probe path") changed the PCI ATS enablement logic to run earlier, specifically before the default domain attachment. On some client platforms, this change resulted in boot failures, causing the kernel to panic with the following message and call trace: Kernel panic - not syncing: DMAR hardware is malfunctioning CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3+ #175 Call Trace: <TASK> dump_stack_lvl+0x6f/0xb0 dump_stack+0x10/0x16 panic+0x10a/0x2b7 iommu_enable_translation.cold+0xc/0xc intel_iommu_init+0xe39/0xec0 ? trace_hardirqs_on+0x1e/0xd0 ? __pfx_pci_iommu_init+0x10/0x10 pci_iommu_init+0xd/0x40 do_one_initcall+0x5b/0x390 kernel_init_freeable+0x26d/0x2b0 ? __pfx_kernel_init+0x10/0x10 kernel_init+0x15/0x120 ret_from_fork+0x35/0x60 ? __pfx_kernel_init+0x10/0x10 ret_from_fork_asm+0x1a/0x30 RIP: 1f0f:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0000:0000000000000000 EFLAGS: 841f0f2e66 ORIG_RAX: 1f0f2e6600000000 RAX: 0000000000000000 RBX: 1f0f2e6600000000 RCX: 2e66000000000084 RDX: 0000000000841f0f RSI: 000000841f0f2e66 RDI: 00841f0f2e660000 RBP: 00841f0f2e660000 R08: 00841f0f2e660000 R09: 000000841f0f2e66 R10: 0000000000841f0f R11: 2e66000000000084 R12: 000000841f0f2e66 R13: 0000000000841f0f R14: 2e66000000000084 R15: 1f0f2e6600000000 </TASK> ---[ end Kernel panic - not syncing: DMAR hardware is malfunctioning ]--- Fix this by reverting the timing change for ATS enablement introduced by the offending commit and restoring the previous behavior. Fixes: 5518f239aff1 ("iommu/vt-d: Move scalable mode ATS enablement to probe path") Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Closes: https://lore.kernel.org/linux-iommu/01b9c72f-460d-4f77-b696-54c6825babc9@linux.intel.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250416073608.1799578-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihidPavel Paklov1-0/+8
There is a string parsing logic error which can lead to an overflow of hid or uid buffers. Comparing ACPIID_LEN against a total string length doesn't take into account the lengths of individual hid and uid buffers so the check is insufficient in some cases. For example if the length of hid string is 4 and the length of the uid string is 260, the length of str will be equal to ACPIID_LEN + 1 but uid string will overflow uid buffer which size is 256. The same applies to the hid string with length 13 and uid string with length 250. Check the length of hid and uid strings separately to prevent buffer overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Cc: stable@vger.kernel.org Signed-off-by: Pavel Paklov <Pavel.Paklov@cyberprotect.ru> Link: https://lore.kernel.org/r/20250325092259.392844-1-Pavel.Paklov@cyberprotect.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefullyRobin Murphy1-3/+3
We've never supported StreamID aliasing between devices, and as such they will never have had functioning DMA, but this is not fatal to the SMMU itself. Although aliasing between hard-wired platform device StreamIDs would tend to raise questions about the whole system, in practice it's far more likely to occur relatively innocently due to legacy PCI bridges, where the underlying StreamID mappings are still perfectly reasonable. As such, return a more benign -ENODEV when failing probe for such an unsupported device (and log a more obvious error message), so that it doesn't break the entire SMMU probe now that bus_iommu_probe() runs in the right order and can propagate that error back. The end result is still that the device doesn't get an IOMMU group and probably won't work, same as before. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/39d54e49c8476efc4653e352150d44b185d6d50f.1744380554.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream idsNicolin Chen1-4/+15
ASPEED VGA card has two built-in devices: 0008:06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06) 0008:07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) Its toplogy looks like this: +-[0008:00]---00.0-[01-09]--+-00.0-[02-09]--+-00.0-[03]----00.0 Sandisk Corp Device 5017 | +-01.0-[04]-- | +-02.0-[05]----00.0 NVIDIA Corporation Device | +-03.0-[06-07]----00.0-[07]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family | +-04.0-[08]----00.0 Renesas Technology Corp. uPD720201 USB 3.0 Host Controller | \-05.0-[09]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller \-00.1 PMC-Sierra Inc. Device 4028 The IORT logic populaties two identical IDs into the fwspec->ids array via DMA aliasing in iort_pci_iommu_init() called by pci_for_each_dma_alias(). Though the SMMU driver had been able to handle this situation since commit 563b5cbe334e ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs"), that got broken by the later commit cdf315f907d4 ("iommu/arm-smmu-v3: Maintain a SID->device structure"), which ended up with allocating separate streams with the same stuffing. On a kernel prior to v6.15-rc1, there has been an overlooked warning: pci 0008:07:00.0: vgaarb: setting as boot VGA device pci 0008:07:00.0: vgaarb: bridge control possible pci 0008:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none pcieport 0008:06:00.0: Adding to iommu group 14 ast 0008:07:00.0: stream 67328 already in tree <===== WARNING ast 0008:07:00.0: enabling device (0002 -> 0003) ast 0008:07:00.0: Using default configuration ast 0008:07:00.0: AST 2600 detected ast 0008:07:00.0: [drm] Using analog VGA ast 0008:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [drm] Initialized ast 0.1.0 for 0008:07:00.0 on minor 0 ast 0008:07:00.0: [drm] fb0: astdrmfb frame buffer device With v6.15-rc, since the commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"), the error returned with the warning is moved to the SMMU device probe flow: arm_smmu_probe_device+0x15c/0x4c0 __iommu_probe_device+0x150/0x4f8 probe_iommu_group+0x44/0x80 bus_for_each_dev+0x7c/0x100 bus_iommu_probe+0x48/0x1a8 iommu_device_register+0xb8/0x178 arm_smmu_device_probe+0x1350/0x1db0 which then fails the entire SMMU driver probe: pci 0008:06:00.0: Adding to iommu group 21 pci 0008:07:00.0: stream 67328 already in tree arm-smmu-v3 arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3 arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 Since SMMU driver had been already expecting a potential duplicated Stream ID in arm_smmu_install_ste_for_dev(), change the arm_smmu_insert_master() routine to ignore a duplicated ID from the fwspec->sids array as well. Note: this has been failing the iommu_device_probe() since 2021, although a recent iommu commit in v6.15-rc1 that moves iommu_device_probe() started to fail the SMMU driver probe. Since nobody has cared about DMA Alias support, leave that as it was but fix the fundamental iommu_device_probe() breakage. Fixes: cdf315f907d4 ("iommu/arm-smmu-v3: Maintain a SID->device structure") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250415185620.504299-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17iommu/arm-smmu-v3: Fix pgsize_bit for sva domainsBalbir Singh1-0/+6
UBSan caught a bug with IOMMU SVA domains, where the reported exponent value in __arm_smmu_tlb_inv_range() was >= 64. __arm_smmu_tlb_inv_range() uses the domain's pgsize_bitmap to compute the number of pages to invalidate and the invalidation range. Currently arm_smmu_sva_domain_alloc() does not setup the iommu domain's pgsize_bitmap. This leads to __ffs() on the value returning 64 and that leads to undefined behaviour w.r.t. shift operations Fix this by initializing the iommu_domain's pgsize_bitmap to PAGE_SIZE. Effectively the code needs to use the smallest page size for invalidation Cc: stable@vger.kernel.org Fixes: eb6c97647be2 ("iommu/arm-smmu-v3: Avoid constructing invalid range commands") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250412002354.3071449-1-balbirs@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17iommu/arm-smmu-v3: Add missing S2FWB feature detectionAneesh Kumar K.V (Arm)1-0/+2
Commit 67e4fe398513 ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") introduced S2FWB usage but omitted the corresponding feature detection. As a result, vIOMMU allocation fails on FVP in arm_vsmmu_alloc(), due to the following check: if (!arm_smmu_master_canwbs(master) && !(smmu->features & ARM_SMMU_FEAT_S2FWB)) return ERR_PTR(-EOPNOTSUPP); This patch adds the missing detection logic to prevent allocation failure when S2FWB is supported. Fixes: 67e4fe398513 ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250408033351.1012411-1-aneesh.kumar@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-04-11iommu/tegra241-cmdqv: Fix warnings due to dmam_free_coherent()Nicolin Chen1-27/+5
Two WARNINGs are observed when SMMU driver rolls back upon failure: arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 ------------[ cut here ]------------ WARNING: CPU: 5 PID: 1 at kernel/dma/mapping.c:74 dmam_free_coherent+0xc0/0xd8 Call trace: dmam_free_coherent+0xc0/0xd8 (P) tegra241_vintf_free_lvcmdq+0x74/0x188 tegra241_cmdqv_remove_vintf+0x60/0x148 tegra241_cmdqv_remove+0x48/0xc8 arm_smmu_impl_remove+0x28/0x60 devm_action_release+0x1c/0x40 ------------[ cut here ]------------ 128 pages are still in use! WARNING: CPU: 16 PID: 1 at mm/page_alloc.c:6902 free_contig_range+0x18c/0x1c8 Call trace: free_contig_range+0x18c/0x1c8 (P) cma_release+0x154/0x2f0 dma_free_contiguous+0x38/0xa0 dma_direct_free+0x10c/0x248 dma_free_attrs+0x100/0x290 dmam_free_coherent+0x78/0xd8 tegra241_vintf_free_lvcmdq+0x74/0x160 tegra241_cmdqv_remove+0x98/0x198 arm_smmu_impl_remove+0x28/0x60 devm_action_release+0x1c/0x40 This is because the LVCMDQ queue memory are managed by devres, while that dmam_free_coherent() is called in the context of devm_action_release(). Jason pointed out that "arm_smmu_impl_probe() has mis-ordered the devres callbacks if ops->device_remove() is going to be manually freeing things that probe allocated": https://lore.kernel.org/linux-iommu/20250407174408.GB1722458@nvidia.com/ In fact, tegra241_cmdqv_init_structures() only allocates memory resources which means any failure that it generates would be similar to -ENOMEM, so there is no point in having that "falling back to standard SMMU" routine, as the standard SMMU would likely fail to allocate memory too. Remove the unwind part in tegra241_cmdqv_init_structures(), and return a proper error code to ask SMMU driver to call tegra241_cmdqv_remove() via impl_ops->device_remove(). Then, drop tegra241_vintf_free_lvcmdq() since devres will take care of that. Fixes: 483e0bd8883a ("iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250407201908.172225-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu: remove unneeded semicolonPei Xiao1-2/+2
cocci warnings: drivers/iommu/dma-iommu.c:1788:2-3: Unneeded semicolon so remove unneeded semicolon to fix cocci warnings. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/tencent_73EEE47E6ECCF538229C9B9E6A0272DA2B05@qq.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/mediatek: Fix NULL pointer deference in mtk_iommu_device_groupLouis-Alexis Eyraud1-13/+13
Currently, mtk_iommu calls during probe iommu_device_register before the hw_list from driver data is initialized. Since iommu probing issue fix, it leads to NULL pointer dereference in mtk_iommu_device_group when hw_list is accessed with list_first_entry (not null safe). So, change the call order to ensure iommu_device_register is called after the driver data are initialized. Fixes: 9e3a2a643653 ("iommu/mediatek: Adapt sharing and non-sharing pgtable case") Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Reviewed-by: Yong Wu <yong.wu@mediatek.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> # MT8183 Juniper, MT8186 Tentacruel Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com> Link: https://lore.kernel.org/r/20250403-fix-mtk-iommu-error-v2-1-fe8b18f8b0a8@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/exynos: Fix suspend/resume with IDENTITY domainMarek Szyprowski1-2/+2
Commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") changed the sequence of probing the SYSMMU controller devices and calls to arm_iommu_attach_device(), what results in resuming SYSMMU controller earlier, when it is still set to IDENTITY mapping. Such change revealed the bug in IDENTITY handling in the exynos-iommu driver. When SYSMMU controller is set to IDENTITY mapping, data->domain is NULL, so adjust checks in suspend & resume callbacks to handle this case correctly. Fixes: b3d14960e629 ("iommu/exynos: Implement an IDENTITY domain") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250401202731.2810474-1-m.szyprowski@samsung.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/ipmmu-vmsa: Register in a sensible orderRobin Murphy1-17/+10
IPMMU registers almost-initialised instances, but misses assigning the drvdata to make them fully functional, so initial calls back into ipmmu_probe_device() are likely to fail unnecessarily. Reorder this to work as it should, also pruning the long-out-of-date comment and adding the missing sysfs cleanup on error for good measure. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/53be6667544de65a15415b699e38a9a965692e45.1742481687.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu: Clear iommu-dma ops on cleanupRobin Murphy1-0/+3
If iommu_device_register() encounters an error, it can end up tearing down already-configured groups and default domains, however this currently still leaves devices hooked up to iommu-dma (and even historically the behaviour in this area was at best inconsistent across architectures/drivers...) Although in the case that an IOMMU is present whose driver has failed to probe, users cannot necessarily expect DMA to work anyway, it's still arguable that we should do our best to put things back as if the IOMMU driver was never there at all, and certainly the potential for crashing in iommu-dma itself is undesirable. Make sure we clean up the dev->dma_iommu flag along with everything else. Reported-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Closes: https://lore.kernel.org/all/CAGXv+5HJpTYmQ2h-GD7GjyeYT7bL9EBCvu0mz5LgpzJZtzfW0w@mail.gmail.com/ Tested-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/e788aa927f6d827dd4ea1ed608fada79f2bab030.1744284228.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/vt-d: Remove an unnecessary call set_dma_ops()Petr Tesarik1-1/+0
Do not touch per-device DMA ops when the driver has been converted to use the dma-iommu API. Fixes: c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Signed-off-by: Petr Tesarik <ptesarik@suse.com> Link: https://lore.kernel.org/r/20250403165605.278541-1-ptesarik@suse.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIsSean Christopherson1-14/+15
Set the posted MSI irq_chip's irq_ack() hook to irq_move_irq() instead of a dummy/empty callback so that posted MSIs process pending changes to the IRQ's SMP affinity. Failure to honor a pending set-affinity results in userspace being unable to change the effective affinity of the IRQ, as IRQD_SETAFFINITY_PENDING is never cleared and so irq_set_affinity_locked() always defers moving the IRQ. The issue is most easily reproducible by setting /proc/irq/xx/smp_affinity multiple times in quick succession, as only the first update is likely to be handled in process context. Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs") Cc: Robert Lippert <rlippert@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Wentao Yang <wentaoyang@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250321194249.1217961-1-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu: Fix crash in report_iommu_fault()Fedor Pchelkin1-1/+2
The following crash is observed while handling an IOMMU fault with a recent kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: ffff8c708299f700 PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3 Oops: Oops: 0011 [#1] SMP NOPTI CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy) Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024 RIP: 0010:0xffff8c708299f700 Call Trace: <TASK> ? report_iommu_fault+0x78/0xd3 ? amd_iommu_report_page_fault+0x91/0x150 ? amd_iommu_int_thread+0x77/0x180 ? __pfx_irq_thread_fn+0x10/0x10 ? irq_thread_fn+0x23/0x60 ? irq_thread+0xf9/0x1e0 ? __pfx_irq_thread_dtor+0x10/0x10 ? __pfx_irq_thread+0x10/0x10 ? kthread+0xfc/0x240 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ? ret_from_fork_asm+0x1a/0x30 </TASK> report_iommu_fault() checks for an installed handler comparing the corresponding field to NULL. It can (and could before) be called for a domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically. Cookie is represented as a union so we may end up with a garbage value treated there if this happens for a domain with another cookie type. Formerly there were two exclusive cookie types in the union. IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault(). Call the fault handler only if the passed domain has a required cookie type. Found by Linux Verification Center (linuxtesting.org). Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250408213342.285955-1-pchelkin@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner1-1/+1
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-01Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufdLinus Torvalds22-768/+1993
Pull iommufd updates from Jason Gunthorpe: "Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. The "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits) iommufd: Test attach before detaching pasid iommufd: Fix iommu_vevent_header tables markup iommu: Convert unreachable() to BUG() iommufd: Balance veventq->num_events inc/dec iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices ida: Add ida_find_first_range() iommufd/selftest: Add coverage for iommufd pasid attach/detach iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add a helper to get test device iommufd/selftest: Add set_dev_pasid in mock iommu iommufd: Allow allocating PASID-compatible domain iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support iommufd: Enforce PASID-compatible domain for RID iommufd: Support pasid attach/replace iommufd: Enforce PASID-compatible domain in PASID path iommufd/device: Add pasid_attach array to track per-PASID attach ...
2025-03-28iommufd: Test attach before detaching pasidYi Liu1-0/+7
Check if the pasid has been attached before going further in the detach path. This fixes a crash found by syzkaller. Add a selftest as well. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 668 Comm: repro Not tainted 6.14.0-next-20250325-eb4bc4b07f66 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org4 RIP: 0010:iommufd_hw_pagetable_detach+0x8a/0x4d0 Code: 00 00 00 44 89 ee 48 89 c7 48 89 75 c8 48 89 45 c0 e8 ca 55 17 02 48 89 c2 49 89 c4 48 b8 00 00 00b RSP: 0018:ffff888021b17b78 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff888014b5a000 RCX: ffff888021b17a64 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dad07fc RBP: ffff888021b17bc8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: ffff88801dad0e58 R12: 0000000000000000 R13: 0000000000000001 R14: ffff888021b17e18 R15: ffff8880132d3008 FS: 00007fca52013600(0000) GS:ffff8880e3684000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200006c0 CR3: 00000000112d0005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> iommufd_device_detach+0x2a/0x2e0 iommufd_test+0x2f99/0x5cd0 iommufd_fops_ioctl+0x38e/0x520 __x64_sys_ioctl+0x1ba/0x220 x64_sys_call+0x122e/0x2150 do_syscall_64+0x6d/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Link: https://patch.msgid.link/r/20250328133448.22052-1-yi.l.liu@intel.com Reported-by: Lai Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/linux-iommu/Z+X0tzxhiaupJT7b@ly-workstation Fixes: c0e301b2978d ("iommufd/device: Add pasid_attach array to track per-PASID attach") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommu: Convert unreachable() to BUG()Josh Poimboeuf1-2/+2
Bare unreachable() should be avoided as it generates undefined behavior, e.g. falling through to the next function. Use BUG() instead so the error is defined. Fixes the following warnings: drivers/iommu/dma-iommu.o: warning: objtool: iommu_dma_sw_msi+0x92: can't find jump dest instruction at .text+0x54d5 vmlinux.o: warning: objtool: iommu_dma_get_msi_page() falls through to next function __iommu_dma_unmap() Link: https://patch.msgid.link/r/0c801ae017ec078cacd39f8f0898fc7780535f85.1743053325.git.jpoimboe@kernel.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/314f8809-cd59-479b-97d7-49356bf1c8d1@infradead.org Reported-by: Paul E. McKenney <paulmck@kernel.org> Closes: https://lore.kernel.org/5dd1f35e-8ece-43b7-ad6d-86d02d2718f6@paulmck-laptop Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommufd: Balance veventq->num_events inc/decYi Liu1-1/+2
iommufd_veventq_fops_read() decrements veventq->num_events when a vevent is read out. However, the report path ony increments veventq->num_events for normal events. To be balanced, make the read path decrement num_events only for normal vevents. Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Link: https://patch.msgid.link/r/20250324120034.5940-3-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()Yi Liu1-1/+1
The vevent->header.flags is not initialized per allocation, hence the vevent read path may treat the vevent as lost_events_header wrongly. Use kzalloc() to alloc memory for new vevent. Fixes: e8e1ef9b77a7 ("iommufd/viommu: Add iommufd_viommu_report_event helper") Link: https://patch.msgid.link/r/20250324120034.5940-2-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommufd: Extend IOMMU_GET_HW_INFO to report PASID capabilityYi Liu1-1/+33
PASID usage requires PASID support in both device and IOMMU. Since the iommu drivers always enable the PASID capability for the device if it is supported, this extends the IOMMU_GET_HW_INFO to report the PASID capability to userspace. Also, enhances the selftest accordingly. Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com Cc: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-26Merge tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linuxLinus Torvalds30-734/+926
Pull iommu updates from Joerg Roedel: "Core iommufd dependencies from Jason: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API Core fixes for probe-issues from Robin Intel VT-d changes: - Checking for SVA support in domain allocation and attach paths - Move PCI ATS and PRI configuration into probe paths - Fix a pentential hang on reboot -f - Miscellaneous cleanups AMD-Vi changes: - Support for up to 2k IRQs per PCI device function - Set of smaller fixes ARM-SMMU changes: - SMMUv2 devicetree binding updates for Qualcomm implementations (QCS8300 GPU and MSM8937) - Clean up SMMUv2 runtime PM implementation to help with wider rework of pm_runtime_put_autosuspend() Rockchip driver changes: - Driver adjustments for recent DT probing changes S390 IOMMU changes: - Support for IOMMU passthrough Apple Dart changes: - Driver adjustments to meet ISP device requirements - Null-ptr deref fix - Disable subpage protection for DART 1" * tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommu/vt-d: Fix possible circular locking dependency iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled iommu: apple-dart: fix potential null pointer deref iommu/rockchip: Retire global dma_dev workaround iommu/rockchip: Register in a sensible order iommu/rockchip: Allocate per-device data sensibly iommu/mediatek-v1: Support COMPILE_TEST iommu/amd: Enable support for up to 2K interrupts per function iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro iommu/amd: Replace slab cache allocator with page allocator iommu/amd: Introduce generic function to set multibit feature value iommu: Don't warn prematurely about dodgy probes iommu/arm-smmu: Set rpm auto_suspend once during probe dt-bindings: arm-smmu: Document QCS8300 GPU SMMU iommu: Get DT/ACPI parsing into the proper probe path iommu: Keep dev->iommu state consistent iommu: Resolve ops in iommu_init_device() iommu: Handle race with default domain setup iommu: Unexport iommu_fwspec_free() ...
2025-03-25Merge tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linuxLinus Torvalds1-4/+4
Pull hyperv updates from Wei Liu: - Add support for running as the root partition in Hyper-V (Microsoft Hypervisor) by exposing /dev/mshv (Nuno and various people) - Add support for CPU offlining in Hyper-V (Hamza Mahfooz) - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael Kelley, Thorsten Blum) * tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: fix an indentation issue in mshyperv.h x86/hyperv: Add comments about hv_vpset and var size hypercall input args Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs hyperv: Add definitions for root partition driver to hv headers x86: hyperv: Add mshv_handler() irq handler and setup function Drivers: hv: Introduce per-cpu event ring tail Drivers: hv: Export some functions for use by root partition module acpi: numa: Export node_to_pxm() hyperv: Introduce hv_recommend_using_aeoi() arm64/hyperv: Add some missing functions to arm64 x86/mshyperv: Add support for extended Hyper-V features hyperv: Log hypercall status codes as strings x86/hyperv: Fix check of return value from snp_set_vmsa() x86/hyperv: Add VTL mode callback for restarting the system x86/hyperv: Add VTL mode emergency restart callback hyperv: Remove unused union and structs hyperv: Add CONFIG_MSHV_ROOT to gate root partition support hyperv: Change hv_root_partition into a function hyperv: Convert hypercall statuses to linux error codes drivers/hv: add CPU offlining support ...
2025-03-25iommufd/selftest: Add test ops to test pasid attach/detachYi Liu2-0/+188
This adds 4 test ops for pasid attach/replace/detach testing. There are ops to attach/detach pasid, and also op to check the attached hwpt of a pasid. Link: https://patch.msgid.link/r/20250321171940.7213-18-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/selftest: Add a helper to get test deviceYi Liu1-13/+23
There is need to get the selftest device (sobj->type == TYPE_IDEV) in multiple places, so have a helper to for it. Link: https://patch.msgid.link/r/20250321171940.7213-17-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/selftest: Add set_dev_pasid in mock iommuYi Liu2-5/+36
The callback is needed to make pasid_attach/detach path complete for mock device. A nop is enough for set_dev_pasid. A MOCK_FLAGS_DEVICE_PASID is added to indicate a pasid-capable mock device for the pasid test cases. Other test cases will still create a non-pasid mock device. While the mock iommu always pretends to be pasid-capable. Link: https://patch.msgid.link/r/20250321171940.7213-16-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Allow allocating PASID-compatible domainYi Liu1-3/+4
The underlying infrastructure has supported the PASID attach and related enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This extends iommufd to support PASID compatible domain requested by userspace. Link: https://patch.msgid.link/r/20250321171940.7213-15-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID supportYi Liu2-2/+3
Intel iommu driver just treats it as a nop since Intel VT-d does not have special requirement on domains attached to either the PASID or RID of a PASID-capable device. Link: https://patch.msgid.link/r/20250321171940.7213-14-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Enforce PASID-compatible domain for RIDYi Liu1-4/+22
Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce the RID to use PASID-compatible domain if PASID has been attached, and vice versa. The PASID path has already enforced it. This adds the enforcement in the RID path. This enforcement requires a lock across the RID and PASID attach path, the idev->igroup->lock is used as both the RID and the PASID path holds it. Link: https://patch.msgid.link/r/20250321171940.7213-13-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Support pasid attach/replaceYi Liu2-26/+41
This extends the below APIs to support PASID. Device drivers to manage pasid attach/replace/detach. int iommufd_device_attach(struct iommufd_device *idev, ioasid_t pasid, u32 *pt_id); int iommufd_device_replace(struct iommufd_device *idev, ioasid_t pasid, u32 *pt_id); void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid); The pasid operations share underlying attach/replace/detach infrastructure with the device operations, but still have some different implications: - no reserved region per pasid otherwise SVA architecture is already broken (CPU address space doesn't count device reserved regions); - accordingly no sw_msi trick; Cache coherency enforcement is still applied to pasid operations since it is about memory accesses post page table walking (no matter the walk is per RID or per PASID). Link: https://patch.msgid.link/r/20250321171940.7213-12-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Enforce PASID-compatible domain in PASID pathYi Liu3-0/+21
AMD IOMMU requires attaching PASID-compatible domains to PASID-capable devices. This includes the domains attached to RID and PASIDs. Related discussions in link [1] and [2]. ARM also has such a requirement, Intel does not need it, but can live up with it. Hence, iommufd is going to enforce this requirement as it is not harmful to vendors that do not need it. Mark the PASID-compatible domains and enforce it in the PASID path. [1] https://lore.kernel.org/linux-iommu/20240709182303.GK14050@ziepe.ca/ [2] https://lore.kernel.org/linux-iommu/20240822124433.GD3468552@ziepe.ca/ Link: https://patch.msgid.link/r/20250321171940.7213-11-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Add pasid_attach array to track per-PASID attachYi Liu2-20/+41
PASIDs of PASID-capable device can be attached to hwpt separately, hence a pasid array to track per-PASID attachment is necessary. The index IOMMU_NO_PASID is used by the RID path. Hence drop the igroup->attach. Link: https://patch.msgid.link/r/20250321171940.7213-10-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Replace device_list with device_arrayYi Liu1-19/+39
igroup->attach->device_list is used to track attached device of a group in the RID path. Such tracking is also needed in the PASID path in order to share path with the RID path. While there is only one list_head in the iommufd_device. It cannot work if the device has been attached in both RID path and PASID path. To solve it, replacing the device_list with an xarray. The attached iommufd_device is stored in the entry indexed by the idev->obj.id. Link: https://patch.msgid.link/r/20250321171940.7213-9-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach structYi Liu2-23/+58
The igroup->hwpt and igroup->device_list are used to track the hwpt attach of a group in the RID path. While the coming PASID path also needs such tracking. To be prepared, wrap igroup->hwpt and igroup->device_list into attach struct which is allocated per attaching the first device of the group and freed per detaching the last device of the group. Link: https://patch.msgid.link/r/20250321171940.7213-8-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Add helper to detect the first attach of a groupYi Liu1-2/+9
The existing code detects the first attach by checking the igroup->device_list. However, the igroup->hwpt can also be used to detect the first attach. In future modifications, it is better to check the igroup->hwpt instead of the device_list. To improve readbility and also prepare for further modifications on this part, this adds a helper for it. Link: https://patch.msgid.link/r/20250321171940.7213-7-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Replace idev->igroup with local variableYi Liu1-20/+23
With more use of the fields of igroup, use a local vairable instead of using the idev->igroup heavily. No functional change expected. Link: https://patch.msgid.link/r/20250321171940.7213-6-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>