aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/iommu (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-21iommu/amd: fix a crash in iova_magazine_free_pfnsQian Cai1-1/+1
The commit b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") incorrectly changed the checking from dma_ops_alloc_iova() in map_sg() causes a crash under memory pressure as dma_ops_alloc_iova() never return DMA_MAPPING_ERROR on failure but 0, so the error handling is all wrong. kernel BUG at drivers/iommu/iova.c:801! Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0 Call Trace: free_cpu_cached_iovas+0xbd/0x150 alloc_iova_fast+0x8c/0xba dma_ops_alloc_iova.isra.6+0x65/0xa0 map_sg+0x8c/0x2a0 scsi_dma_map+0xc6/0x160 pqi_aio_submit_io+0x1f6/0x440 [smartpqi] pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi] scsi_queue_rq+0x79c/0x1200 blk_mq_dispatch_rq_list+0x4dc/0xb70 blk_mq_sched_dispatch_requests+0x249/0x310 __blk_mq_run_hw_queue+0x128/0x200 blk_mq_run_work_fn+0x27/0x30 process_one_work+0x522/0xa10 worker_thread+0x63/0x5b0 kthread+0x1d2/0x1f0 ret_from_fork+0x22/0x40 Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-0/+1170
Pull virtio, vhost updates from Michael Tsirkin: "Fixes, features, performance: - new iommu device - vhost guest memory access using vmap (just meta-data for now) - minor fixes" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-mmio: add error check for platform_get_irq scsi: virtio_scsi: Use struct_size() helper iommu/virtio: Add event queue iommu/virtio: Add probe request iommu: Add virtio-iommu driver PCI: OF: Initialize dev->fwnode appropriately of: Allow the iommu-map property to omit untranslated devices dt-bindings: virtio: Add virtio-pci-iommu node dt-bindings: virtio-mmio: Add IOMMU description vhost: fix clang build warning vhost: access vq metadata through kernel virtual address vhost: factor out setting vring addr and num vhost: introduce helpers to get the size of metadata area vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch() vhost: fine grain userspace memory accessors vhost: generalize adding used elem
2019-07-12Merge tag 'dma-mapping-5.3' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-10/+4
Pull dma-mapping updates from Christoph Hellwig: - move the USB special case that bounced DMA through a device bar into the USB code instead of handling it in the common DMA code (Laurentiu Tudor and Fredrik Noring) - don't dip into the global CMA pool for single page allocations (Nicolin Chen) - fix a crash when allocating memory for the atomic pool failed during boot (Florian Fainelli) - move support for MIPS-style uncached segments to the common code and use that for MIPS and nios2 (me) - make support for DMA_ATTR_NON_CONSISTENT and DMA_ATTR_NO_KERNEL_MAPPING generic (me) - convert nds32 to the generic remapping allocator (me) * tag 'dma-mapping-5.3' of git://git.infradead.org/users/hch/dma-mapping: (29 commits) dma-mapping: mark dma_alloc_need_uncached as __always_inline MIPS: only select ARCH_HAS_UNCACHED_SEGMENT for non-coherent platforms usb: host: Fix excessive alignment restriction for local memory allocations lib/genalloc.c: Add algorithm, align and zeroed family of DMA allocators nios2: use the generic uncached segment support in dma-direct nds32: use the generic remapping allocator for coherent DMA allocations arc: use the generic remapping allocator for coherent DMA allocations dma-direct: handle DMA_ATTR_NO_KERNEL_MAPPING in common code dma-direct: handle DMA_ATTR_NON_CONSISTENT in common code dma-mapping: add a dma_alloc_need_uncached helper openrisc: remove the partial DMA_ATTR_NON_CONSISTENT support arc: remove the partial DMA_ATTR_NON_CONSISTENT support arm-nommu: remove the partial DMA_ATTR_NON_CONSISTENT support ARM: dma-mapping: allow larger DMA mask than supported dma-mapping: truncate dma masks to what dma_addr_t can hold iommu/dma: Apply dma_{alloc,free}_contiguous functions dma-remap: Avoid de-referencing NULL atomic_pool MIPS: use the generic uncached segment support in dma-direct dma-direct: provide generic support for uncached kernel segments au1100fb: fix DMA API abuse ...
2019-07-12Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds2-2/+2
Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-04Merge branches 'x86/vt-d', 'x86/amd', 'arm/smmu', 'arm/omap', 'generic-dma-ops' and 'core' into nextJoerg Roedel38-639/+913
2019-07-04iommu/omap: No need to check return value of debugfs_create functionsGreg Kroah-Hartman1-29/+6
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Joerg Roedel <joro@8bytes.org> Cc: iommu@lists.linux-foundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-04iommu/arm-smmu-v3: Invalidate ATC when detaching a deviceJean-Philippe Brucker1-1/+4
We make the invalid assumption in arm_smmu_detach_dev() that the ATC is clear after calling pci_disable_ats(). For one thing, only enabling the PCIe ATS capability constitutes an implicit invalidation event, so the comment was wrong. More importantly, the ATS capability isn't necessarily disabled by pci_disable_ats() in a PF, if the associated VFs have ATS enabled. Explicitly invalidate all ATC entries in arm_smmu_detach_dev(). The endpoint cannot form new ATC entries because STE.EATS is clear. Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS") Reported-by: Manoj Kumar <Manoj.Kumar3@arm.com> Reported-by: Robin Murphy <Robin.Murphy@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-02iommu/arm-smmu-v3: Fix compilation when CONFIG_CMA=nWill Deacon1-0/+6
When compiling a kernel without support for CMA, CONFIG_CMA_ALIGNMENT is not defined which results in the following build failure: In file included from ./include/linux/list.h:9:0 from ./include/linux/kobject.h:19, from ./include/linux/of.h:17 from ./include/linux/irqdomain.h:35, from ./include/linux/acpi.h:13, from drivers/iommu/arm-smmu-v3.c:12: drivers/iommu/arm-smmu-v3.c: In function ‘arm_smmu_device_hw_probe’: drivers/iommu/arm-smmu-v3.c:194:40: error: ‘CONFIG_CMA_ALIGNMENT’ undeclared (first use in this function) #define Q_MAX_SZ_SHIFT (PAGE_SHIFT + CONFIG_CMA_ALIGNMENT) Fix the breakage by capping the maximum queue size based on MAX_ORDER when CMA is not enabled. Reported-by: Zhangshaokun <zhangshaokun@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-01Merge branch 'arm/renesas' into arm/smmuJoerg Roedel1-62/+125
2019-07-01iommu/vt-d: Cleanup unused variableJacob Pan1-2/+2
Linux IRQ number virq is not used in IRTE allocation. Remove it. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-01iommu/amd: Flush not present cache in iommu_map_pageTom Murphy1-4/+16
check if there is a not-present cache present and flush it if there is. Signed-off-by: Tom Murphy <murphyt7@tcd.ie> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-01iommu/amd: Only free resources once on init errorKevin Mitchell1-14/+13
When amd_iommu=off was specified on the command line, free_X_resources functions were called immediately after early_amd_iommu_init. They were then called again when amd_iommu_init also failed (as expected). Instead, call them only once: at the end of state_next() whenever there's an error. These functions should be safe to call any time and any number of times. However, since state_next is never called again in an error state, the cleanup will only ever be run once. This also ensures that cleanup code is run as soon as possible after an error is detected rather than waiting for amd_iommu_init() to be called. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-01iommu/amd: Move gart fallback to amd_iommu_initKevin Mitchell1-9/+10
The fallback to the GART driver in the case amd_iommu doesn't work was executed in a function called free_iommu_resources, which didn't really make sense. This was even being called twice if amd_iommu=off was specified on the command line. The only complication is that it needs to be verified that amd_iommu has fully relinquished control by calling free_iommu_resources and emptying the amd_iommu_list. Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-01iommu/amd: Make iommu_disable saferKevin Mitchell1-0/+3
Make it safe to call iommu_disable during early init error conditions before mmio_base is set, but after the struct amd_iommu has been added to the amd_iommu_list. For example, this happens if firmware fails to fill in mmio_phys in the ACPI table leading to a NULL pointer dereference in iommu_feature_disable. Fixes: 2c0ae1720c09c ('iommu/amd: Convert iommu initialization to state machine') Signed-off-by: Kevin Mitchell <kevmitch@arista.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-07-01Merge branch 'for-joerg/arm-smmu/updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmuJoerg Roedel5-44/+76
2019-06-25iommu/io-pgtable: Support non-coherent page tablesBjorn Andersson2-5/+14
Describe the memory related to page table walks as non-cacheable for iommu instances that are not DMA coherent. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> [will: Use cfg->coherent_walk, fix arm-v7s, ensure outer-shareable for NC] Signed-off-by: Will Deacon <will@kernel.org>
2019-06-25iommu/io-pgtable: Replace IO_PGTABLE_QUIRK_NO_DMA with specific flagWill Deacon5-22/+16
IO_PGTABLE_QUIRK_NO_DMA is a bit of a misnomer, since it's really just an indication of whether or not the page-table walker for the IOMMU is coherent with the CPU caches. Since cache coherency is more than just a quirk, replace the flag with its own field in the io_pgtable_cfg structure. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-06-24Merge tag 'v5.2-rc6' into generic-dma-opsJoerg Roedel39-395/+54
Linux 5.2-rc6
2019-06-24driver_find_device: Unify the match function with class_find_device()Suzuki K Poulose2-2/+2
The driver_find_device() accepts a match function pointer to filter the devices for lookup, similar to bus/class_find_device(). However, there is a minor difference in the prototype for the match parameter for driver_find_device() with the now unified version accepted by {bus/class}_find_device(), where it doesn't accept a "const" qualifier for the data argument. This prevents us from reusing the generic match functions for driver_find_device(). For this reason, change the prototype of the driver_find_device() to make the "match" parameter in line with {bus/class}_find_device() and adjust its callers to use the const qualifier. Also, we could now promote the "data" parameter to const as we pass it down as a const parameter to the match functions. Cc: Corey Minyard <minyard@acm.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Sebastian Ott <sebott@linux.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Nehal Shah <nehal-bakulchandra.shah@amd.com> Cc: Shyam Sundar S K <shyam-sundar.s-k@amd.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds1-4/+3
Pull iommu fix from Joerg Roedel: "Revert a commit from the previous pile of fixes which causes new lockdep splats. It is better to revert it for now and work on a better and more well tested fix" * tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
2019-06-22Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"Peter Xu1-4/+3
This reverts commit 7560cc3ca7d9d11555f80c830544e463fcdb28b8. With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt: ====================================================== WARNING: possible circular locking dependency detected 5.2.0-rc5 #78 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: 00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0 but task is already holding lock: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (device_domain_lock){....}: _raw_spin_lock_irqsave+0x3c/0x50 dmar_insert_one_dev_info+0xbb/0x510 domain_add_dev_info+0x50/0x90 dev_prepare_static_identity_mapping+0x30/0x68 intel_iommu_init+0xddd/0x1422 pci_iommu_init+0x16/0x3f do_one_initcall+0x5d/0x2b4 kernel_init_freeable+0x218/0x2c1 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 -> #0 (&(&iommu->lock)->rlock){+.+.}: lock_acquire+0x9e/0x170 _raw_spin_lock+0x25/0x30 domain_context_mapping_one+0xa5/0x4e0 pci_for_each_dma_alias+0x30/0x140 dmar_insert_one_dev_info+0x3b2/0x510 domain_add_dev_info+0x50/0x90 dev_prepare_static_identity_mapping+0x30/0x68 intel_iommu_init+0xddd/0x1422 pci_iommu_init+0x16/0x3f do_one_initcall+0x5d/0x2b4 kernel_init_freeable+0x218/0x2c1 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(device_domain_lock); lock(&(&iommu->lock)->rlock); lock(device_domain_lock); lock(&(&iommu->lock)->rlock); *** DEADLOCK *** 2 locks held by swapper/0/1: #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422 #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0 stack backtrace: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78 Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018 Call Trace: dump_stack+0x85/0xc0 print_circular_bug.cold.57+0x15c/0x195 __lock_acquire+0x152a/0x1710 lock_acquire+0x9e/0x170 ? domain_context_mapping_one+0xa5/0x4e0 _raw_spin_lock+0x25/0x30 ? domain_context_mapping_one+0xa5/0x4e0 domain_context_mapping_one+0xa5/0x4e0 ? domain_context_mapping_one+0x4e0/0x4e0 pci_for_each_dma_alias+0x30/0x140 dmar_insert_one_dev_info+0x3b2/0x510 domain_add_dev_info+0x50/0x90 dev_prepare_static_identity_mapping+0x30/0x68 intel_iommu_init+0xddd/0x1422 ? printk+0x58/0x6f ? lockdep_hardirqs_on+0xf0/0x180 ? do_early_param+0x8e/0x8e ? e820__memblock_setup+0x63/0x63 pci_iommu_init+0x16/0x3f do_one_initcall+0x5d/0x2b4 ? do_early_param+0x8e/0x8e ? rcu_read_lock_sched_held+0x55/0x60 ? do_early_param+0x8e/0x8e kernel_init_freeable+0x218/0x2c1 ? rest_init+0x230/0x230 kernel_init+0xa/0x100 ret_from_fork+0x3a/0x50 domain_context_mapping_one() is taking device_domain_lock first then iommu lock, while dmar_insert_one_dev_info() is doing the reverse. That should be introduced by commit: 7560cc3ca7d9 ("iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock", 2019-05-27) So far I still cannot figure out how the previous deadlock was triggered (I cannot find iommu lock taken before calling of iommu_flush_dev_iotlb()), however I'm pretty sure that that change should be incomplete at least because it does not fix all the places so we're still taking the locks in different orders, while reverting that commit is very clean to me so far that we should always take device_domain_lock first then the iommu lock. We can continue to try to find the real culprit mentioned in 7560cc3ca7d9, but for now I think we should revert it to fix current breakage. CC: Joerg Roedel <joro@8bytes.org> CC: Lu Baolu <baolu.lu@linux.intel.com> CC: dave.jiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner8-32/+8
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner5-60/+5
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-18iommu/io-pgtable-arm: Add support to use system cacheVivek Gautam1-1/+8
Few Qualcomm platforms such as, sdm845 have an additional outer cache called as System cache, aka. Last level cache (LLC) that allows non-coherent devices to upgrade to using caching. This cache sits right before the DDR, and is tightly coupled with the memory controller. The clients using this cache request their slices from this system cache, make it active, and can then start using it. There is a fundamental assumption that non-coherent devices can't access caches. This change adds an exception where they *can* use some level of cache despite still being non-coherent overall. The coherent devices that use cacheable memory, and CPU make use of this system cache by default. Looking at memory types, we have following - a) Normal uncached :- MAIR 0x44, inner non-cacheable, outer non-cacheable; b) Normal cached :- MAIR 0xff, inner read write-back non-transient, outer read write-back non-transient; attribute setting for coherenet I/O devices. and, for non-coherent i/o devices that can allocate in system cache another type gets added - c) Normal sys-cached :- MAIR 0xf4, inner non-cacheable, outer read write-back non-transient Coherent I/O devices use system cache by marking the memory as normal cached. Non-coherent I/O devices should mark the memory as normal sys-cached in page tables to use system cache. Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-18iommu/arm-smmu-v3: Increase maximum size of queuesWill Deacon1-16/+38
We've been artificially limiting the size of our queues to 4k so that we don't end up allocating huge amounts of physically-contiguous memory at probe time. However, 4k is only enough for 256 commands in the command queue, so instead let's try to allocate the largest queue that the SMMU supports, retrying with a smaller size if the allocation fails. The caveat here is that we have to limit our upper bound based on CONFIG_CMA_ALIGNMENT to ensure that our queue allocations remain natually aligned, which is required by the SMMU architecture. Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-18iommu/vt-d: Silence a variable set but not usedQian Cai1-1/+2
The commit "iommu/vt-d: Probe DMA-capable ACPI name space devices" introduced a compilation warning due to the "iommu" variable in for_each_active_iommu() but never used the for each element, i.e, "drhd->iommu". drivers/iommu/intel-iommu.c: In function 'probe_acpi_namespace_devices': drivers/iommu/intel-iommu.c:4639:22: warning: variable 'iommu' set but not used [-Wunused-but-set-variable] struct intel_iommu *iommu; Silence the warning the same way as in the commit d3ed71e5cc50 ("drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used") Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-18iommu/vt-d: Remove an unused variable "length"Qian Cai1-3/+0
The linux-next commit "iommu/vt-d: Duplicate iommu_resv_region objects per device list" [1] left out an unused variable, drivers/iommu/intel-iommu.c: In function 'dmar_parse_one_rmrr': drivers/iommu/intel-iommu.c:4014:9: warning: variable 'length' set but not used [-Wunused-but-set-variable] [1] https://lore.kernel.org/patchwork/patch/1083073/ Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-18iommu: Fix integer truncationArnd Bergmann1-2/+2
On 32-bit architectures, phys_addr_t may be different from dma_add_t, both smaller and bigger. This can lead to an overflow during an assignment that clang warns about: drivers/iommu/dma-iommu.c:230:10: error: implicit conversion from 'dma_addr_t' (aka 'unsigned long long') to 'phys_addr_t' (aka 'unsigned int') changes value from 18446744073709551615 to 4294967295 [-Werror,-Wconstant-conversion] Use phys_addr_t here because that is the type that the variable was declared as. Fixes: aadad097cd46 ("iommu/dma: Reserve IOVA for PCIe inaccessible DMA address") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-14Merge tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds4-8/+18
Pull iommu fixes from Joerg Roedel: - three fixes for Intel VT-d to fix a potential dead-lock, a formatting fix and a bit setting fix - one fix for the ARM-SMMU to make it work on some platforms with sub-optimal SMMU emulation * tag 'iommu-fixes-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/arm-smmu: Avoid constant zero in TLBI writes iommu/vt-d: Set the right field for Page Walk Snoop iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock iommu: Add missing new line for dma type
2019-06-14iommu/dma: Apply dma_{alloc,free}_contiguous functionsNicolin Chen1-10/+4
This patch replaces dma_{alloc,release}_from_contiguous() with dma_{alloc,free}_contiguous() to simplify those function calls. Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-06-12iommu/vt-d: Consolidate domain_init() to avoid duplicationLu Baolu1-87/+36
The domain_init() and md_domain_init() do almost the same job. Consolidate them to avoid duplication. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Cleanup after delegating DMA domain to generic iommuSai Praneeth Prakhya1-55/+0
[No functional changes] 1. Starting with commit df4f3c603aeb ("iommu/vt-d: Remove static identity map code") there are no callers for iommu_prepare_rmrr_dev() but the implementation of the function still exists, so remove it. Also, as a ripple effect remove get_domain_for_dev() and iommu_prepare_identity_map() because they aren't being used either. 2. Remove extra new line in couple of places. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()Lu Baolu1-0/+2
The drhd and device scope list should be iterated with the iommu global lock held. Otherwise, a suspicious RCU usage message will be displayed. [ 3.695886] ============================= [ 3.695917] WARNING: suspicious RCU usage [ 3.695950] 5.2.0-rc2+ #2467 Not tainted [ 3.695981] ----------------------------- [ 3.696014] drivers/iommu/intel-iommu.c:4569 suspicious rcu_dereference_check() usage! [ 3.696069] other info that might help us debug this: [ 3.696126] rcu_scheduler_active = 2, debug_locks = 1 [ 3.696173] no locks held by swapper/0/1. [ 3.696204] stack backtrace: [ 3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467 [ 3.696370] Call Trace: [ 3.696404] dump_stack+0x85/0xcb [ 3.696441] intel_iommu_init+0x128c/0x13ce [ 3.696478] ? kmem_cache_free+0x16b/0x2c0 [ 3.696516] ? __fput+0x14b/0x270 [ 3.696550] ? __call_rcu+0xb7/0x300 [ 3.696583] ? get_max_files+0x10/0x10 [ 3.696631] ? set_debug_rodata+0x11/0x11 [ 3.696668] ? e820__memblock_setup+0x60/0x60 [ 3.696704] ? pci_iommu_init+0x16/0x3f [ 3.696737] ? set_debug_rodata+0x11/0x11 [ 3.696770] pci_iommu_init+0x16/0x3f [ 3.696805] do_one_initcall+0x5d/0x2e4 [ 3.696844] ? set_debug_rodata+0x11/0x11 [ 3.696880] ? rcu_read_lock_sched_held+0x6b/0x80 [ 3.696924] kernel_init_freeable+0x1f0/0x27c [ 3.696961] ? rest_init+0x260/0x260 [ 3.696997] kernel_init+0xa/0x110 [ 3.697028] ret_from_fork+0x3a/0x50 Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Allow DMA domain attaching to rmrr locked deviceLu Baolu1-1/+2
We don't allow a device to be assigned to user level when it is locked by any RMRR's. Hence, intel_iommu_attach_device() will return error if a domain of type IOMMU_DOMAIN_UNMANAGED is about to attach to a device locked by rmrr. But this doesn't apply to a domain of type other than IOMMU_DOMAIN_UNMANAGED. This adds a check to fix this. Fixes: fa954e6831789 ("iommu/vt-d: Delegate the dma domain to upper layer") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reported-and-tested-by: Qian Cai <cai@lca.pw> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Don't enable iommu's which have been ignoredLu Baolu1-2/+7
The iommu driver will ignore some iommu units if there's no device under its scope or those devices have been explicitly set to bypass the DMA translation. Don't enable those iommu units, otherwise the devices under its scope won't work. Fixes: d8190dc638866 ("iommu/vt-d: Enable DMA remapping after rmrr mapped") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Set domain type for a private domainLu Baolu1-0/+2
Otherwise, domain_get_iommu() will be broken. Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with private") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Don't return error when device gets right domainLu Baolu1-6/+0
If a device gets a right domain in add_device ops, it shouldn't return error. Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with private") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Differentiate relaxable and non relaxable RMRRsEric Auger1-15/+39
Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory region type, let's report USB and GFX RMRRs as relaxable ones. We introduce a new device_rmrr_is_relaxable() helper to check whether the rmrr belongs to the relaxable category. This allows to have a finer reporting at IOMMU API level of reserved memory regions. This will be exploitable by VFIO to define the usable IOVA range and detect potential conflicts between the guest physical address space and host reserved regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regionsEric Auger1-5/+7
Introduce a new type for reserved region. This corresponds to directly mapped regions which are known to be relaxable in some specific conditions, such as device assignment use case. Well known examples are those used by USB controllers providing PS/2 keyboard emulation for pre-boot BIOS and early BOOT or RMRRs associated to IGD working in legacy mode. Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow RMRR on graphics devices too"), those regions are currently considered "safe" with respect to device assignment use case which requires a non direct mapping at IOMMU physical level (RAM GPA -> HPA mapping). Those RMRRs currently exist and sometimes the device is attempting to access it but this has not been considered an issue until now. However at the moment, iommu_get_group_resv_regions() is not able to make any difference between directly mapped regions: those which must be absolutely enforced and those like above ones which are known as relaxable. This is a blocker for reporting severe conflicts between non relaxable RMRRs (like MSI doorbells) and guest GPA space. With this new reserved region type we will be able to use iommu_get_group_resv_regions() to enumerate the IOVA space that is usable through the IOMMU API without introducing regressions with respect to existing device assignment use cases (USB and IGD). Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regionsEric Auger1-1/+2
In the case the RMRR device scope is a PCI-PCI bridge, let's check the device belongs to the PCI sub-hierarchy. Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Handle RMRR with PCI bridge device scopesEric Auger1-1/+2
When reading the vtd specification and especially the Reserved Memory Region Reporting Structure chapter, it is not obvious a device scope element cannot be a PCI-PCI bridge, in which case all downstream ports are likely to access the reserved memory region. Let's handle this case in device_has_rmrr. Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Introduce is_downstream_to_pci_bridge helperEric Auger1-8/+29
Several call sites are about to check whether a device belongs to the PCI sub-hierarchy of a candidate PCI-PCI bridge. Introduce an helper to perform that check. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/vt-d: Duplicate iommu_resv_region objects per device listEric Auger1-17/+17
intel_iommu_get_resv_regions() aims to return the list of reserved regions accessible by a given @device. However several devices can access the same reserved memory region and when building the list it is not safe to use a single iommu_resv_region object, whose container is the RMRR. This iommu_resv_region must be duplicated per device reserved region list. Let's remove the struct iommu_resv_region from the RMRR unit and allocate the iommu_resv_region directly in intel_iommu_get_resv_regions(). We hold the dmar_global_lock instead of the rcu-lock to allow sleeping. Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu: Fix a leak in iommu_insert_resv_regionEric Auger1-3/+5
In case we expand an existing region, we unlink this latter and insert the larger one. In that case we should free the original region after the insertion. Also we can immediately return. Fixes: 6c65fb318e8b ("iommu: iommu_get_group_resv_regions") Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu: Add recoverable fault reportingJean-Philippe Brucker1-2/+92
Some IOMMU hardware features, for example PCI PRI and Arm SMMU Stall, enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page Requests and Stall events through the new fault reporting API. The consumer of the fault can be either an I/O page fault handler in the host, or a guest OS. Once handled, the fault must be completed by sending a page response back to the IOMMU. Add an iommu_page_response() function to complete a page fault. There are two ways to extend the userspace API: * Add a field to iommu_page_response and a flag to iommu_page_response::flags describing the validity of this field. * Introduce a new iommu_page_response_X structure with a different version number. The kernel must then support both versions. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu: Introduce device fault report APIJacob Pan1-3/+143
Traditionally, device specific faults are detected and handled within their own device drivers. When IOMMU is enabled, faults such as DMA related transactions are detected by IOMMU. There is no generic reporting mechanism to report faults back to the in-kernel device driver or the guest OS in case of assigned devices. This patch introduces a registration API for device specific fault handlers. This differs from the existing iommu_set_fault_handler/ report_iommu_fault infrastructures in several ways: - it allows to report more sophisticated fault events (both unrecoverable faults and page request faults) due to the nature of the iommu_fault struct - it is device specific and not domain specific. The current iommu_report_device_fault() implementation only handles the "shoot and forget" unrecoverable fault case. Handling of page request faults or stalled faults will come later. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-12iommu/arm-smmu: Avoid constant zero in TLBI writesRobin Murphy1-3/+12
Apparently, some Qualcomm arm64 platforms which appear to expose their SMMU global register space are still, in fact, using a hypervisor to mediate it by trapping and emulating register accesses. Sadly, some deployed versions of said trapping code have bugs wherein they go horribly wrong for stores using r31 (i.e. XZR/WZR) as the source register. While this can be mitigated for GCC today by tweaking the constraints for the implementation of writel_relaxed(), to avoid any potential arms race with future compilers more aggressively optimising register allocation, the simple way is to just remove all the problematic constant zeros. For the write-only TLB operations, the actual value is irrelevant anyway and any old nearby variable will provide a suitable GPR to encode. The one point at which we really do need a zero to clear a context bank happens before any of the TLB maintenance where crashes have been reported, so is apparently not a problem... :/ Reported-by: AngeloGioacchino Del Regno <kholk11@gmail.com> Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Acked-by: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-06iommu/virtio: Add event queueJean-Philippe Brucker1-9/+106
The event queue offers a way for the device to report access faults from endpoints. It is implemented on virtqueue #1. Whenever the host needs to signal a fault, it fills one of the buffers offered by the guest and interrupts it. Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-06iommu/virtio: Add probe requestJean-Philippe Brucker1-6/+151
When the device offers the probe feature, send a probe request for each device managed by the IOMMU. Extract RESV_MEM information. When we encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region. This will tell other subsystems that there is no need to map the MSI doorbell in the virtio-iommu, because MSIs bypass it. Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-06iommu: Add virtio-iommu driverJean-Philippe Brucker3-0/+928
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU requests such as map/unmap over virtio transport without emulating page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP requests. The bulk of the code transforms calls coming from the IOMMU API into corresponding virtio requests. Mappings are kept in an interval tree instead of page tables. A little more work is required for modular and x86 support, so for the moment the driver depends on CONFIG_VIRTIO=y and CONFIG_ARM64. Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>