Age | Commit message (Collapse) | Author | Files | Lines |
|
While mips might architecturally have the uncached segment all the time,
the infrastructure to use it is only needed on platforms where DMA is
at least partially incoherent. Only select it for those configurations
to fix a build failure, as the arch_dma_prep_coherent symbol is also only
provided for non-coherent platforms.
Fixes: 2e96e04d25ca ("MIPS: use the generic uncached segment support in dma-direct")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
|
|
The PAGE_SHIFT alignment restriction to devm_gen_pool_create() quickly
exhausts local memory because most allocations are much smaller than
PAGE_SIZE. This causes USB device failures such as
usb 1-2.1: reset full-speed USB device number 4 using sm501-usb
sd 1:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x03 driverbyte=0x00
sd 1:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 08 7c 00 00 f0 00
print_req_error: I/O error, dev sda, sector 2172 flags 80700
when trying to boot from the SM501 USB controller on SH4 with QEMU.
Align allocations as required but not necessarily much more than that.
The HCCA, TD and ED structures align with 256, 32 and 16 byte memory
boundaries, as specified by the Open HCI[1]. The min_alloc_order argument
to devm_gen_pool_create is now somewhat arbitrarily set to 4 (16 bytes).
Perhaps it could be somewhat lower for general buffer allocations.
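To illustrate why the granule size matters, here is a minimal user-space
sketch (not kernel code). The helper names granule_round_up() and
max_allocations() are hypothetical, and the 64 KiB pool and 4 KiB page size
in the usage note are assumptions chosen only for the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of the pool granule (1 << order). */
size_t granule_round_up(size_t size, unsigned int order)
{
	size_t granule = (size_t)1 << order;

	return (size + granule - 1) & ~(granule - 1);
}

/* Number of allocations of the given size that fit in pool_size bytes. */
size_t max_allocations(size_t pool_size, size_t size, unsigned int order)
{
	assert(size > 0);
	return pool_size / granule_round_up(size, order);
}
```

With an assumed 64 KiB pool and 16-byte requests, an order of 12 (4 KiB
granule) leaves room for only 16 allocations, while an order of 4 (16-byte
granule) leaves room for 4096.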
Reference:
[1] "Open Host Controller Interface Specification for USB",
release 1.0a, Compaq, Microsoft, National Semiconductor, 1999,
pp. 16, 19, 33.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Fredrik Noring <noring@nocrew.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Provide the algorithm option to DMA allocators as well, along with
convenience variants for zeroed and aligned memory. The following
four functions are added:
- gen_pool_dma_alloc_algo()
- gen_pool_dma_alloc_align()
- gen_pool_dma_zalloc_algo()
- gen_pool_dma_zalloc_align()
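As a rough illustration of what "zeroed" and "aligned" variants mean, the
following user-space sketch implements them on a toy bump allocator. It is
not genalloc and does not reproduce the kernel signatures; struct toy_pool
and the toy_pool_dma_*() names are hypothetical, and the sketch assumes the
pool's base addresses are themselves aligned at least as strictly as any
requested alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy pool: a bump allocator over caller-provided memory (not genalloc). */
struct toy_pool {
	unsigned char *base;	/* CPU address of the pool memory */
	uint32_t dma_base;	/* device-visible address of the same memory */
	size_t size;		/* total bytes in the pool */
	size_t used;		/* offset of the first free byte */
};

/* Allocate size bytes at an offset that is a multiple of align (power of 2). */
void *toy_pool_dma_alloc_align(struct toy_pool *pool, size_t size,
			       uint32_t *dma, size_t align)
{
	size_t start;

	assert(align != 0 && (align & (align - 1)) == 0);
	start = (pool->used + align - 1) & ~(align - 1);
	if (start > pool->size || size > pool->size - start)
		return NULL;
	pool->used = start + size;
	if (dma)
		*dma = pool->dma_base + (uint32_t)start;
	return pool->base + start;
}

/* Zeroed variant: same allocation, then clear the returned memory. */
void *toy_pool_dma_zalloc_align(struct toy_pool *pool, size_t size,
				uint32_t *dma, size_t align)
{
	void *vaddr = toy_pool_dma_alloc_align(pool, size, dma, align);

	if (vaddr)
		memset(vaddr, 0, size);
	return vaddr;
}
```

The zeroed variant is a thin wrapper over the aligned one, which is the
general shape convenience helpers of this kind tend to take.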
Signed-off-by: Fredrik Noring <noring@nocrew.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Stop providing our own arch alloc/free hooks and just expose the segment
offset and use the generic dma-direct allocator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
|
|
Replace the code that sets up uncached PTEs with the generic vmap based
remapping code. It also provides an atomic pool for allocations from
non-blocking context, which was not properly supported by the existing
nds32 code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Greentime Hu <greentime@andestech.com>
Reviewed-by: Greentime Hu <greentime@andestech.com>
|
|
Replace the code that sets up uncached PTEs with the generic vmap based
remapping code. It also provides an atomic pool for allocations from
non-blocking context, which was not properly supported by the existing
arc code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Evgeniy Paltsev <paltsev@synopsys.com>
Tested-by: Evgeniy Paltsev <paltsev@synopsys.com>
|
|
DMA_ATTR_NO_KERNEL_MAPPING is generally implemented by allocating
normal cacheable pages or CMA memory, and then returning the page
pointer as the opaque handle. Lift that code from the xtensa and
generic dma remapping implementations into the generic dma-direct
code so that we don't even call arch_dma_alloc for these allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Only call into arch_dma_alloc if we require an uncached mapping,
and remove the parisc code manually doing normal cached
DMA_ATTR_NON_CONSISTENT allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
|
|
Check if we need to allocate uncached memory for a device given the
allocation flags. Switch over the uncached segment check to this helper
to deal with architectures that do not support the dma_cache_sync
operation and thus should not return cacheable memory for
DMA_ATTR_NON_CONSISTENT allocations.
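The decision described above can be modelled in user space roughly as
follows. This is a sketch under stated assumptions, not the kernel helper:
toy_need_uncached() and TOY_ATTR_NON_CONSISTENT are hypothetical names, the
attribute bit value is illustrative, and device coherency and cache-sync
support are passed in as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel attribute bit; value is illustrative. */
#define TOY_ATTR_NON_CONSISTENT (1UL << 3)

/*
 * Return true if an allocation must be backed by uncached memory.
 * Cacheable memory is only acceptable for non-consistent requests when
 * the architecture can transfer cache ownership (has a cache sync op).
 */
bool toy_need_uncached(bool dev_coherent, bool arch_has_cache_sync,
		       unsigned long attrs)
{
	if (dev_coherent)
		return false;
	if (arch_has_cache_sync && (attrs & TOY_ATTR_NON_CONSISTENT))
		return false;
	return true;
}
```

The second check is the interesting one: without a cache sync operation, a
non-consistent request still gets uncached memory in this model.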
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The openrisc DMA code supports DMA_ATTR_NON_CONSISTENT allocations, but
does not provide a cache_sync operation. This means any user of it
will never be able to actually transfer cache ownership and thus cause
coherency bugs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stafford Horne <shorne@gmail.com>
|
|
The arc DMA code supports DMA_ATTR_NON_CONSISTENT allocations, but does
not provide a cache_sync operation. This means any user of it will
never be able to actually transfer cache ownership and thus cause
coherency bugs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Evgeniy Paltsev <paltsev@synopsys.com>
Tested-by: Evgeniy Paltsev <paltsev@synopsys.com>
|
|
The arm-nommu DMA code supports DMA_ATTR_NON_CONSISTENT allocations, but
does not provide a cache_sync operation. This means any user of it
will never be able to actually transfer cache ownership and thus cause
coherency bugs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
|
|
Since the Linux 5.1 merge window we allow drivers to just set the
largest DMA mask they support instead of falling back to smaller ones.
But I forgot to remove a check that prohibits this behavior in the
arm DMA code, as it is rather hidden. There is no reason for this check
as the code will do the right thing for a "too large" DMA mask, so
just remove it.
Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections")
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The dma masks in struct device are always 64 bits wide. But for builds
using a 32-bit dma_addr_t we need to ensure we don't store an
unsupportable value. Before Linux 5.0 this was handled, at least by
the ARM dma mapping code, by never allowing a larger dma_mask to be set,
but these days we allow the driver to just set the largest supported
value and never fall back to a smaller one. Ensure this always works
by truncating the value.
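A small user-space sketch of the truncation idea, assuming a build where
the address type is 32 bits wide. The type toy_dma_addr_t and the helpers
toy_bit_mask() and toy_truncate_mask() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: model a build where dma_addr_t is 32 bits wide. */
typedef uint32_t toy_dma_addr_t;

/* Mask with the low n bits set, for 1 <= n <= 64. */
uint64_t toy_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Clamp a requested 64-bit mask to what toy_dma_addr_t can represent. */
uint64_t toy_truncate_mask(uint64_t mask)
{
	return (toy_dma_addr_t)mask;
}
```

A driver asking for a 64-bit mask ends up with a 32-bit one in this model,
while masks that already fit (for example 24 bits) pass through unchanged.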
Fixes: 9eb9e96e97b3 ("Documentation/DMA-API-HOWTO: update dma_mask sections")
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This patch replaces dma_{alloc,release}_from_contiguous() with
dma_{alloc,free}_contiguous() to simplify those function calls.
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
With architectures allowing the kernel to be placed almost arbitrarily
in memory (e.g. ARM64), it is possible to have the kernel reside at
physical addresses above 4GB, resulting in neither the default CMA area
nor the atomic pool being successfully allocated. This does not prevent
specific peripherals from working, though; one example is XHCI, which
still operates correctly.
Trouble comes when the XHCI driver gets suspended and resumed, since we
can now trigger the following NULL pointer dereference (NPD):
[ 12.664170] usb usb1: root hub lost power or was reset
[ 12.669387] usb usb2: root hub lost power or was reset
[ 12.674662] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[ 12.682896] pgd = ffffffc1365a7000
[ 12.686386] [00000008] *pgd=0000000136500003, *pud=0000000136500003, *pmd=0000000000000000
[ 12.694897] Internal error: Oops: 96000006 [#1] SMP
[ 12.699843] Modules linked in:
[ 12.702980] CPU: 0 PID: 1499 Comm: pml Not tainted 4.9.135-1.13pre #51
[ 12.709577] Hardware name: BCM97268DV (DT)
[ 12.713736] task: ffffffc136bb6540 task.stack: ffffffc1366cc000
[ 12.719740] PC is at addr_in_gen_pool+0x4/0x48
[ 12.724253] LR is at __dma_free+0x64/0xbc
[ 12.728325] pc : [<ffffff80083c0df8>] lr : [<ffffff80080979e0>] pstate: 60000145
[ 12.735825] sp : ffffffc1366cf990
[ 12.739196] x29: ffffffc1366cf990 x28: ffffffc1366cc000
[ 12.744608] x27: 0000000000000000 x26: ffffffc13a8568c8
[ 12.750020] x25: 0000000000000000 x24: ffffff80098f9000
[ 12.755433] x23: 000000013a5ff000 x22: ffffff8009c57000
[ 12.760844] x21: ffffffc13a856810 x20: 0000000000000000
[ 12.766255] x19: 0000000000001000 x18: 000000000000000a
[ 12.771667] x17: 0000007f917553e0 x16: 0000000000001002
[ 12.777078] x15: 00000000000a36cb x14: ffffff80898feb77
[ 12.782490] x13: ffffffffffffffff x12: 0000000000000030
[ 12.787899] x11: 00000000fffffffe x10: ffffff80098feb7f
[ 12.793311] x9 : 0000000005f5e0ff x8 : 65776f702074736f
[ 12.798723] x7 : 6c2062756820746f x6 : ffffff80098febb1
[ 12.804134] x5 : ffffff800809797c x4 : 0000000000000000
[ 12.809545] x3 : 000000013a5ff000 x2 : 0000000000000fff
[ 12.814955] x1 : ffffff8009c57000 x0 : 0000000000000000
[ 12.820363]
[ 12.821907] Process pml (pid: 1499, stack limit = 0xffffffc1366cc020)
[ 12.828421] Stack: (0xffffffc1366cf990 to 0xffffffc1366d0000)
[ 12.834240] f980: ffffffc1366cf9e0 ffffff80086004d0
[ 12.842186] f9a0: ffffffc13ab08238 0000000000000010 ffffff80097c2218 ffffffc13a856810
[ 12.850131] f9c0: ffffff8009c57000 000000013a5ff000 0000000000000008 000000013a5ff000
[ 12.858076] f9e0: ffffffc1366cfa50 ffffff80085f9250 ffffffc13ab08238 0000000000000004
[ 12.866021] fa00: ffffffc13ab08000 ffffff80097b6000 ffffffc13ab08130 0000000000000001
[ 12.873966] fa20: 0000000000000008 ffffffc13a8568c8 0000000000000000 ffffffc1366cc000
[ 12.881911] fa40: ffffffc13ab08130 0000000000000001 ffffffc1366cfa90 ffffff80085e3de8
[ 12.889856] fa60: ffffffc13ab08238 0000000000000000 ffffffc136b75b00 0000000000000000
[ 12.897801] fa80: 0000000000000010 ffffff80089ccb92 ffffffc1366cfac0 ffffff80084ad040
[ 12.905746] faa0: ffffffc13a856810 0000000000000000 ffffff80084ad004 ffffff80084b91a8
[ 12.913691] fac0: ffffffc1366cfae0 ffffff80084b91b4 ffffffc13a856810 ffffff80080db5cc
[ 12.921636] fae0: ffffffc1366cfb20 ffffff80084b96bc ffffffc13a856810 0000000000000010
[ 12.929581] fb00: ffffffc13a856870 0000000000000000 ffffffc13a856810 ffffff800984d2b8
[ 12.937526] fb20: ffffffc1366cfb50 ffffff80084baa70 ffffff8009932ad0 ffffff800984d260
[ 12.945471] fb40: 0000000000000010 00000002eff0a065 ffffffc1366cfbb0 ffffff80084bafbc
[ 12.953415] fb60: 0000000000000010 0000000000000003 ffffff80098fe000 0000000000000000
[ 12.961360] fb80: ffffff80097b6000 ffffff80097b6dc8 ffffff80098c12b8 ffffff80098c12f8
[ 12.969306] fba0: ffffff8008842000 ffffff80097b6dc8 ffffffc1366cfbd0 ffffff80080e0d88
[ 12.977251] fbc0: 00000000fffffffb ffffff80080e10bc ffffffc1366cfc60 ffffff80080e16a8
[ 12.985196] fbe0: 0000000000000000 0000000000000003 ffffff80097b6000 ffffff80098fe9f0
[ 12.993140] fc00: ffffff80097d4000 ffffff8008983802 0000000000000123 0000000000000040
[ 13.001085] fc20: ffffff8008842000 ffffffc1366cc000 ffffff80089803c2 00000000ffffffff
[ 13.009029] fc40: 0000000000000000 0000000000000000 ffffffc1366cfc60 0000000000040987
[ 13.016974] fc60: ffffffc1366cfcc0 ffffff80080dfd08 0000000000000003 0000000000000004
[ 13.024919] fc80: 0000000000000003 ffffff80098fea08 ffffffc136577ec0 ffffff80089803c2
[ 13.032864] fca0: 0000000000000123 0000000000000001 0000000500000002 0000000000040987
[ 13.040809] fcc0: ffffffc1366cfd00 ffffff80083a89d4 0000000000000004 ffffffc136577ec0
[ 13.048754] fce0: ffffffc136610cc0 ffffffffffffffea ffffffc1366cfeb0 ffffffc136610cd8
[ 13.056700] fd00: ffffffc1366cfd10 ffffff800822a614 ffffffc1366cfd40 ffffff80082295d4
[ 13.064645] fd20: 0000000000000004 ffffffc136577ec0 ffffffc136610cc0 0000000021670570
[ 13.072590] fd40: ffffffc1366cfd80 ffffff80081b5d10 ffffff80097b6000 ffffffc13aae4200
[ 13.080536] fd60: ffffffc1366cfeb0 0000000000000004 0000000021670570 0000000000000004
[ 13.088481] fd80: ffffffc1366cfe30 ffffff80081b6b20 ffffffc13aae4200 0000000000000000
[ 13.096427] fda0: 0000000000000004 0000000021670570 ffffffc1366cfeb0 ffffffc13a838200
[ 13.104371] fdc0: 0000000000000000 000000000000000a ffffff80097b6000 0000000000040987
[ 13.112316] fde0: ffffffc1366cfe20 ffffff80081b3af0 ffffffc13a838200 0000000000000000
[ 13.120261] fe00: ffffffc1366cfe30 ffffff80081b6b0c ffffffc13aae4200 0000000000000000
[ 13.128206] fe20: 0000000000000004 0000000000040987 ffffffc1366cfe70 ffffff80081b7dd8
[ 13.136151] fe40: ffffff80097b6000 ffffffc13aae4200 ffffffc13aae4200 fffffffffffffff7
[ 13.144096] fe60: 0000000021670570 ffffffc13a8c63c0 0000000000000000 ffffff8008083180
[ 13.152042] fe80: ffffffffffffff1d 0000000021670570 ffffffffffffffff 0000007f917ad9b8
[ 13.159986] fea0: 0000000020000000 0000000000000015 0000000000000000 0000000000040987
[ 13.167930] fec0: 0000000000000001 0000000021670570 0000000000000004 0000000000000000
[ 13.175874] fee0: 0000000000000888 0000440110000000 000000000000006d 0000000000000003
[ 13.183819] ff00: 0000000000000040 ffffff80ffffffc8 0000000000000000 0000000000000020
[ 13.191762] ff20: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
[ 13.199707] ff40: 0000000000000000 0000007f917553e0 0000000000000000 0000000000000004
[ 13.207651] ff60: 0000000021670570 0000007f91835480 0000000000000004 0000007f91831638
[ 13.215595] ff80: 0000000000000004 00000000004b0de0 00000000004b0000 0000000000000000
[ 13.223539] ffa0: 0000000000000000 0000007fc92ac8c0 0000007f9175d178 0000007fc92ac8c0
[ 13.231483] ffc0: 0000007f917ad9b8 0000000020000000 0000000000000001 0000000000000040
[ 13.239427] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 13.247360] Call trace:
[ 13.249866] Exception stack(0xffffffc1366cf7a0 to 0xffffffc1366cf8d0)
[ 13.256386] f7a0: 0000000000001000 0000007fffffffff ffffffc1366cf990 ffffff80083c0df8
[ 13.264331] f7c0: 0000000060000145 ffffff80089b5001 ffffffc13ab08130 0000000000000001
[ 13.272275] f7e0: 0000000000000008 ffffffc13a8568c8 0000000000000000 0000000000000000
[ 13.280220] f800: ffffffc1366cf960 ffffffc1366cf960 ffffffc1366cf930 00000000ffffffd8
[ 13.288165] f820: ffffff8009931ac0 4554535953425553 4544006273753d4d 3831633d45434956
[ 13.296110] f840: ffff003832313a39 ffffff800845926c ffffffc1366cf880 0000000000040987
[ 13.304054] f860: 0000000000000000 ffffff8009c57000 0000000000000fff 000000013a5ff000
[ 13.311999] f880: 0000000000000000 ffffff800809797c ffffff80098febb1 6c2062756820746f
[ 13.319944] f8a0: 65776f702074736f 0000000005f5e0ff ffffff80098feb7f 00000000fffffffe
[ 13.327884] f8c0: 0000000000000030 ffffffffffffffff
[ 13.332835] [<ffffff80083c0df8>] addr_in_gen_pool+0x4/0x48
[ 13.338398] [<ffffff80086004d0>] xhci_mem_cleanup+0xc8/0x51c
[ 13.344137] [<ffffff80085f9250>] xhci_resume+0x308/0x65c
[ 13.349524] [<ffffff80085e3de8>] xhci_brcm_resume+0x84/0x8c
[ 13.355174] [<ffffff80084ad040>] platform_pm_resume+0x3c/0x64
[ 13.360997] [<ffffff80084b91b4>] dpm_run_callback+0x5c/0x15c
[ 13.366732] [<ffffff80084b96bc>] device_resume+0xc0/0x190
[ 13.372205] [<ffffff80084baa70>] dpm_resume+0x144/0x2cc
[ 13.377504] [<ffffff80084bafbc>] dpm_resume_end+0x20/0x34
[ 13.382980] [<ffffff80080e0d88>] suspend_devices_and_enter+0x104/0x704
[ 13.389585] [<ffffff80080e16a8>] pm_suspend+0x320/0x53c
[ 13.394881] [<ffffff80080dfd08>] state_store+0xbc/0xe0
[ 13.400094] [<ffffff80083a89d4>] kobj_attr_store+0x14/0x24
[ 13.405655] [<ffffff800822a614>] sysfs_kf_write+0x60/0x70
[ 13.411128] [<ffffff80082295d4>] kernfs_fop_write+0x130/0x194
[ 13.416954] [<ffffff80081b5d10>] __vfs_write+0x60/0x150
[ 13.422254] [<ffffff80081b6b20>] vfs_write+0xc8/0x164
[ 13.427376] [<ffffff80081b7dd8>] SyS_write+0x70/0xc8
[ 13.432412] [<ffffff8008083180>] el0_svc_naked+0x34/0x38
[ 13.437800] Code: 92800173 97f6fb9e 17fffff5 d1000442 (f8408c03)
[ 13.444033] ---[ end trace 2effe12f909ce205 ]---
The call path leading to this problem is xhci_mem_cleanup() ->
dma_free_coherent() -> dma_free_from_pool() -> addr_in_gen_pool. If the
atomic_pool is NULL, we can't possibly have the address in the atomic
pool anyway, so guard against that.
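The shape of such a guard can be sketched in user space as follows. This
is an illustration only: struct toy_pool, toy_atomic_pool and the two
helpers are hypothetical stand-ins, not the kernel's gen_pool types or
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pool {
	uintptr_t start;	/* first address covered by the pool */
	size_t size;		/* number of bytes covered */
};

/* Assumption: the pool is optional and stays NULL if setup failed. */
struct toy_pool *toy_atomic_pool;

/* Range check that dereferences pool, so pool must not be NULL here. */
bool toy_addr_in_pool(const struct toy_pool *pool, uintptr_t addr, size_t size)
{
	return addr >= pool->start && size <= pool->size &&
	       addr - pool->start <= pool->size - size;
}

/* Guarded wrapper: with no pool, no address can belong to it. */
bool toy_in_atomic_pool(uintptr_t addr, size_t size)
{
	if (!toy_atomic_pool)
		return false;
	return toy_addr_in_pool(toy_atomic_pool, addr, size);
}
```

Callers on the free path can then fall through to the regular release
path whenever the wrapper returns false, whether or not a pool exists.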
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Stop providing the arch alloc/free hooks and just expose the segment
offset instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com>
|
|
A few architectures support uncached kernel segments. In that case we get
an uncached mapping for a given physical address by using an offset in the
uncached segment. Implement support for this scheme in the generic
dma-direct code instead of duplicating it in arch hooks.
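As an illustration of the offset scheme, the sketch below converts between
cached and uncached aliases of the same physical memory. It assumes the
classic MIPS32 layout (KSEG0 at 0x80000000, KSEG1 at 0xa0000000, 512 MiB
each); the macro and function names are hypothetical and this is not the
kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MIPS32-style segments aliasing the low 512 MiB of memory. */
#define TOY_CACHED_BASE   UINT32_C(0x80000000)	/* KSEG0, cached */
#define TOY_UNCACHED_BASE UINT32_C(0xa0000000)	/* KSEG1, uncached */
#define TOY_SEGMENT_SIZE  UINT32_C(0x20000000)

/* Map a cached-segment address to its uncached alias. */
uint32_t toy_uncached_address(uint32_t cached)
{
	assert(cached >= TOY_CACHED_BASE &&
	       cached - TOY_CACHED_BASE < TOY_SEGMENT_SIZE);
	return cached - TOY_CACHED_BASE + TOY_UNCACHED_BASE;
}

/* Map an uncached-segment address back to its cached alias. */
uint32_t toy_cached_address(uint32_t uncached)
{
	assert(uncached >= TOY_UNCACHED_BASE &&
	       uncached - TOY_UNCACHED_BASE < TOY_SEGMENT_SIZE);
	return uncached - TOY_UNCACHED_BASE + TOY_CACHED_BASE;
}
```

Because the conversion is pure address arithmetic, no page table changes
are needed to obtain the uncached view.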
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Virtual addresses returned from dma(m)_alloc_coherent are opaque in what
backs them, and drivers must not poke into them. Switch the driver
to use the generic DMA API mmap helper to avoid these games.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
|
|
This export is not used in modular code, which is a good thing as
everyone should use the proper DMA API instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com>
|
|
With the addition of the local memory allocator, the HCD_LOCAL_MEM
flag can be dropped and the checks against it replaced with a check
for the localmem_pool ptr being initialized.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Fredrik Noring <noring@nocrew.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In preparation for dropping the existing "coherent" dma mem declaration
APIs, replace the current dma_declare_coherent_memory() based mechanism
with the creation of a genalloc pool that will be used in the OHCI
subsystem as replacement for the DMA APIs.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In preparation for dropping the existing "coherent" dma mem declaration
APIs, replace the current dma_declare_coherent_memory() based mechanism
with the creation of a genalloc pool that will be used in the OHCI
subsystem as replacement for the DMA APIs.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
For HCs that have local memory, replace the current DMA API usage with
a genalloc generic allocator to manage the mappings for these devices.
To help users, introduce a new HCD API, usb_hcd_setup_local_mem(), that
will set up the genalloc pool backing the device local memory. It will
be used in subsequent patches. This is in preparation for dropping
the existing "coherent" dma mem declaration APIs. The current
implementation was relying on a short circuit in the DMA API that, in
the end, was acting as an allocator for these types of devices.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Fredrik Noring <noring@nocrew.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
gen_pool_dma_zalloc() is a zeroed memory variant of
gen_pool_dma_alloc(). Also document the return values of both, and
indicate NULL as a "%NULL" constant.
Signed-off-by: Fredrik Noring <noring@nocrew.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Commit fdaeec198ada ("dma-contiguous: add dma_{alloc,free}_contiguous()
helpers") adds a pair of new helper functions so as to abstract code in
dma-direct (and other places in the future); however, it breaks booting
QEMU with the x86_64 defconfig.
That's because x86_64 defconfig has CONFIG_DMA_CMA=n so those two newly
introduced helper functions are empty in their !CONFIG_DMA_CMA version,
while previously the platform independent dma-direct code had fallback
alloc_pages_node() and __free_pages().
So this patch fixes it by adding alloc_pages_node() and __free_pages()
in the !CONFIG_DMA_CMA version of the two helper functions.
Tested with below QEMU command:
qemu-system-x86_64 -m 512m \
-drive file=images/x86_64/rootfs.ext4,format=raw,if=ide \
-append 'console=ttyS0 root=/dev/sda' -nographic \
-kernel arch/x86_64/boot/bzImage
with the rootfs from the below link:
https://github.com/ClangBuiltLinux/continuous-integration/raw/master/images/x86_64/rootfs.ext4
Fixes: fdaeec198ada ("dma-contiguous: add dma_{alloc,free}_contiguous() helpers")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The addresses within a single page are always contiguous, so it's
not necessary to always allocate a single page from the CMA area.
Since the CMA area has a limited, predefined size, it may run out
of space under heavy use, where quite a lot of CMA pages might be
allocated for single-page requests.
However, there is also a concern that a device might care where a
page comes from -- it might expect the page to come from the CMA area
and act differently if it doesn't.
This patch uses the fallback alloc_pages path instead of one-page
allocations from the global CMA area when a device does not have
its own CMA area. This saves space in the global CMA area for more
CMA allocations, and also reduces CMA fragmentation resulting from
trivial allocations.
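The selection policy described above can be sketched in user space as
follows. The enum and toy_pick_source() are hypothetical names; the sketch
only models the first choice of allocator and leaves out any fallback taken
when that allocator fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_source {
	TOY_FROM_DEVICE_CMA,		/* the device's own CMA area */
	TOY_FROM_GLOBAL_CMA,		/* the shared global CMA area */
	TOY_FROM_PAGE_ALLOCATOR		/* normal page allocator */
};

/* Decide where a request of size bytes should be tried first. */
enum toy_source toy_pick_source(bool dev_has_own_cma, size_t size,
				size_t page_size)
{
	size_t pages;

	assert(size > 0 && page_size > 0);
	pages = (size + page_size - 1) / page_size;

	if (dev_has_own_cma)
		return TOY_FROM_DEVICE_CMA;
	if (pages > 1)
		return TOY_FROM_GLOBAL_CMA;
	/* Single pages are contiguous by definition; spare the global area. */
	return TOY_FROM_PAGE_ALLOCATOR;
}
```

Devices with their own area keep using it even for single pages, which
addresses the concern that a device might care where its pages come from.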
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Both dma_alloc_from_contiguous() and dma_release_from_contiguous() are
very simply implemented, but require callers to pass certain
parameters like count and align, and take a boolean parameter to check
__GFP_NOWARN in the allocation flags. So every function call duplicates
similar work:
    unsigned long order = get_order(size);
    size_t count = size >> PAGE_SHIFT;
    page = dma_alloc_from_contiguous(dev, count, order,
                                     gfp & __GFP_NOWARN);
    [...]
    dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
Additionally, as CMA can be used only in contexts that permit
sleeping, most callers do a gfpflags_allow_blocking() check and a
corresponding fallback allocation of normal pages upon any false result:
    if (gfpflags_allow_blocking(flag))
        page = dma_alloc_from_contiguous();
    if (!page)
        page = alloc_pages();
    [...]
    if (!dma_release_from_contiguous(dev, page, count))
        __free_pages(page, get_order(size));
So this patch simplifies those function calls by abstracting these
operations into the two new functions: dma_{alloc,free}_contiguous.
As some callers of dma_{alloc,release}_from_contiguous() might be
complicated, this patch just implements these two new functions to
kernel/dma/direct.c only as an initial step.
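The "try the contiguous area if blocking is allowed, otherwise fall back"
pattern that the new helpers wrap up can be modelled in user space like
this. It is a toy sketch, not the kernel helpers: the single-slot
toy_cma_area, the malloc() fallback and all toy_* names are assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a CMA area: one slot, usable only when blocking is OK. */
unsigned char toy_cma_area[4096];
bool toy_cma_busy;

bool toy_is_from_cma(const void *p)
{
	return p == (const void *)toy_cma_area;
}

/* Allocation: try the toy CMA slot first, then fall back to malloc(). */
void *toy_alloc_contiguous(size_t size, bool can_block)
{
	void *p = NULL;

	if (can_block && !toy_cma_busy && size <= sizeof(toy_cma_area)) {
		toy_cma_busy = true;
		p = toy_cma_area;
	}
	if (!p)
		p = malloc(size);
	return p;
}

/* Free: release to the toy CMA slot if it came from there, else free(). */
void toy_free_contiguous(void *p)
{
	if (toy_is_from_cma(p)) {
		toy_cma_busy = false;
		return;
	}
	free(p);
}
```

Callers no longer repeat the blocking check or the fallback themselves;
they only see one allocate call and one free call.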
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicolin Chen <nicoleotsuka@gmail.com>
Tested-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Clang warns:
drivers/iommu/dma-iommu.c:897:6: warning: logical not is only applied to
the left hand side of this comparison [-Wlogical-not-parentheses]
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
^ ~~
drivers/iommu/dma-iommu.c:897:6: note: add parentheses after the '!' to
evaluate the comparison first
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
^
( )
drivers/iommu/dma-iommu.c:897:6: note: add parentheses around left hand
side expression to silence this warning
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
^
( )
1 warning generated.
Judging from the rest of the commit and the conditional in
iommu_dma_map_sg, either
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
or
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
was intended, not a combination of the two.
I personally think that the former is easier to understand so use that.
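To make the difference concrete, the following user-space sketch spells out
how the original expression parses and compares it with the intended test.
TOY_ATTR_SKIP_CPU_SYNC is a hypothetical stand-in for the kernel attribute
(its bit position is illustrative), and the function names are made up for
this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for DMA_ATTR_SKIP_CPU_SYNC; bit is illustrative. */
#define TOY_ATTR_SKIP_CPU_SYNC (1UL << 5)

/*
 * How `!(attrs & FLAG) == 0` is parsed: the `!` applies to the left-hand
 * side first, so the result is true when the flag IS set.
 */
bool toy_as_written(unsigned long attrs)
{
	return (!(attrs & TOY_ATTR_SKIP_CPU_SYNC)) == 0;
}

/* The intended test: true when the flag is NOT set. */
bool toy_as_intended(unsigned long attrs)
{
	return !(attrs & TOY_ATTR_SKIP_CPU_SYNC);
}
```

The two functions disagree for every input, so the mixed form inverts the
condition instead of merely being redundant.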
Fixes: 06d60728ff5c ("iommu/dma: move the arm64 wrappers to common code")
Link: https://github.com/ClangBuiltLinux/linux/issues/497
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
With most of the previous functionality now elsewhere a lot of the
headers included in this file are not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
For entirely dma coherent architectures there is no requirement to ever
remap dma coherent allocations. Move all the remap and pool code under
IS_ENABLED() checks and drop the Kconfig dependency.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Inline __iommu_dma_mmap_pfn into the main function, and use the
fact that __iommu_dma_get_pages returns NULL for remapped contiguous
allocations to simplify the code flow a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Inline __iommu_dma_get_sgtable_page into the main function, and use the
fact that __iommu_dma_get_pages returns NULL for remapped contiguous
allocations to simplify the code flow a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
All the logic in iommu_dma_alloc that deals with page allocation from
the CMA or page allocators can be split into a self-contained helper,
and we can then map the result of that or the atomic pool allocation
with the iommu later. This also allows reusing __iommu_dma_free to
tear down the allocations and MMU mappings when the IOMMU mapping
fails.
Based on a patch from Robin Murphy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Most importantly clear up the size / iosize confusion. Also rename addr
to cpu_addr to match the surrounding code and make the intention a little
more clear.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Most of it can double up to serve the failure cleanup path for
iommu_dma_alloc().
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Instead of having a separate code path for the non-blocking alloc_pages
and CMA allocations paths merge them into one. There is a slight
behavior change here in that we try the page allocator if CMA fails.
This matches what dma-direct and other iommu drivers do and will be
needed to use the dma-iommu code on architectures without DMA remapping
later on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Always remapping CMA allocations was largely a bodge to keep the freeing
logic manageable when it was split between here and an arch wrapper. Now
that it's all together and streamlined, we can relax that limitation.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Shuffle around the self-contained atomic and non-contiguous cases to
return early and get out of the way of the CMA case that we're about to
work on next.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[hch: slight changes to the code flow]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The freeing logic was made particularly horrible by part of it being
opaque to the arch wrapper, which led to a lot of convoluted repetition
to ensure each path did everything in the right order. Now that it's
all private, we can pick apart and consolidate the logically-distinct
steps of freeing the IOMMU mapping, the underlying pages, and the CPU
remap (if necessary) into something much more manageable.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[various cosmetic changes to the code flow]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
We only have a single caller of this function left, so open code it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Move the call to dma_common_pages_remap into __iommu_dma_alloc and
rename it to iommu_dma_alloc_remap. This creates a self-contained
helper for remapped pages allocation and mapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Since we duplicate the find_vm_area() logic a few times in places where
we only care about the pages, factor out a helper to abstract it.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[hch: don't warn when not finding a region, as we'll rely on that later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The remaining internal callsites don't care about having prototypes
compatible with the relevant dma_map_ops callbacks, so the extra
level of indirection just wastes space and complicates things.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Most of the callers don't care, and the couple that do already have the
domain to hand for other reasons are in slow paths where the (trivial)
overhead of a repeated lookup will be utterly immaterial.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[hch: dropped the hunk touching iommu_dma_get_msi_page to avoid a
conflict with another series]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Moving this function up to its unmap counterpart helps to keep related
code together for the following changes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
There is nothing really arm64 specific in the iommu_dma_ops
implementation, so move it to dma-iommu.c and keep a lot of symbols
self-contained. Note the implementation does depend on the
DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the
DMA_IOMMU support depend on it, but this will be relaxed soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
arch_dma_prep_coherent can handle physically contiguous ranges larger
than PAGE_SIZE just fine, which means we don't need a page-based
iterator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|