path: drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c
2021-02-11  drm/nouveau/instmem: switch to instanced constructor  [Ben Skeggs, 1 file, -2/+2]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>

2020-09-25  drm/nouveau/gk20a: stop setting DMA_ATTR_NON_CONSISTENT  [Christoph Hellwig, 1 file, -2/+1]
DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS configs, so don't set it in this ARM specific driver part.
Signed-off-by: Christoph Hellwig <hch@lst.de>
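
At the call site this amounts to dropping the attribute bit; a minimal sketch (the variable names are illustrative, not the driver's actual ones):

    /* before: the attribute was requested but is a no-op on ARM */
    cpu = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL,
                          DMA_ATTR_NON_CONSISTENT);

    /* after: no attributes; behaviour on ARM is unchanged */
    cpu = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL, 0);
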
2017-11-02  drm/nouveau/mmu: remove old vmm frontend  [Ben Skeggs, 1 file, -11/+0]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-11-02  drm/nouveau/imem/nv50-: use new interfaces for vmm operations  [Ben Skeggs, 1 file, -19/+26]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-11-02  drm/nouveau/mmu: implement new vmm frontend  [Ben Skeggs, 1 file, -0/+1]
These are the new privileged interfaces to the VMM backends, and they expose some functionality that wasn't previously available.

It's now possible to allocate a chunk of address-space (even all of it) without causing page tables to be allocated up-front, and then map into it at arbitrary locations. This is the basic primitive used to support features such as sparse mapping, or to allow userspace control over its own address-space, or HMM (where the GPU driver isn't in control of the address-space layout).

Rather than being tied to a subtle combination of memory object and VMA properties, arguments that control map flags (ro, kind, etc) are passed explicitly at map time.

The compatibility hacks to implement the old frontend on top of the new driver backends have been replaced with something similar to implement the old frontend's interfaces on top of the new frontend.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
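
The usage pattern this enables looks roughly like the following; the function and flag names here are purely illustrative, not the exact nvkm API:

    /* reserve a range of GPU address space; no page tables yet */
    vma = vmm_get(vmm, size);

    /* later, back part of it with memory at an arbitrary offset,
     * passing access/kind flags explicitly at map time */
    vmm_map(vmm, vma, offset, memory, VMM_MAP_RO | VMM_MAP_KIND(kind));
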
2017-11-02  drm/nouveau: wrap nvkm_mem objects in nvkm_memory interfaces  [Ben Skeggs, 1 file, -0/+9]
This is a transition step, to enable finer-grained commits while transitioning to new MMU interfaces.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2017-11-02  drm/nouveau/core/memory: change map interface to support upcoming mmu changes  [Ben Skeggs, 1 file, -4/+5]
Map flags (access, kind, etc) are currently defined in either the VMA, or the memory object, which turns out to not be ideal for things like suballocated buffers, etc. These will become per-map flags instead, so we need to support passing these arguments in nvkm_memory_map().
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2017-11-02  drm/nouveau: separate buffer object backing memory from nvkm structures  [Ben Skeggs, 1 file, -1/+0]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-11-02  drm/nouveau/imem: remove now-unused wrapper for backend objects  [Ben Skeggs, 1 file, -1/+0]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-11-02  drm/nouveau/core/memory: split info pointers from accessor pointers  [Ben Skeggs, 1 file, -3/+7]
The accessor functions can change as a result of acquire()/release() calls, and are protected by any refcounting done there. Other functions must remain constant, as they can be called any time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2017-04-06  drm/nouveau/imem/gk20a: Turn instmem lock into mutex  [Thierry Reding, 1 file, -11/+8]
The gk20a implementation of instance memory uses vmap()/vunmap() to map memory regions into the kernel's virtual address space. These functions may sleep, so protecting them by a spin lock is not safe. This triggers a warning if the DEBUG_ATOMIC_SLEEP Kconfig option is enabled. Fix this by using a mutex instead.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
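
Reduced to its essentials, the pattern the fix allows (lock and helper names are illustrative):

    static DEFINE_MUTEX(imem_lock);            /* was a spinlock */

    static void *imem_map(struct page **pages, unsigned int npages)
    {
            void *vaddr;

            mutex_lock(&imem_lock);            /* may sleep, which is fine here */
            vaddr = vmap(pages, npages, VM_MAP,
                         pgprot_writecombine(PAGE_KERNEL));   /* vmap() can sleep */
            mutex_unlock(&imem_lock);
            return vaddr;
    }
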
2017-02-17  drm/nouveau/core/memory: distinguish between coherent/non-coherent targets  [Ben Skeggs, 1 file, -1/+1]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2017-02-17  drm/nouveau/core/mm: replace region list with next pointer  [Ben Skeggs, 1 file, -13/+4]
We never have any need for a doubly-linked list here, and as there's generally a large number of these objects, replace it with a singly-linked list in order to save some memory.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
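
The saving is one pointer per node; an illustrative before/after (not the actual nvkm_mm structures):

    /* before: embedded list_head, i.e. prev + next (16 bytes on 64-bit) */
    struct mm_node_old {
            struct list_head nl_entry;
            u32 offset;
            u32 length;
    };

    /* after: a single forward pointer (8 bytes on 64-bit) */
    struct mm_node_new {
            struct mm_node_new *next;
            u32 offset;
            u32 length;
    };
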
2016-08-04  dma-mapping: use unsigned long for dma_attrs  [Krzysztof Kozlowski, 1 file, -7/+6]
The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine:

1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;
    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
    )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
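
At a typical driver call site the conversion looks like this (buffer and device names are illustrative):

    /* before: attributes built up in a struct on the stack */
    struct dma_attrs attrs;

    init_dma_attrs(&attrs);
    dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);
    cpu = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL, &attrs);

    /* after: attributes are just bits in an unsigned long */
    cpu = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL,
                          DMA_ATTR_WRITE_COMBINE);
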
2016-03-14  drm/nouveau/instmem/gk20a: add write barrier when releasing DMA object  [Alexandre Courbot, 1 file, -0/+2]
When using the DMA-API for instmem, we may obtain a write-combined mapping. For such cases, add a write barrier in gk20a_instobj_release_dma() to make sure that all writes have reached memory at this time.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
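
The barrier itself is a one-liner in the release path; a simplified sketch (the real function also does other release work):

    static void instobj_release_dma(struct nvkm_memory *memory)
    {
            /* CPU stores through a write-combined mapping can linger in
             * the write buffer; make sure they reach memory before the
             * GPU is allowed to read the object */
            wmb();
    }
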
2016-01-11  drm/nouveau/instmem/gk20a: use DMA API CPU mapping  [Alexandre Courbot, 1 file, -92/+62]
Commit 69c4938249fb ("drm/nouveau/instmem/gk20a: use direct CPU access") tried to be smart while using the DMA-API by managing the CPU mappings of buffers allocated with the DMA-API by itself. In doing so, it relied on dma_to_phys(), which is an architecture-private function not available everywhere. This broke the build on several architectures.

Since there is no reliable and portable way to obtain the physical address of a DMA-API buffer, stop trying to be smart and just use the CPU mapping that the DMA-API can provide. This means that buffers will be CPU-mapped for their entire lifetime, as opposed to only when we need them, but using the DMA-API here is in any case a fallback for when no IOMMU is available, so we should not expect optimal behavior.

This makes the IOMMU and DMA-API implementations of instmem diverge enough that we should maybe put them into separate files...

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
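
Conceptually, the change keeps the kernel mapping that the DMA-API already hands back instead of deriving one from the physical address; an illustrative sketch (field names are not the driver's actual ones):

    /* the returned CPU address is stored and used for the whole life of
     * the object; no dma_to_phys()/phys_to_page() tricks are needed */
    node->vaddr = dma_alloc_coherent(dev, size, &node->handle, GFP_KERNEL);
    if (!node->vaddr)
            return -ENOMEM;
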
2016-01-11  drm/nouveau/instmem/gk20a: fix race conditions  [Alexandre Courbot, 1 file, -29/+37]
The LRU list used for recycling CPU mappings was handling concurrency very poorly. For instance, if an instobj was acquired twice before being released once, it would end up in the LRU list even though a client was still accessing it. This patch fixes this by properly counting how many clients are currently using a given instobj. While at it, we also raise errors when inconsistencies are detected, and factorize some code.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
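
In outline, acquire/release pair a use count with the LRU so a mapping only becomes reclaimable once its last user is gone; a sketch with assumed structure and field names:

    static void *instobj_acquire(struct instobj *obj)
    {
            mutex_lock(&imem->lock);
            if (obj->use_cnt++ == 0)
                    list_del_init(&obj->lru);   /* in use: not reclaimable */
            mutex_unlock(&imem->lock);
            return obj->vaddr;
    }

    static void instobj_release(struct instobj *obj)
    {
            mutex_lock(&imem->lock);
            WARN_ON(obj->use_cnt == 0);         /* flag inconsistencies */
            if (--obj->use_cnt == 0)
                    list_add_tail(&obj->lru, &imem->lru);  /* reclaimable again */
            mutex_unlock(&imem->lock);
    }
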
2015-11-11  drm/nouveau: fix build failures on all non ARM.  [Dave Airlie, 1 file, -0/+6]
gk20a is an ARM-only GPU, so we can just do the correct thing on ARM but fail on other architectures. The other option was to use SWIOTLB as the define, which means phys_to_page exists, but this seems clearer.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-11-03  drm/nouveau/instmem/gk20a: make use of the IOMMU bit  [Alexandre Courbot, 1 file, -4/+6]
Use the IOMMU bit specified in platform data instead of hardcoding it to the bit used by current Tegra GPUs.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2015-11-03  drm/nouveau/instmem/gk20a: use direct CPU access  [Alexandre Courbot, 1 file, -97/+264]
The Great Nouveau Refactoring Take II brought us a lot of goodness, including acquire/release methods that are called before and after an instobj is modified. These functions can be used as synchronization points to manage CPU/GPU coherency if we modify an instobj using the CPU.

This patch replaces the legacy and slow PRAMIN access for gk20a instmem with CPU mappings and writes. An LRU list is used to unmap unused mappings after a certain threshold (currently 1MB) of mapped instobjs is reached. This allows mappings to be reused most of the time.

Accessing instobjs using the CPU requires maintaining the GPU L2 cache, which we do in the acquire/release functions. This triggers a lot of L2 flushes/invalidates, but most of them are performed on an empty cache (and thus return immediately), and overall context setup performance greatly benefits from this (from 250ms to 160ms on Jetson TK1 for a simple libdrm program). Making L2 management more explicit should allow us to grab some more performance in the future.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
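
The resulting access pattern, in rough outline (helper names are illustrative):

    /* invalidate GPU L2 and return a CPU mapping (vmap() a new one, or
     * reuse a mapping cached on the LRU list) */
    u32 *map = instobj_acquire(obj);

    map[offset / 4] = value;    /* plain CPU store instead of PRAMIN I/O */

    /* flush GPU L2 so the GPU observes the CPU writes */
    instobj_release(obj);
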
2015-08-28  drm/nouveau/tegra: merge platform setup from nouveau drm  [Ben Skeggs, 1 file, -12/+8]
The copyright header in nvkm/engine/device/platform.c has been replaced with the NVIDIA one from drm/nouveau_platform.c, as most of the actual code is now theirs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2015-08-28  drm/nouveau/device: remove pci/platform_device from common struct  [Ben Skeggs, 1 file, -1/+1]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-08-28  drm/nouveau/imem: convert to new-style nvkm_subdev  [Ben Skeggs, 1 file, -33/+21]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-08-28  drm/nouveau/imem: improve management of instance memory  [Ben Skeggs, 1 file, -94/+120]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-08-28  drm/nouveau/platform: remove subclassing of nvkm_device  [Ben Skeggs, 1 file, -7/+6]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-08-28  drm/nouveau/imem: switch to subdev printk macros  [Ben Skeggs, 1 file, -11/+15]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-08-28  drm/nouveau/imem: switch to device pri macros  [Ben Skeggs, 1 file, -4/+6]
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-08-28  drm/nouveau/imem: cosmetic changes  [Ben Skeggs, 1 file, -78/+78]
This is purely preparation for upcoming commits; there should be no code changes here.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2015-08-28  drm/nouveau/device: include core/device.h automatically for subdevs/engines  [Ben Skeggs, 1 file, -1/+0]
Pretty much every subdev/engine is going to need access to nvkm_device shortly to touch registers and/or output messages. The odd placement of the includes is necessary to work around some inter-dependencies that currently exist. This will be fixed later.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

2015-04-14  drm/nouveau/instmem/gk20a: fix crash during error path  [Alexandre Courbot, 1 file, -1/+1]
If a memory allocation fails when using the DMA allocator, gk20a_instobj_dtor_dma() will be called on the failed instmem object. At this time, node->handle might not be NULL despite the call to dma_alloc_attrs() having failed. node->cpuaddr is the right member to check for such a failure, so use it instead.
Reported-by: Vince Hsu <vinceh@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
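
The essence of the fix, sketched with the field names from the commit message (the surrounding destructor is simplified):

    /* before: checked node->handle, which may be non-NULL even though
     * dma_alloc_attrs() failed */
    if (unlikely(!node->cpuaddr))
            return;   /* allocation never succeeded: nothing to free */
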
2015-04-14  drm/nouveau/instmem/gk20a: add IOMMU support  [Alexandre Courbot, 1 file, -39/+252]
Let GK20A's instmem take advantage of the IOMMU if it is present. Having an IOMMU means that instmem is no longer allocated using the DMA API, but instead obtained through page_alloc and made contiguous to the GPU by IOMMU mappings.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
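
In outline (error handling and the real gk20a structures omitted), the pages come from the page allocator and are then stitched into one contiguous GPU-visible range through the IOMMU:

    /* allocate the backing pages individually... */
    for (i = 0; i < npages; i++)
            pages[i] = alloc_page(GFP_KERNEL);

    /* ...then map them back-to-back into IOMMU space so the GPU sees a
     * single contiguous region starting at 'iova' */
    for (i = 0; i < npages; i++)
            iommu_map(domain, iova + i * PAGE_SIZE, page_to_phys(pages[i]),
                      PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
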
2015-04-14  drm/nouveau/instmem/gk20a: use DMA attributes  [Alexandre Courbot, 1 file, -4/+20]
instmem for GK20A is allocated using dma_alloc_coherent(), which provides us with a coherent CPU mapping that we never use because instmem objects are accessed through PRAMIN. Switch to dma_alloc_attrs() which gives us the option to dismiss that CPU mapping and free up some CPU virtual space.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
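
A sketch of such an allocation with the CPU mapping dismissed (variable names illustrative; this is the old struct dma_attrs API that was later replaced by unsigned long flags):

    struct dma_attrs attrs;

    init_dma_attrs(&attrs);
    /* the memory is only reached through PRAMIN, so don't spend kernel
     * virtual space on a CPU mapping that is never used */
    dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &attrs);

    /* the return value is an opaque cookie, only kept so the buffer can
     * be freed again with dma_free_attrs() */
    cookie = dma_alloc_attrs(dev, size, &handle, GFP_KERNEL, &attrs);
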
2015-04-14  drm/nouveau/instmem/gk20a: move memory allocation to instmem  [Alexandre Courbot, 1 file, -0/+211]
GK20A does not have dedicated RAM, thus having a RAM device for it does not make sense. Move the contiguous physical memory allocation to instmem.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>