Age | Commit message (Collapse) | Author | Files | Lines |
|
Add helpers for allocating GPUVM memory in kernel mode and use them
to allocate memory for the CWSR trap handler.
v2: Use dev instead of pdd->dev in kfd_process_free_gpuvm
v3:
* Cleaned up and simplified kfd_process_alloc_gpuvm
* Moved allocation for dGPU to kfd_process_device_init_vm
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Also used for cleaning up on process termination.
v2: Refactored cleanup on process termination
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Set up the GPUVM aperture for SVM (shared virtual memory) that allows
sharing a part of virtual address space between GPUs and CPUs.
Report the size of the GPUVM aperture that is supported by KGD accurately.
The low part of the GPUVM aperture is reserved for kernel use. This is
for kernel-allocated buffers that are only accessed on the GPU:
- CWSR trap handler
- IB for submitting commands in user-mode context from kernel mode
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Currently the number of GPUs is limited by aperture placement options
available on GFX7 and GFX8 hardware. This limitation is not necessary.
Scratch and LDS represent per-work-item and per-work-group storage
respectively. Different work-items and work-groups use the same virtual
address to access their own data. Work running on different GPUs is by
definition in different work-groups (different dispatches, in fact).
That means the same virtual addresses can be used for these apertures
on different GPUs.
Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the
artificial limitation on the number of GPUs that can be supported. The
new ioctl allows user mode to query the number of GPUs to allocate
enough memory for all GPUs to be reported.
This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Populate DRM render device minor in kfd topology
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Instead of creating all VMs on process creation, create them when
a process is bound to a device. This will later allow registering
an existing VM from a DRM render node FD at runtime, before the
process is bound to the device. This way the render node VM can be
used for KFD instead of creating our own redundant VM.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
This allows acquiring an existing VM from a render node FD to use it
for a compute process.
Such VMs get destroyed when the original file descriptor is released.
Added a callback from amdgpu_vm_fini to handle KFD VM destruction
correctly in this case.
v2:
* Removed vm->vm_context check in amdgpu_amdkfd_gpuvm_destroy_cb,
check vm->process_info earlier instead
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
v2: Removed updating and checking of vm->vm_context
v3: Enable amdgpu_vm_clear_bo in amdgpu_vm_make_compute
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Make sure the PD BO is valid and attach the eviction fence during VM
creation. This ensures that the pd_phys_address is actually valid
and an eviction that would invalidate it triggers a KFD process
eviction like it should.
v2: Use uninterruptible waiting in initial PD validation
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Remove struct amdkfd_vm and move the fields into struct amdgpu_vm.
This will allow turning a VM created by a DRM render node into a
KFD VM.
v2: Removed vm_context field
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
When CONFIG_ACPI is disabled, we never initialize the acpi_table
structure in kfd_create_crat_image_virtual:
drivers/gpu/drm/amd/amdkfd/kfd_crat.c: In function 'kfd_create_crat_image_virtual':
drivers/gpu/drm/amd/amdkfd/kfd_crat.c:888:40: error: 'acpi_table' may be used uninitialized in this function [-Werror=maybe-uninitialized]
The undefined behavior also happens for any other acpi_get_table()
failure, but then the compiler can't warn about it.
This adds an error check that prevents the structure from
being used in error, avoiding both the undefined behavior and
the warning about it.
Fixes: 520b8fb755cc ("drm/amdkfd: Add topology support for CPUs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
This patch fixes kernel build in ARCH=frv
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
v2: use temporaries to trivially reduces object size.
The for-loops process data in the mclk_table but use slck_table.count
as the loop index limit. I believe these are cut-n-paste errors from
the previous almost identical loops as indicated by static analysis.
Fix these.
Detected by CoverityScan, CID#1466001 ("Copy-paste error")
Fixes: 5d97cf39ff24 ("drm/amd/pp: Add and initialize OD_dpm_table for CI/VI.")
Fixes: 5e4d4fbea557 ("drm/amd/pp: Implement edit_dpm_table on smu7")
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
_PR3 doesn't seem to work properly, use ATPX instead.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=104064
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Trivial fix to spelling mistake in pr_err error message text
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In new asics(vega12), no power state management in driver,
So no need to implement related callback functions.
and add some ps checks in pp_psm.c
Revert "drm/amd/powerplay: add new pp_psm infrastructure for vega12 (v2)"
This reverts commit 7d1a63f3aa331b853e41f92d0e7890ed31de8c13.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the issue thermal irq was always triggered
as GPU under temperature range detected
The low temp in default thermal policy
was set to -273. so need to use int type for the low temp.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
v2: add sanity checking
v3: make code open
v4: also handle visible to invisible fallback
v5: Since two fallback cases, re-use goto retry
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
DRM_VMW_EVENT_FENCE_SIGNALED (struct drm_vmw_event_fence) and
DRM_EVENT_VBLANK (struct drm_event_vblank) pass timestamps in 32-bit
seconds/microseconds format.
As of commit c61eef726a78 ("drm: add support for monotonic vblank
timestamps"), other DRM drivers use monotonic times for drm_event_vblank,
but vmwgfx still uses CLOCK_REALTIME for both events, which suffers from
the y2038/y2106 overflow as well as time jumps.
For consistency, this changes vmwgfx to use ktime_get_ts64 as well,
which solves those problems and avoids the deprecated do_gettimeofday()
function.
This should be transparent to to user space, as long as it doesn't
compare the time against the result of gettimeofday().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
We were relying on the pinned screen object backup buffer to be destroyed
when not used. But if we hold a copy of the atomic state, like when
hibernating, the backup buffer might not be destroyed since it's
refcounted by the atomic state. This causes us to hibernate with a
buffer pinned in VRAM.
Fix this by only having the buffer pinned when it is actually used by a
screen object.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
|
|
For legacy surfaces, they were previously registered as device resources
when the driver resources were created. Since they are evictable we instead
register them as device resources once they are created on the device,
just like for guest-backed surfaces. This has implications during
hibernation where we can't hibernate with device resources active.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
|
|
Use kasprintf instead of combination of kmalloc and sprintf. Also,
remove the local variables used for storing the string length as they
are not required now.
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
It was used to early block fbdev dirty processing. Replace it with an
unprotected check of the par->dirty.active field. While this might
race with the vmw_fb_off() function, we do a protected check later so
the race will at worst lead to grabbing and releasing a couple of locks.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
|
|
Make it possible to hibernate also with masters that don't switch VT at
hibernation time. We save and restore modesetting state unless fbdev is
active and enabled at hibernation time.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
|
|
fbdev framebuffers were previously pinned to be able to keep them mapped
across updates.
This commit introduces a mechanism that instead revalidates the map on
each update, keeping the map cached across updates. The cached map is torn
down if the underlying pages change. Typically on buffer object moves and
swapouts.
This should be nicer to the system when we have resource contention.
Testing done: Basic fbdev functionality under Fedora 27.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
|
|
The start / stop and preempt commands don't honor the context argument
but rather acts on all available contexts.
Also add detection for context 1 availability.
Note that currently there's no driver interface for submitting buffers
using the high-priority command queue (context 1).
Testing done:
Change the default context for command submission to 1 instead of 0,
verify basic desktop functionality including faulty command injection and
recovery.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
|
|
This blit was previously performed using two large vmaps, one of which
was teared down and remapped on each blit. Use the more resource-
conserving TTM cpu blit instead.
The blit is used in boundary-box computing mode which makes it possible
to minimize the bounding box used in host operations.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
The utility uses kmap_atomic() instead of vmapping the whole buffer
object. As a result there will be more book-keeping but on some
architectures this will help avoid exhausting vmalloc space and also
avoid expensive TLB flushes.
The blit utility also adds a provision to compute a bounding box of
changed content, which is very useful to optimize presentation speed
of ill-behaved applications that don't supply proper damage regions, and
for page-flips. The cost of computing the bounding box is not that
expensive when done in a cpu-blit utility like this.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
It will be used by vmwgfx cpu blit.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
Use helpers to perform the kmap_atomic_prot() functionality to
a) Avoid in-function ifdefs that violate the kernel coding policy,
b) Facilitate exporting the functionality.
This commit should not change any functionality.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
The current limit of 2 leads to some GPU idle times, as the usual
IRQ latency leads to up to 3 jobs getting signaled at once with some
standard workloads.
A larger HW job limit might lead to slightly worse QoS, but we accept
that to not sacrifice GPU throughput in the common case.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
|
|
Use drm_plane_helper_check_update also for the cursor plane.
Some applications, like gdm on gnome shell still uses cursor front-buffer
like rendering without notifying the kernel. We do need some kind of
noficiation, but work around this for now by updating the cursor image on
every cursor move.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
|
|
Page flip can be slow for vmwgfx in some cases, like need to do surface
copy to different surface or waiting for IN_FENCE_FD. Enabling
nonblocking commits for vmwgfx in case userspace request it.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
Atomic ioctl can also send the same page flip flags as legacy ioctl.
In those cases also need to send the vblank event to userspace.
vmwgfx does not support flag DRM_MODE_PAGE_FLIP_ASYNC, so this flag is
never expected.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
The dmabuf_dirty/surface_dirty in case of screen object is moved to
plane atomic update, so that page flip in atomic ioctl also works.
vmwgfx does not support DRM_MODE_PAGE_FLIP_ASYNC, so this flag is never
expected.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
The function drm_crtc_arm_vblank_event should be used for the driver
which have vblank interrupt support. In case of vmwgfx we do not have
vblank interrupt.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
When display surface is different than the framebuffer surface, atomic
path do not copy the surface data. This commit moved the code to copy
surface from legacy to atomic path.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
In case of page flip there is no need to iterate over all display unit
in the function "vmw_kms_helper_dirty". If crtc is available then
dirty commands is performed on that crtc only.
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
|
|
Problem:
When unloading due to failure amdgpu_device_fini was called twice
which was leading to NULL ptr in amdgpu_irq_disable_all.
Fix:
Call amdgpu_device_fini only once from amdgpu_driver_unload_kms.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Mixed up exporter and importer here. E.g. while mapping the BO we need
to check the importer not the exporter.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=105633
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
v2: add Vega12 support
1. delete useless argument in function register_thermal_interrupt
2. rename function name register_thermal_interrupt to register_irq_handlers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
v2: add Vega12 support
1. remove struct cgs_os_ops
2. delete cgs_linux.h
3. refine the irq code for vega10, can fix set pp table
failed issue.
4. add common smu irq process function
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add change to return per DPM level clock in DAL interface
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove W/A carried over from VG10 to set VDDSOC Floor Voltage
prior to enabling DPM since the VBIOS covers the floor voltage
setting now
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Under some heavy computing environment(e.g. dgemm test), it
takes the asic over 10+ seconds to finish the dispatched job
which will trigger the timeout.
It's quite confusing although it does not seem to bring any
real problems. As a quick workround, we choose to not enfoce
the timeout setting on compute queues.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|