laptop-kernel/drivers/gpu/drm/amd, branch master

drm/amd/powerplay: Fix CIK shutdown temperature

2025-10-23T14:24:34Z

[ Upstream commit 6917112af2ba36c5f19075eb9f2933ffd07e55bf ] Remove extra multiplication. CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case the shutdown temperature is hardcoded in smu7_init_dpm_defaults and is already multiplied by 1000. The value was mistakenly multiplied another time by smu7_get_thermal_temperature_range. Fixes: 4ba082572a42 ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher Signed-off-by: Timur Kristóf Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: set an error on all fences from a bad context

2025-10-23T14:24:34Z

[ Upstream commit ff780f4f80323148d43198f2052c14160c8428d3 ] When we backup ring contents to reemit after a queue reset, we don't backup ring contents from the bad context. When we signal the fences, we should set an error on those fences as well. v2: misc cleanups v3: add locking for fence error, fix comment (Christian) v4: fix wrap around, locking (Christian) Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: handle wrap around in reemit handling

2025-10-23T14:24:34Z

[ Upstream commit 1f22fcb88bfef26a966e9eb242c692c6bf253d47 ] Compare the sequence numbers directly. Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: fix handling of harvesting for ip_discovery firmware

2025-10-23T14:24:34Z

[ Upstream commit 357d90be2c7aaa526a840cddffd2b8d676fe75a6 ] Chips which use the IP discovery firmware loaded by the driver reported incorrect harvesting information in the ip discovery table in sysfs because the driver only uses the ip discovery firmware for populating sysfs and not for direct parsing for the driver itself as such, the fields that are used to print the harvesting info in sysfs report incorrect data for some IPs. Populate the relevant fields for this case as well. Fixes: 514678da56da ("drm/amdgpu/discovery: fix fw based ip discovery") Acked-by: Tom St Denis Reviewed-by: Lijo Lazar Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin

drm/amdgpu: add support for cyan skillfish without IP discovery

2025-10-23T14:24:34Z

[ Upstream commit 9e6a5cf1a23bf575e93544ae05585659063b1c18 ] For platforms without an IP discovery table. Signed-off-by: Alex Deucher Stable-dep-of: 357d90be2c7a ("drm/amdgpu: fix handling of harvesting for ip_discovery firmware") Signed-off-by: Sasha Levin

drm/amdgpu: add ip offset support for cyan skillfish

2025-10-23T14:24:33Z

[ Upstream commit e8529dbc75cab56fc3c57830d0fd48cbd8911e6c ] For chips that don't have IP discovery tables. Signed-off-by: Alex Deucher Stable-dep-of: 357d90be2c7a ("drm/amdgpu: fix handling of harvesting for ip_discovery firmware") Signed-off-by: Sasha Levin

drm/amd: Fix hybrid sleep

2025-10-23T14:24:26Z

[ Upstream commit 0a6e9e098fcc318fec0f45a05a5c4743a81a60d9 ] [Why] commit 530694f54dd5e ("drm/amdgpu: do not resume device in thaw for normal hibernation") optimized the flow for systems that are going into S4 where the power would be turned off. Basically the thaw() callback wouldn't resume the device if the hibernation image was successfully created since the system would be powered off. This however isn't the correct flow for a system entering into s0i3 after the hibernation image is created. Some of the amdgpu callbacks have different behavior depending upon the intended state of the suspend. [How] Use pm_hibernation_mode_is_suspend() as an input to decide whether to run resume during thaw() callback. Reported-by: Ionut Nechita Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4573 Tested-by: Ionut Nechita Fixes: 530694f54dd5e ("drm/amdgpu: do not resume device in thaw for normal hibernation") Acked-by: Alex Deucher Tested-by: Kenneth Crudup Signed-off-by: Mario Limonciello (AMD) Cc: 6.17+ # 6.17+: 495c8d35035e: PM: hibernate: Add pm_hibernation_mode_is_suspend() Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman

drm/amd: Check whether secure display TA loaded successfully

2025-10-23T14:24:26Z

commit c760bcda83571e07b72c10d9da175db5051ed971 upstream. [Why] Not all renoir hardware supports secure display. If the TA is present but the feature isn't supported it will fail to load or send commands. This shows ERR messages to the user that make it seems like there is a problem. [How] Check the resp_status of the context to see if there was an error before trying to send any secure display commands. Reviewed-by: Alex Deucher Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1415 Signed-off-by: Mario Limonciello Signed-off-by: Alex Deucher Signed-off-by: Adrian Yip Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: fix gfx12 mes packet status return check

2025-10-23T14:24:25Z

commit d0de79f66a80eeb849033fae34bd07a69ce72235 upstream. GFX12 MES uses low 32 bits of status return for success (1 or 0) and high bits for debug information if low bits are 0. GFX11 MES doesn't do this so checking full 64-bit status return for 1 or 0 is still valid. Signed-off-by: Jonathan Kim Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman

drm/amdgpu: use atomic functions with memory barriers for vm fault info

2025-10-23T14:24:25Z

commit 6df8e84aa6b5b1812cc2cacd6b3f5ccbb18cda2b upstream. The atomic variable vm_fault_info_updated is used to synchronize access to adev->gmc.vm_fault_info between the interrupt handler and get_vm_fault_info(). The default atomic functions like atomic_set() and atomic_read() do not provide memory barriers. This allows for CPU instruction reordering, meaning the memory accesses to vm_fault_info and the vm_fault_info_updated flag are not guaranteed to occur in the intended order. This creates a race condition that can lead to inconsistent or stale data being used. The previous implementation, which used an explicit mb(), was incomplete and inefficient. It failed to account for all potential CPU reorderings, such as the access of vm_fault_info being reordered before the atomic_read of the flag. This approach is also more verbose and less performant than using the proper atomic functions with acquire/release semantics. Fix this by switching to atomic_set_release() and atomic_read_acquire(). These functions provide the necessary acquire and release semantics, which act as memory barriers to ensure the correct order of operations. It is also more efficient and idiomatic than using explicit full memory barriers. Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman