| Age | Commit message (Collapse) | Author | Files | Lines |
|
[Why]
Pausing DPM power profiles during static screen caused a bunch of
audio/performance/clock issues that were addressed in this fix:
'commit 1412482b7143 ("Revert "drm/amd/display: pause the workload setting in dm"")'
This logic in function amdgpu_dm_crtc_vblank_control_worker() was moved
to amdgpu_dm_ism.c, but the fix was lost in the process.
[How]
Reapply the fix to amdgpu_dm_ism.c
Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc621e91d6fc004cfae9148c5a91acad19ada3e4)
|
|
The Idle State Manager (ISM) uses delayed work to apply display idle
optimizations later, instead of immediately. This helps avoid rapid idle
transitions that can hurt power or performance.
A crash was seen during driver teardown. The system boots normally and
the driver loads successfully. Later, when the GPU is being stopped, the
log shows:
amdgpu 0000:0e:00.0: finishing device.
Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]
After this, delayed ISM work still runs and reaches:
dm_ism_sso_delayed_work_func()
-> amdgpu_dm_ism_commit_event()
-> dm_ism_commit_idle_optimization_state()
-> dc_allow_idle_optimizations_internal()
The crash report showed:
KASAN: null-ptr-deref in range [0x690-0x697]
Signature:
[22601.113316] KASAN: null-ptr-deref in range [0x0000000000000690-0x0000000000000697]
...
[22601.113368] Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]
[22601.113930] RIP: 0010:dc_allow_idle_optimizations_internal+0xa6/0xc40 [amdgpu]
...
[22601.114491] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000690
...
[22601.114561] Call Trace:
[22601.114566] <TASK>
[22601.114572] ? srso_alias_return_thunk+0x5/0xfbef5
[22601.114582] ? update_load_avg+0x1b6/0x20b0
[22601.114593] ? __pfx_dc_allow_idle_optimizations_internal+0x10/0x10 [amdgpu]
[22601.114932] ? psi_group_change+0x4ed/0x8d0
[22601.114942] dm_ism_commit_idle_optimization_state+0x214/0x570 [amdgpu]
[22601.115268] amdgpu_dm_ism_commit_event+0xe1d/0x15a0 [amdgpu]
[22601.115588] ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115595] ? __kasan_check_write+0x18/0x20
[22601.115603] ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115610] ? mutex_lock+0x83/0xc0
[22601.115620] dm_ism_sso_delayed_work_func+0x64/0x90 [amdgpu]
GDB resolved dc_allow_idle_optimizations_internal+0xa6 to:
struct dc_state *context = dc->current_state;
The matching disassembly showed:
mov %rdi, %r12
mov 0x690(%r12), %r13
where r12 holds the dc pointer. A GDB layout dump of struct dc showed:
/* 1680 | 8 */ struct dc_state *current_state;
Since 1680 decimal is 0x690, this confirms that current_state is at
offset 0x690. The faulting access was effectively:
dc + 0x690
which indicates that dc was NULL at the time of dereference.
This shows that ISM work can still run during teardown after dc has
been cleared.
ISM is not expected to run after dc is destroyed. Fix this by disabling
ISM under dc_lock in amdgpu_dm_fini() before dc_destroy(), ensuring no
further ISM work runs after dc teardown.
Also add ASSERT(dm->dc) in amdgpu_dm_ism_commit_event() to enforce this
invariant, and ASSERT(mutex_is_locked(&dm->dc_lock)) in
amdgpu_dm_ism_disable() to clarify the locking requirement.
Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Suggested-by: Leo Li <sunpeng.li@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update comments in dm_ism_get_idle_allow_delay() and
dm_ism_insert_record() to better reflect their behavior and inputs.
dm_ism_get_idle_allow_delay() computes the delay before allowing
idle optimizations based on history and stream timing.
dm_ism_insert_record() stores idle duration records in the
circular history buffer.
These functions explain what they do, but they do not explain what their
inputs mean.
Fixes the below with gcc W=1:
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'current_state' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'event' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'next_state' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:153 function parameter 'ism' not described in 'dm_ism_get_idle_allow_delay'
../display/amdgpu_dm/amdgpu_dm_ism.c:153 function parameter 'stream' not described in 'dm_ism_get_idle_allow_delay'
../display/amdgpu_dm/amdgpu_dm_ism.c:216 function parameter 'ism' not described in 'dm_ism_insert_record'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'current_state' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'event' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:44 function parameter 'next_state' not described in 'dm_ism_next_state'
../display/amdgpu_dm/amdgpu_dm_ism.c:153 function parameter 'ism' not described in 'dm_ism_get_idle_allow_delay'
../display/amdgpu_dm/amdgpu_dm_ism.c:153 function parameter 'stream' not described in 'dm_ism_get_idle_allow_delay'
../display/amdgpu_dm/amdgpu_dm_ism.c:216 function parameter 'ism' not described in 'dm_ism_insert_record'
Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Cc: Ray Wu <ray.wu@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
Rapid allow/disallow of idle optimization calls, whether it be IPS or
self-refresh features, can end up using more power if actual
time-in-idle is low. It can also spam DMUB command submission in a way
that prevents it from servicing other requestors.
[How]
Introduce the Idle State Manager (ISM) to amdgpu. It maintains a finite
state machine that uses a hysteresis to determine if a delay should be
inserted between a caller allowing idle, and when the actual idle
optimizations are programmed.
A second timer is also introduced to enable static screen optimizations
(SSO) such as PSR1 and Replay low HZ idle mode. Rapid SSO enable/disable
can have a negative power impact on some low hz video playback, and can
introduce user lag for PSR1 (due to up to 3 frames of sync latency).
This effectively rate-limits idle optimizations, based on hysteresis.
This also replaces the existing delay logic used for PSR1, allowing
drm_vblank_crtc_config.disable_immediate = true, and thus allowing
drm_crtc_vblank_restore().
v2:
* Loosen criteria for ISM to exit idle optimizations; it failed to exit
idle correctly on cursor updates when there are no drm_vblank
requestors,
* Document default_ism_config
* Convert pr_debug to trace events to reduce overhead on frequent
codepaths
* checkpatch.pl fixes
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4527
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3709
Fixes: 58a261bfc967 ("drm/amd/display: use a more lax vblank enable policy for older ASICs")
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|