aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-03-26drm/amdgpu/gfx11: fix num_mecAlex Deucher1-1/+1
GC11 only has 1 mec. Fixes: 3d879e81f0f9 ("drm/amdgpu: add init support for GFX11 (v2)") Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amd/pm: Add gpu_metrics_v1_8Asad Kamal2-0/+117
Add new gpu_metrics_v1_8 to acquire below host limit counters Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amdgpu: Prefer shadow rom when availableLijo Lazar1-7/+27
Fetch VBIOS from shadow ROM when available before trying other methods like EFI method. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Fixes: 9c081c11c621 ("drm/amdgpu: Reorder to read EFI exported ROM first") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4066 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/pm: Update smu metrics table for smu_v13_0_6Asad Kamal1-2/+5
Update smu metrics table to vesrion 0x10 for smu_v13_0_6 v2: Host metrics support removal moved to separate patch (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amd/pm: Remove host limit metrics supportAsad Kamal1-15/+0
Firmware algorithm changed and the values in this version are not accurate thereby remove host limit metric support for smu_v13_0_6, smu_v13_0_12 & smu_v13_0_14 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26Remove unnecessary firmware version check for gc v9_4_2Candice Li1-0/+1
GC v9_4_2 uses a new versioning scheme for CP firmware, making the warning ("CP firmware version too old, please update!") irrelevant. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amdgpu: stop unmapping MQD for kernel queues v3Christian König7-432/+67
This looks unnecessary and actually extremely harmful since using kmap() is not possible while inside the ring reset. Remove all the extra mapping and unmapping of the MQDs. v2: also fix debugfs v3: fix coding style typo Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"Jesse.zhang@amd.com2-133/+14
this temporarily reverts commit 6ec04e38b2f6 ("drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA") it cause a regression. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amdgpu: Parse all deferred errors with UMC aca handleXiang Liu9-19/+14
We should only increase the deferred errors in UMC block. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amdgpu: Update ta ras blockStanley.Yang3-0/+11
Update ta ra block to keep sync with RAS TA. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amdgpu: Add NPS2 to DPX compatible modeLijo Lazar1-1/+2
Compute partition DPX is possible in NPS2 mode. Update the compatible modes for DPX. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amdgpu: Use correct gfx deferred error countXiang Liu1-3/+4
In the case of parsing GFX deferred error from SMU corrected error channel, the error count should be set to 1 instead of parsing from MISC0 register, which is 0. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amd/display: Actually do immediate vblank disableLeo Li1-0/+2
[Why] The `vblank_config.offdelay` field follows the same semantics as the `drm_vblank_offdelay` parameter. Setting it to 0 will never disable vblank. [How] Set `offdelay` to a positive number. Fixes: e45b6716de4b ("drm/amd/display: use a more lax vblank enable policy for DCN35+") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/display: prevent hang on link training failBrendan Tam2-3/+10
[Why] When link training fails, the phy clock will be disabled. However, in enable_streams, it is assumed that link training succeeded and the mux selects the phy clock, causing a hang when a register write is made. [How] When enable_stream is hit, check if link training failed. If it did, fall back to the ref clock to avoid a hang and keep the system in a recoverable state. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Brendan Tam <Brendan.Tam@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26Revert "drm/amd/display: dml2 soc dscclk use DPM table clk setting"Charlene Liu1-1/+1
[why] this dscclk use DCN defined per DPM level will cause a DCFCLK increase. needs to follow up. This reverts commit 15b959534a39530a21d378190557cc8d1eab7b09 Reviewed-by: Yihan Zhu <yihan.zhu@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amd/display: Increase vblank offdelay for PSR panelsLeo Li1-8/+33
[Why] Depending on when the HW latching event (vupdate) of double-buffered registers happen relative to the PSR SDP (signals panel psr enter/exit) deadline, and how bad the Panel clock has drifted since the last ALPM off event, there can be up to 3 frames of delay between sending the PSR exit cmd to DMUB fw, and when the panel starts displaying live frames. This can manifest as micro-stuttering when userspace commit patterns cause rapid toggling of the DRM vblank counter, since PSR enter/exit is hooked up to DRM vblank disable/enable respectively. In the ideal world, the panel should present the live frame immediately on PSR exit cmd. But due to HW design and PSR limitations, immediate exit can only happen by chance, when: 1. PSR exit cmd is ack'd by FW before HW latching (vupdate) event, and 2. Panel's SDP deadline -- determined by it's PSR Start Delay in DPCD 71h -- is after the vupdate event. The PSR exit SDP can then be sent immediately after HW latches. Otherwise, we have to wait 1 frame. And 3. There is negligible drift between the panel's clock and source clock. Otherwise, there can be up to 1 frame of drift. Note that this delay is not expected with Panel Replay. [How] Since PSR power savings can be quite substantial, and there are a lot of systems in the wild with PSR panels, It'll be nice to have a middle ground that balances user experience with power savings. A simple way to achieve this is by extending the vblank offdelay, such that additional PSR exit delays will be less perceivable. We can set: 20/100 * offdelay_ms = 3_frames_ms => offdelay_ms = 5 * 3_frames_ms This ensures that `3_frames_ms` will only be experienced as a 20% delay on top how long the panel has been static, and thus make the delay less perceivable. If this ends up being too high of a percentage, it can be dropped further in a future change. Fixes: 537ef0f88897 ("drm/amd/display: use new vblank enable policy for DCN35+") Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd: Handle being compiled without SI or CIK support betterMario Limonciello1-20/+24
If compiled without SI or CIK support but amdgpu tries to load it will run into failures with uninitialized callbacks. Show a nicer message in this case and fail probe instead. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4050 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/pm: Add zero RPM enabled OD setting support for SMU14.0.2Tomasz Pakuła1-1/+54
Hook up zero RPM enable for 9070 and 9070 XT based on RDNA3 (smu 13.0.0 and 13.0.7) code. Tested on 9070 XT Hellhound Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-26drm/amd/pm: Prevent division by zeroDenis Arefev1-2/+2
The user can set any speed value. If speed is greater than UINT_MAX/8, division by zero is possible. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: c52dcf49195d ("drm/amd/pp: Avoid divide-by-zero in fan_ctrl_set_fan_speed_rpm") Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/pm: Prevent division by zeroDenis Arefev1-2/+2
The user can set any speed value. If speed is greater than UINT_MAX/8, division by zero is possible. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: c52dcf49195d ("drm/amd/pp: Avoid divide-by-zero in fan_ctrl_set_fan_speed_rpm") Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/pm: Prevent division by zeroDenis Arefev1-1/+1
The user can set any speed value. If speed is greater than UINT_MAX/8, division by zero is possible. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 031db09017da ("drm/amd/powerplay/vega20: enable fan RPM and pwm settings V2") Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/pm: Prevent division by zeroDenis Arefev1-0/+3
The user can set any speed value. If speed is greater than UINT_MAX/8, division by zero is possible. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b64625a303de ("drm/amd/pm: correct the address of Arcturus fan related registers") Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-26drm/amd/pm: Prevent division by zeroDenis Arefev1-1/+1
The user can set any speed value. If speed is greater than UINT_MAX/8, division by zero is possible. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: c05d1c401572 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)") Signed-off-by: Denis Arefev <arefev@swemel.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2025-03-25drm/xe: Fix unmet direct dependencies warningYue Haibing1-1/+1
WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n] Selected by [m]: - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=m] && DRM_XE [=m] && DRM_XE [=m]=m [=m] && HAS_IOPORT [=y] DRM_XE_DISPLAY requires FB_IOMEM_HELPERS, but the dependency FB_CORE is missing, selecting FB_IOMEM_HELPERS if DRM_FBDEV_EMULATION is set as other drm drivers. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250323114103.1960511-1-yuehaibing@huawei.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 689582882802cd64986c1eb584c9f5184d67f0cf) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-25drm/xe: Set survivability mode before heci initLucas De Marchi1-2/+10
Commit d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci") tried to follow the logic: initialize everything needed and if everything succeeds, set the flag that it's enabled. While it fixed some corner cases of those calls failing, it was wrong for setting the flag after the call to xe_heci_gsc_init(): that function does a different initialization for survivability mode. Fix that and add comments about this being done on purpose. Suggested-by: Riana Tauro <riana.tauro@intel.com> Fixes: d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-2-fdb3559ea965@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 14efa739ca70514e8b923a02b5bcb42511dd1ee8) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-25drm/xe: Move survivability back to xeLucas De Marchi4-19/+34
Commit d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci") moved the survivability handling to be done entirely in the xe_pci layer. However there are some issues with that approach: 1) Survivability mode needs at least the mmio initialized, otherwise it can't really read a register to decide if it should enter that state 2) SR-IOV mode should be initialized, otherwise it's not possible to check if it's VF Besides, as pointed by Riana the check for xe_survivability_mode_enable() was wrong in xe_pci_probe() since it's not a bool return. Fix that by moving the initialization to be entirely in the xe_device layer, with the correct dependencies handled: only after mmio and sriov initialization, and not triggering it on error from wait_for_lmem_ready(). This restores the trigger behavior before that commit. The xe_pci layer now only checks for "is it enabled?", like it's doing in xe_pci_suspend()/xe_pci_remove(), etc. Cc: Riana Tauro <riana.tauro@intel.com> Fixes: d40f275d96e8 ("drm/xe: Move survivability entirely to xe_pci") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-1-fdb3559ea965@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 86b5e0dbba07438de91dd81095464c6c4aa7a372) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-25drm/xe: Apply Wa_16023105232Vinay Belgaumkar6-12/+74
The WA requires KMD to disable DOP clock gating during a semaphore wait and also ensure that idle delay for every CS is lower than the idle wait time in the PWRCTX_MAXCNT register. Default values for these registers already comply with this restriction. v2: Store timestamp_base in gt info and other comments (Daniele) v3: Skip WA check for VF v4: Review comments (Matt Roper) v5: Cleanup the clock functions and use reg_field_get (Matt Roper) v6: Fix checkpatch issue v7: Fix CI issue Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250320175123.3026754-1-vinay.belgaumkar@intel.com (cherry picked from commit 7c53ff050ba88bb37eed3e17f2bb8ec592d83302) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-25drm/xe/vf: Don't check CTC_MODE[0] if VFMichal Wajdeczko1-4/+11
Starting from commit 18778b5fdd01 ("drm/xe: Eliminate usage of TIMESTAMP_OVERRIDE") we access the CTC_MODE register only to warn if it has undocumented value. There is no point in doing that on the VF driver. While here, move this check to a helper function. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250311114042.1954-2-michal.wajdeczko@intel.com (cherry picked from commit fce3fb7b914bcd19341de8d8eff8bef371c2cddf) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-21drm/amd/pm: Update feature list for smu_v13_0_6Asad Kamal1-1/+5
Update feature list for smu_v13_0_6 to show vcn & smu deep sleep feature enable status Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: Add parameter documentation for amdgpu_sync_fenceSrinivasan Shanmugam1-0/+1
The 'flags' parameter, which specifies memory allocation behavior while creating a sync entry, Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c:162: warning: Function parameter or struct member 'flags' not described in 'amdgpu_sync_fence' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/discovery: optionally use fw based ip discoveryAlex Deucher1-8/+32
On chips without native IP discovery support, use the fw binary if available, otherwise we can continue without it. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asicsFlora Cui1-1/+27
vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/discovery: check ip_discovery fw file availableFlora Cui1-15/+16
Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amd/pm: Remove unnecessay UQ10 to UINT conversionAsad Kamal1-2/+2
Few of the metrics data for smu_v13_0_12 has not been reported in Q10 format, remove UQ10 to UINT conversion for those Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amd/pm: Remove unnecessay UQ10 to UINT conversionAsad Kamal1-2/+2
Few of the metrics data for smu_v13_0_6 has not been reported in Q10 format, remove UQ10 to UINT conversion for those v2: Move smu_v13_0_12 changes to separate patch(Kevin) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMAJesse.zhang@amd.com2-14/+133
This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that all relevant page table cache levels (L1 PTEs, L2 PTEs, and L2 PDEs) are invalidated. - Modified the `sdma_v4_4_2_ring_emit_vm_flush` function to use the new `sdma_v4_4_2_get_invalidate_req` function. The updated function emits the necessary register writes and waits to perform a VM flush for the specified VMID. It updates the PTB address registers and issues a VM invalidation request using the specified VM invalidation engine. - Included the necessary header file `gc/gc_9_0_sh_mask.h` to provide access to the required register definitions. v2: vm flush by the vm inalidation packet (Lijo) v3: code stle and define thh macro for the vm invalidation packet (Christian) v4: Format definition sdma vm invalidate packet (Lijo) Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flushJesse.zhang@amd.com3-1/+57
- Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and avoids the issue of insufficient VM invalidation engines. - Add synchronization for GPU TLB flush operations in gmc_v9_0.c. Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions during TLB flush operations. This improves the stability and reliability of the driver, especially in multi-threaded environments. v2: replace the sdma ring check with a function `amdgpu_sdma_is_page_queue` to check if a ring is an SDMA page queue.(Lijo) v3: Add GC version check, only enabled on GC9.4.3/9.4.4/9.5.0 v4: Fix code style and add more detailed description (Christian) v5: Remove dependency on vm_inv_eng loop order, explicitly lookup shared inv_eng(Christian/Lijo) v6: Added search shared ring function amdgpu_sdma_get_shared_ring (Lijo) Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amd/amdgpu: Increase max rings to enable SDMA page ringJesse.zhang@amd.com1-1/+1
Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: Decode deferred error type in gfx aca bank parserXiang Liu3-10/+36
In the case of injecting uncorrected error with background workload, the deferred error among uncorrected errors need to be specified by checking the deferred and poison bits of status register. v2: refine checking for deferred error v2: log possiable DEs among CEs v2: generate CPER records for DEs among UEs Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUsSrinivasan Shanmugam1-0/+14
Enable the cleaner shader for GFX11.5.0/11.5.1 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX11.5.0/11.5.1 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Mario Sopena-Novales <mario.novales@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/mes: clean up SDMA HQD loopAlex Deucher1-5/+3
Follow the same logic as the other IP types. Reviewed-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/mes: enable compute pipes across all MECAlex Deucher1-2/+1
Enable pipes on both MECs for MES. Fixes: 745f46b6a99f ("drm/amdgpu: enable mes v12 self test") Acked-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/mes: drop MES 10.x leftoversAlex Deucher1-4/+1
Leftover from MES bring up. There is no production MES support for MES 10.x. The rest of the MES 10.x code has already been removed so drop this. Acked-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/mes: optimize compute loop handlingAlex Deucher1-1/+1
Break when we get to the end of the supported pipes rather than continuing the loop. Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/sdma: guilty tracking is per instanceAlex Deucher2-16/+18
The gfx and page queues are per instance, so track them per instance. v2: drop extra parameter (Lijo) Fixes: fdbfaaaae06b ("drm/amdgpu: Improve SDMA reset logic with guilty queue tracking") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu/sdma: fix engine reset handlingAlex Deucher4-15/+13
Move the kfd suspend/resume code into the caller. That is where the KFD is likely to detect a reset so on the KFD side there is no need to call them. Also add a mutex to lock the actual reset sequence. v2: make the locking per instance Fixes: bac38ca8c475 ("drm/amdkfd: implement per queue sdma reset for gfx 9.4+") Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: remove invalid usage of sched.readyChristian König1-11/+0
I can't count how often I had to remove this nonsense. Probably doesn't need an explanation any more. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: add cleaner shader trace pointChristian König2-0/+16
Note when the cleaner shader is executed. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: add isolation trace pointChristian König2-0/+18
Note when we switch from one isolation owner to another. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21drm/amdgpu: stop reserving VMIDs to enforce isolationChristian König4-19/+6
That was quite troublesome for gang submit. Completely drop this approach and enforce the isolation separately. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>