aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-09-16drm/amdgpu: Add a kernel parameter for specifying the asic typeYong Zhao1-1/+6
As more and more new asics start to reuse the old device IDs before launch, there is a need to quickly override the existing asic type corresponding to the reused device ID through a kernel parameter. With this, engineers no longer need to rely on local hack patches, facilitating cooperation across teams. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13drm/amdgpu: move the call of ras recovery_init and bad page reserve to proper placeTao Zhou1-5/+0
ras recovery_init should be called after ttm init, bad page reserve should be put in front of gpu reset since i2c may be unstable during gpu reset. add cleanup for recovery_init and recovery_fini v2: add more comment and print. remove cancel_work_sync in recovery_init. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13drm/amdkfd: Query kfd device info by CHIP id instead of pci device idYong Zhao1-1/+1
This optimizes out the pci device id usage in KFD and makes the code more maintainable. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13dmr/amdgpu: Add system auto reboot to RAS.Andrey Grodzovsky1-0/+14
In case of RAS error allow user configure auto system reboot through ras_ctrl. This is also part of the temproray work around for the RAS hang problem. v4: Use latest kernel API for disk sync. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13drm/amdgpu: Avoid HW GPU reset for RAS.Andrey Grodzovsky1-10/+28
Problem: Under certain conditions, when some IP bocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case and we use this interrupt to issue GPU reset. THe GPU reset code is modified for such case to avoid HW reset, only stops schedulers, deatches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive memebers. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-09-13drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case.Andrey Grodzovsky1-13/+10
Issue 1: In XGMI case amdgpu_device_lock_adev for other devices in hive was called to late, after access to their repsective schedulers. So relocate the lock to the begining of accessing the other devs. Issue 2: Using amdgpu_device_ip_need_full_reset to switch the device list from all devices in hive to the single 'master' device who owns this reset call is wrong because when stopping schedulers we iterate all the devices in hive but when restarting we will only reactivate the 'master' device. Also, in case amdgpu_device_pre_asic_reset conlcudes that full reset IS needed we then have to stop schedulers for all devices in hive and not only the 'master' but with amdgpu_device_ip_need_full_reset we already missed the opprotunity do to so. So just remove this logic and always stop and start all schedulers for all devices in hive. Also minor cleanup and print fix. v4: Minor coding style fix. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-30drm/amdgpu: Handle job is NULL use case in amdgpu_device_gpu_recoverAndrey Grodzovsky1-6/+4
This should be checked at all places job is accessed. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-29drm/amdgpu: Enable DC on RenoirRoman Li1-0/+3
Enable DC support for renoir. Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-29drm/amdgpu: introduce vram lost for reset (v2)Monk Liu1-2/+2
for SOC15/vega10 the BACO reset & mode1 would introduce vram lost in high end address range, current kmd's vram lost checking cannot catch it since it only check very ahead visible frame buffer v2: cover NV as well Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15drm/amdgpu: Use new mode2 reset interface for RV.Andrey Grodzovsky1-0/+1
Integrate the mode2 reset into rest sequence. v2: Check ppfuncs pointer for NULL Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12drm/amdgpu: add renoir support for gpu_info and ip block settingHuang Rui1-1/+7
This patch adds renoir support for gpu_info firmware and ip block setting. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12drm/amdgpu: add renoir asic_type enumHuang Rui1-0/+1
This patch adds renoir to amd_asic_type enum and amdgpu_asic_name[]. Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-09drm/amdgpu: remove RREG64/WREG64Tao Zhou1-33/+0
atomic 64 bits REG operations are useless currently Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-09drm/amdgpu: replace readq/writeq with atomic64 operationsTao Zhou1-6/+2
what we really want is a read or write that is guaranteed to be 64 bits at a time, atomic64 operations are supported on all architectures Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-06drm/amdgpu: Fix GPU reset crash regression.Andrey Grodzovsky1-0/+2
amdgpu_ip_block.status.hw for GMC wasn't set to false on suspend during GPU reset and so on resume gmc_v9_0_resume wasn't called. Caused by 'drm/amdgpu: fix double ucode load by PSP(v3)' Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-06drm/amdgpu: Fix panic during gpu resetxinhui pan1-1/+1
Clear the flag after hw suspend, otherwise it skips the corresponding hw resume. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: Add nv12 DC ip blockLeo Li1-0/+1
Load DC and amdgpu display manager Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: set asic family and ip blocks for navi12Xiaojie Yuan1-0/+1
same with navi10 Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: add gpu_info firmware for navi12Xiaojie Yuan1-0/+4
gpu_info firmare store asic configuration details. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: add navi12 asic typeXiaojie Yuan1-0/+1
Add asic type. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: fix double ucode load by PSP(v3)Monk Liu1-21/+38
previously the ucode loading of PSP was repreated, one executed in phase_1 init/re-init/resume and the other in fw_loading routine Avoid this double loading by clearing ip_blocks.status.hw in suspend or reset prior to the FW loading and any block's hw_init/resume v2: still do the smu fw loading since it is needed by bare-metal v3: drop the change in reinit_early_sriov, just clear all block's status.hw in the head place and set the status.hw after hw_init done is enough Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: cleanup vega10 SRIOV code pathMonk Liu1-3/+0
we can simplify all those unnecessary function under SRIOV for vega10 since: 1) PSP L1 policy is by force enabled in SRIOV 2) original logic always set all flags which make itself a dummy step besides, 1) the ih_doorbell_range set should also be skipped for VEGA10 SRIOV. 2) the gfx_common registers should also be skipped for VEGA10 SRIOV. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add RREG64/WREG64(_PCIE) operationsTao Zhou1-0/+73
add 64 bits register access functions v2: implement 64 bit functions in low level Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-30drm/amdgpu: put the SMC into the proper state on reset/unloadAlex Deucher1-0/+27
When doing a GPU reset or unloading the driver, we need to put the SMU into the apprpriate state for the re-init after the reset or unload to reliably work. I don't think this is necessary for BACO because the SMU actually controls the BACO state to it needs to be active. For suspend (S3), the asic is put into D3 so the SMU would be powered down so I don't think we need to put the SMU into any special state. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: add Arcturus gpu info firmwareLe Ma1-0/+4
Add GPU info firmware for Arcturus. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: add to set Arcturus ip blocksLe Ma1-0/+1
Add IP blocks for Arcturus. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: add Arcturus asic typeLe Ma1-0/+1
Add asic type for Arcturus. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amd/display: add dm blockBhawanpreet Lakha1-0/+1
enable DC for navi14. Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: set asic family and ip blocks for navi14Xiaojie Yuan1-0/+1
same with navi10 Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: add gpu_info firmware for navi14Xiaojie Yuan1-0/+4
Add navi14 to case statement to load the GPU info firmware. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: add navi14 asic typeXiaojie Yuan1-0/+1
Add CHIP_NAVI14 to the list of asic types. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-09drm/amdgpu/psp: add a mutex to protect access to the psp ringAlex Deucher1-0/+1
We need to serialize access to the psp ring if there are multiple callers at runtime. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-09drm/amdgpu: properly guard the generic discovery codeAlex Deucher1-2/+2
It's only available on navi and newer. Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-08amdgpu: make pmu support optionalArnd Bergmann1-2/+4
When CONFIG_PERF_EVENTS is disabled, we cannot compile the pmu portion of the amdgpu driver: drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:48:38: error: no member named 'hw' in 'struct perf_event' struct hw_perf_event *hwc = &event->hw; ~~~~~ ^ drivers/gpu/drm/amd/amdgpu/amdgpu_pmu.c:51:13: error: no member named 'attr' in 'struct perf_event' if (event->attr.type != event->pmu->type) ~~~~~ ^ ... Use conditional compilation for this file. Fixes: 9c7c85f7ea1f ("drm/amdgpu: add pmu counters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-05drm/amdgpu: Disable ras features on all IPs before gpu resetxinhui pan1-0/+4
Perform a ras_suspend to disable ras on all IPs to workaround some ROCm stability issue. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-01drm/amdgpu: enable PCIE atomics ops supportJack Xiao1-0/+11
GPU atomics operation depends on PCIE atomics support. Always enable PCIE atomics ops support in case that it hasn't been enabled. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-01drm/amdgpu: fix MGPU fan boost enablement for XGMI resetEvan Quan1-0/+13
MGPU fan boost feature should not be enabled until all the devices from the same hive are all back from reset. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-25Merge branch 'drm-next' into drm-next-5.3Alex Deucher1-1/+2
Backmerge drm-next and fix up conflicts due to drmP.h removal. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-22drm/amdgpu: Enable DC support for Navi10Harry Wentland1-0/+3
Enable the IP for navi10. Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amd/display: Read soc_bounding_box from gpu_info (v2)Harry Wentland1-1/+9
[WHY] We don't want to expose sensitive ASIC information before ASIC release. [HOW] Encode the soc_bounding_box in the gpu_info FW (for Linux) and read it at driver load. v2: fix warning when CONFIG_DRM_AMD_DC_DCN2_0 is not set (Alex) Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu: add to set navi ip blocksHuang Rui1-0/+8
Set the IPs for navi10 in early_init like other asics. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu: update golden setting programming logicHawking Zhang1-1/+4
Since from soc15, make sure only AndMasked bit get changed when applied or_mask Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu/mes: enable mes on navi10 and later asicJack Xiao1-0/+3
When amdgpu_mes is enabled and asic family is navi10 and later asic, enable mes per device. Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu/discovery: add module param for ip discovery enablementXiaojie Yuan1-0/+10
to control enablement. Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu: add mcbp unit test in debugfs (v3)Jack Xiao1-0/+1
The MCBP unit test is used to test the functionality of MCBP. It emualtes to send preemption request and resubmit the unfinished jobs. v2: squash in fixes (Alex) v3: squash in memory leak fix (Jack) Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu: enable the static csa when mcbp enabledJack Xiao1-1/+1
CSA is the Context Save Area for preemption. Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-21drm/amdgpu: add mcbp driver parameterJack Xiao1-0/+3
Add mcbp driver parameter, so that mcbp feature can be enabled/disabled by driver parameter. Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20drm/amdgpu: parse the new members added by gpu_info ucode v1_1Hawking Zhang1-0/+9
Parse the new parameters for gfx10. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20drm/amdgpu: add navi10 gpu info firmwareHuang Rui1-0/+4
gpu info firmware stores configuration data for various IP blocks. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20drm/amdgpu: add navi10 asic typeHuang Rui1-0/+1
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>