author     Monk Liu <Monk.Liu@amd.com>               2017-11-08 14:35:04 +0800
committer  Alex Deucher <alexander.deucher@amd.com>  2017-12-04 16:41:46 -0500
commit     cfb83b1d9c38c29c3c89e8d242b8e7f0148d6c09 (patch)
tree       8556b9a53600baf9ada180d981f8e4407785a681
parent     drm/amdgpu:read VRAMLOST from gim (diff)
download   linux-dev-cfb83b1d9c38c29c3c89e8d242b8e7f0148d6c09.tar.xz
           linux-dev-cfb83b1d9c38c29c3c89e8d242b8e7f0148d6c09.zip
drm/amdgpu:fix gpu recover missing skipping(v2)
If an app closes its CTX right after IB submission, GPU recovery fails to find the entity behind the guilty job, so no job skipping happens for that guilty job. To fix this corner case, move the increment of job->karma out of the entity iteration.

v2: only do the karma increment if bad->s_priority != KERNEL, because we always consider KERNEL jobs correct and always want to recover an unfinished kernel job (sometimes a kernel job is interrupted by VF FLR or another GPU hang event).

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-rw-r--r--   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c   5
1 file changed, 3 insertions(+), 2 deletions(-)
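Before the diff itself, a minimal sketch (not the scheduler code) of the corner case this patch closes. Everything below is an illustrative assumption: the stand-in types, the NULL entity pointer standing in for "no entity with a matching fence context was found because the app already closed its CTX", and the omitted KERNEL-priority guard.

/*
 * Sketch only: contrasts the old and new placement of the karma
 * increment. Types/names are simplified assumptions, not driver code.
 */
#include <stdatomic.h>
#include <stdio.h>

struct sched_entity {
	atomic_int guilty;
};

struct sched_job {
	atomic_int karma;            /* bumped each time this job hangs the GPU */
	struct sched_entity *entity; /* NULL once the app has closed its CTX */
};

/* Before the patch: karma only moved inside the entity walk, so a job
 * whose context was already gone never crossed hang_limit and was
 * resubmitted on recovery instead of being skipped. */
static void mark_guilty_old(struct sched_job *bad, int hang_limit)
{
	if (bad->entity &&
	    atomic_fetch_add(&bad->karma, 1) + 1 > hang_limit)
		atomic_store(&bad->entity->guilty, 1);
}

/* After the patch: bump karma first, unconditionally; only the
 * propagation of the guilty flag still needs a live entity. */
static void mark_guilty_new(struct sched_job *bad, int hang_limit)
{
	atomic_fetch_add(&bad->karma, 1);
	if (bad->entity && atomic_load(&bad->karma) > hang_limit)
		atomic_store(&bad->entity->guilty, 1);
}

int main(void)
{
	struct sched_job bad = { .entity = NULL }; /* CTX already closed */

	mark_guilty_old(&bad, 0); /* karma stays 0: job never skipped */
	printf("old: karma = %d\n", atomic_load(&bad.karma));

	mark_guilty_new(&bad, 0); /* karma still rises: job gets skipped */
	printf("new: karma = %d\n", atomic_load(&bad.karma));
	return 0;
}

The design point mirrors the patch: the karma bump no longer depends on finding a live entity, so a guilty job whose context is gone still crosses the hang limit and can be skipped during recovery.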
diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 941b5920b97b..53ea7e12d219 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -463,7 +463,8 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
 	}
 	spin_unlock(&sched->job_list_lock);
 
-	if (bad) {
+	if (bad && bad->s_priority != AMD_SCHED_PRIORITY_KERNEL) {
+		atomic_inc(&bad->karma);
 		/* don't increase @bad's karma if it's from KERNEL RQ,
 		 * becuase sometimes GPU hang would cause kernel jobs (like VM updating jobs)
 		 * corrupt but keep in mind that kernel jobs always considered good.
@@ -474,7 +475,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched, struct amd_sched_jo
 		spin_lock(&rq->lock);
 		list_for_each_entry_safe(entity, tmp, &rq->entities, list) {
 			if (bad->s_fence->scheduled.context == entity->fence_context) {
-				if (atomic_inc_return(&bad->karma) > bad->sched->hang_limit)
+				if (atomic_read(&bad->karma) > bad->sched->hang_limit)
 					if (entity->guilty)
 						atomic_set(entity->guilty, 1);
 				break;
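For completeness, the reason an un-bumped karma means "no job skipping": the later recovery pass compares each mirrored job's karma against the hang limit and cancels jobs from the guilty context. A hedged sketch of that consumer side follows; it only approximates the era's amd_sched_job_recovery() walk, and recov_job / recover_jobs / fence_context are simplified stand-ins, not real driver symbols.

/*
 * Sketch only: the recovery-side consumer of job->karma.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct recov_job {
	atomic_int karma;
	uint64_t fence_context; /* context of the job's scheduled fence */
	bool cancelled;         /* stands in for marking the fence -ECANCELED */
};

void recover_jobs(struct recov_job *jobs, int n, int hang_limit)
{
	bool found_guilty = false;
	uint64_t guilty_context = 0;

	for (int i = 0; i < n; i++) {
		/* This is the comparison the patch feeds: without the
		 * up-front atomic_inc(&bad->karma), a job whose CTX was
		 * already closed never exceeded hang_limit here. */
		if (!found_guilty &&
		    atomic_load(&jobs[i].karma) > hang_limit) {
			found_guilty = true;
			guilty_context = jobs[i].fence_context;
		}
		/* skip (cancel) the guilty context's jobs on resubmit */
		if (found_guilty && jobs[i].fence_context == guilty_context)
			jobs[i].cancelled = true;
	}
}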