<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/drivers/gpu/drm/scheduler, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/drivers/gpu/drm/scheduler?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/drivers/gpu/drm/scheduler?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-10-25T11:14:36Z</updated>
<entry>
<title>drm/scheduler: fix fence ref counting</title>
<updated>2022-10-25T11:14:36Z</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2022-09-27T16:43:03Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=b3af84383e7abdc5e63435817bb73a268e7c3637'/>
<id>urn:sha1:b3af84383e7abdc5e63435817bb73a268e7c3637</id>
<content type='text'>
We leaked dependency fences when processes were being killed.

In addition, grab a reference to the last scheduled fence.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20220929180151.139751-1-christian.koenig@amd.com
</content>
</entry>
<entry>
<title>Merge drm/drm-fixes into drm-misc-fixes</title>
<updated>2022-10-20T07:09:00Z</updated>
<author>
<name>Thomas Zimmermann</name>
<email>tzimmermann@suse.de</email>
</author>
<published>2022-10-20T07:09:00Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=1aca5ce036e3499336d1a2ace3070f908381c055'/>
<id>urn:sha1:1aca5ce036e3499336d1a2ace3070f908381c055</id>
<content type='text'>
Backmerging to get v6.1-rc1.

Signed-off-by: Thomas Zimmermann &lt;tzimmermann@suse.de&gt;
</content>
</entry>
<entry>
<title>drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag</title>
<updated>2022-10-19T10:42:51Z</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2022-10-07T07:51:13Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=7b476affcccfc7e644541a0a719f53fc7bd34c53'/>
<id>urn:sha1:7b476affcccfc7e644541a0a719f53fc7bd34c53</id>
<content type='text'>
Setting this flag on a scheduler fence prevents pipelining of jobs
depending on this fence. In other words we always insert a full CPU
round trip before dependent jobs are pushed to the pipeline.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
</content>
</entry>
<entry>
<title>Revert "drm/sched: Use parent fence instead of finished"</title>
<updated>2022-10-07T02:58:39Z</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2022-10-07T02:40:50Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=bafaf67c42f4b547bf4fb329ac6dcb28b05de15e'/>
<id>urn:sha1:bafaf67c42f4b547bf4fb329ac6dcb28b05de15e</id>
<content type='text'>
This reverts commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86.

This is causing instability on Linus' desktop, and I'm seeing
oops with VK CTS runs.

netconsole got me the following oops:
[ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1234.778782] #PF: supervisor read access in kernel mode
[ 1234.778787] #PF: error_code(0x0000) - not-present page
[ 1234.778791] PGD 0 P4D 0
[ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
[ 1234.778809] Hardware name: System manufacturer System Product
Name/PRIME X370-PRO, BIOS 5603 07/28/2020
[ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
[ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f
ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53
48 89 fb &lt;48&gt; 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00
00 f0
[ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
[ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
[ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
[ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
[ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
[ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
[ 1234.778912] FS:  00007f35e7008580(0000) GS:ffff95428ebc0000(0000)
knlGS:0000000000000000
[ 1234.778916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
[ 1234.778924] Call Trace:
[ 1234.778981]  &lt;IRQ&gt;
[ 1234.778989]  dma_fence_signal_timestamp_locked+0x6a/0xe0
[ 1234.778999]  dma_fence_signal+0x2c/0x50
[ 1234.779005]  amdgpu_fence_process+0xc8/0x140 [amdgpu]
[ 1234.779234]  sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
[ 1234.779395]  amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
[ 1234.779609]  amdgpu_ih_process+0x80/0x100 [amdgpu]
[ 1234.779783]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
[ 1234.779940]  __handle_irq_event_percpu+0x46/0x190
[ 1234.779946]  handle_irq_event+0x34/0x70
[ 1234.779949]  handle_edge_irq+0x9f/0x240
[ 1234.779954]  __common_interrupt+0x66/0x100
[ 1234.779960]  common_interrupt+0xa0/0xc0
[ 1234.779965]  &lt;/IRQ&gt;
[ 1234.779968]  &lt;TASK&gt;
[ 1234.779971]  asm_common_interrupt+0x22/0x40
[ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
[ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41
54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30
48 8b 18 &lt;48&gt; 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48
83 ea
[ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202

Revert it for now and figure it out later.

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
</content>
</entry>
<entry>
<title>drm/sched: Use parent fence instead of finished</title>
<updated>2022-09-16T13:53:25Z</updated>
<author>
<name>Arvind Yadav</name>
<email>Arvind.Yadav@amd.com</email>
</author>
<published>2022-09-14T16:43:20Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86'/>
<id>urn:sha1:e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86</id>
<content type='text'>
Use the parent fence instead of the finished fence
to get the job status. This change avoids a GPU
scheduler timeout error, which can cause a GPU reset.

Signed-off-by: Arvind Yadav &lt;Arvind.Yadav@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20220914164321.2156-6-Arvind.Yadav@amd.com
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: quieten kernel-doc warnings</title>
<updated>2022-09-06T20:14:28Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2022-04-04T21:30:40Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=f8ad757e40c9c776a13eaa56d73e8e62381517b6'/>
<id>urn:sha1:f8ad757e40c9c776a13eaa56d73e8e62381517b6</id>
<content type='text'>
Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c.

Quashes these warnings:

include/drm/gpu_scheduler.h:332: warning: missing initial short description on line:
 * struct drm_sched_backend_ops
include/drm/gpu_scheduler.h:412: warning: missing initial short description on line:
 * struct drm_gpu_scheduler
include/drm/gpu_scheduler.h:461: warning: Function parameter or member 'dev' not described in 'drm_gpu_scheduler'

drivers/gpu/drm/scheduler/sched_main.c:201: warning: missing initial short description on line:
 * drm_sched_dependency_optimized
drivers/gpu/drm/scheduler/sched_main.c:995: warning: Function parameter or member 'dev' not described in 'drm_sched_init'

Fixes: 2d33948e4e00 ("drm/scheduler: add documentation")
Fixes: 8ab62eda177b ("drm/sched: Add device pointer to drm_gpu_scheduler")
Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Cc: Nayan Deshmukh &lt;nayan26deshmukh@gmail.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Jiawei Gu &lt;Jiawei.Gu@amd.com&gt;
Cc: dri-devel@lists.freedesktop.org
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20220404213040.12912-1-rdunlap@infradead.org
</content>
</entry>
<entry>
<title>drm/sched: move calling drm_sched_entity_select_rq</title>
<updated>2022-07-19T15:22:25Z</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2022-07-13T16:14:52Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=6d602e031103fb78dbe50dbf57a5f29737494c6f'/>
<id>urn:sha1:6d602e031103fb78dbe50dbf57a5f29737494c6f</id>
<content type='text'>
We already discussed that the call to drm_sched_entity_select_rq() needs
to move to drm_sched_job_arm() to be able to set a new scheduler list
between _init() and _arm(). This was just not applied for some reason.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20220714103902.7084-2-christian.koenig@amd.com
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2022-07-12T01:07:32Z</updated>
<author>
<name>Dave Airlie</name>
<email>airlied@redhat.com</email>
</author>
<published>2022-07-12T01:07:30Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=344feb7ccf764756937cfd74fa4ac5caba069c99'/>
<id>urn:sha1:344feb7ccf764756937cfd74fa4ac5caba069c99</id>
<content type='text'>
amd-drm-next-5.20-2022-07-05:

amdgpu:
- Various spelling and grammar fixes
- Various eDP fixes
- Various DMCUB fixes
- VCN fixes
- GMC 11 fixes
- RAS fixes
- TMZ support for GC 10.3.7
- GPUVM TLB flush fixes
- SMU 13.0.x updates
- DCN 3.2 Support
- DCN 3.2.1 Support
- MES updates
- GFX11 modifiers support
- USB-C fixes
- MMHUB 3.0.1 support
- SDMA 6.0 doorbell fixes
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Enable GPU reset for SMU 13.0.4
- OLED display fixes
- MPO fixes
- DC frame size fixes
- ASPM support for PCIE 7.4/7.6
- GPU reset support for SMU 13.0.0
- GFX11 updates
- VCN JPEG fix
- BACO support for SMU 13.0.7
- VCN instance handling fix
- GFX8 GPUVM TLB flush fix
- GPU reset rework
- VCN 4.0.2 support
- GTT size fixes
- DP link training fixes
- LSDMA 6.0.1 support
- Various backlight fixes
- Color encoding fixes
- Backlight config cleanup
- VCN 4.x unified queue cleanup

amdkfd:
- MMU notifier fixes
- Updates for GC 10.3.6 and 10.3.7
- P2P DMA support using dma-buf
- Add available memory IOCTL
- SDMA 6.0.1 fix
- MES fixes
- HMM profiler support

radeon:
- License fix
- Backlight config cleanup

UAPI:
- Add available memory IOCTL to amdkfd
  Proposed userspace: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html
- HMM profiler support for amdkfd
  Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080805.html

Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20220705212633.6037-1-alexander.deucher@amd.com
</content>
</entry>
<entry>
<title>drm/sched: Partial revert of 'drm/sched: Keep s_fence-&gt;parent pointer'</title>
<updated>2022-06-28T15:24:31Z</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2022-06-20T20:39:47Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=45ecaea738830b9d521c93520c8f201359dcbd95'/>
<id>urn:sha1:45ecaea738830b9d521c93520c8f201359dcbd95</id>
<content type='text'>
Problem:
This patch caused a negative refcount, as described in [1]. In that case
the parent fence had not signaled by the time of drm_sched_stop and was
hence kept in the pending list. The assumption was that such fences would
not signal, so the fence was put to account for the s_fence-&gt;parent
refcount. But for amdgpu, which has an embedded HW fence (always the same
parent fence), drm_sched_fence_release_scheduled was always called and
would still drop the count for the parent fence once more. For jobs that
never signaled, this imbalance was masked by a refcount bug in
amdgpu_fence_driver_clear_job_fences, which did not drop the refcount on
the fences that were removed from the fence driver's fences array
(to balance the get taken on the previous insertion into the array in
amdgpu_fence_emit).

Fix:
Revert this patch and, by setting s_job-&gt;s_fence-&gt;parent to NULL
as before, prevent the extra refcount drop in amdgpu when
drm_sched_fence_release_scheduled is called on job release.

Also - align behaviour in drm_sched_resubmit_jobs_ext with that of
drm_sched_main when submitting jobs - take a refcount for the
new parent fence pointer and drop refcount for original kref_init
for new HW fence creation (or fake new HW fence in amdgpu - see next patch).

[1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4db0@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Tested-by: Yiqing Yao &lt;yiqing.yao@amd.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
</entry>
<entry>
<title>drm/scheduler: Don't kill jobs in interrupt context</title>
<updated>2022-05-17T14:06:41Z</updated>
<author>
<name>Dmitry Osipenko</name>
<email>dmitry.osipenko@collabora.com</email>
</author>
<published>2022-04-11T22:15:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=7d64c40a7d96190d9d06e240305389e025295916'/>
<id>urn:sha1:7d64c40a7d96190d9d06e240305389e025295916</id>
<content type='text'>
Interrupt context can't sleep. Drivers like Panfrost and MSM take a
mutex when a job is released, and thus that code can sleep. This results
in "BUG: scheduling while atomic" if the locks are contended while a job
is freed. There is no good reason for releasing the scheduler's jobs in
IRQ context, hence use normal context to fix the trouble.

Cc: stable@vger.kernel.org
Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes")
Signed-off-by: Dmitry Osipenko &lt;dmitry.osipenko@collabora.com&gt;
Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
</content>
</entry>
</feed>
