aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/i915_scheduler.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-07-28drm/i915: move scheduler slabs to direct module init/exitDaniel Vetter1-24/+15
With the global kmem_cache shrink infrastructure gone there's nothing special and we can convert them over. I'm doing this split up into each patch because there's quite a bit of noise with removing the static global.slab_dependencies|priorities to just a slab_dependencies|priorities. v2: Make slab static (Jason, 0day) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-8-daniel.vetter@ffwll.ch
2021-07-27drm/i915/guc: Implement GuC priority managementMatthew Brost1-0/+7
Implement a simple static mapping algorithm of the i915 priority levels (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as follows: i915 level < 0 -> GuC low level (3) i915 level == 0 -> GuC normal level (2) i915 level < INT_MAX -> GuC high level (1) i915 level == INT_MAX -> GuC highest level (0) We believe this mapping should cover the UMD use cases (3 distinct user levels + 1 kernel level). In addition to static mapping, a simple counter system is attached to each context tracking the number of requests inflight on the context at each level. This is needed as the GuC levels are per context while in the i915 levels are per request. v2: (Daniele) - Add BUILD_BUG_ON to enforce ordering of priority levels - Add missing lockdep to guc_prio_fini - Check for return before setting context registered flag - Map DISPLAY priority or higher to highest guc prio - Update comment for guc_prio Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-33-matthew.brost@intel.com
2021-07-27drm/i915: Reset GPU immediately if submission is disabledMatthew Brost1-0/+6
If submission is disabled by the backend for any reason, reset the GPU immediately in the heartbeat code as the backend can't be reenabled until the GPU is reset. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-10-matthew.brost@intel.com
2021-07-27drm/i915: Add i915_sched_engine destroy vfuncMatthew Brost1-1/+2
This is required to allow backend specific cleanup v2: (John H) - Rework commit message Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-7-matthew.brost@intel.com
2021-07-22drm/i915: Ditch i915 globals shrink infrastructureDaniel Vetter1-7/+0
This essentially reverts commit 84a1074920523430f9dc30ff907f4801b4820072 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jan 24 11:36:08 2018 +0000 drm/i915: Shrink the GEM kmem_caches upon idling mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it then we need to fix that there, not hand-roll our own slab shrinking code in i915. Also when this was added there was only one other caller of kmem_cache_shrink (added 2005 to the acpi code). Now there's a 2nd one outside of i915 code in a kunit test, which seems legit since that wants to very carefully control what's in the kmem_cache. This out of a total of over 500 calls to kmem_cache_create. This alone should have been warning sign enough that we're doing something silly. Noticed while reviewing a patch set from Jason to fix up some issues in our i915_init() and i915_exit() module load/cleanup code. Now that i915_globals.c isn't any different than normal init/exit functions, we should convert them over to one unified table and remove i915_globals.[hc] entirely. v2: Improve commit message (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: David Airlie <airlied@linux.ie> Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210721183229.4136488-1-daniel.vetter@ffwll.ch
2021-06-18drm/i915: Move submission tasklet to i915_sched_engineMatthew Brost1-0/+1
The submission tasklet operates on i915_sched_engine, thus it is the correct place for it. v3: (Jason Ekstrand) Change sched_engine->engine to a void* private data pointer Add kernel doc v4: (Daniele) Update private_data comment Set queue_priority_hint in kick_execlists v5: (CI) Rebase and fix build error Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-9-matthew.brost@intel.com
2021-06-18drm/i915: Update i915_scheduler to operate on i915_sched_engineMatthew Brost1-23/+23
Rather passing around an intel_engine_cs in the scheduling code, pass around a i915_sched_engine. v3: (Jason Ekstrand) Add READ_ONCE around rq->engine in lock_sched_engine Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-8-matthew.brost@intel.com
2021-06-18drm/i915: Add kick_backend function to i915_sched_engineMatthew Brost1-60/+2
Not all back-ends require a kick after a scheduling update, so make the kick a call-back function that the back-end can opt-in to. Also move the current kick function from the scheduler to the execlists file as it is specific to that back-end. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-7-matthew.brost@intel.com
2021-06-18drm/i915: Move active tracking to i915_sched_engineMatthew Brost1-7/+23
Move active request tracking and its lock to i915_sched_engine. This lock is also the submission lock so having it in the i915_sched_engine is the correct place. v3: (Jason Ekstrand) Add kernel doc v6: Rebase Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.comk> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-5-matthew.brost@intel.com
2021-06-18drm/i915: Move priolist to new i915_sched_engine objectMatthew Brost1-13/+40
Introduce i915_sched_engine object which is lower level data structure that i915_scheduler / generic code can operate on without touching execlist specific structures. This allows additional submission backends to be added without breaking the layering. Currently the execlists backend uses 1 of these object per each engine (physical or virtual) but future backends like the GuC will point to less instances utilizing the reference counting. This is a bit of detour to integrating the i915 with the DRM scheduler but this object will still exist when the DRM scheduler lands in the i915. It will however look a bit different. It will encapsulate the drm_gpu_scheduler object plus and common variables (to the backends) related to scheduling. Regardless this is a step in the right direction. This patch starts the aforementioned transition by moving the priolist into the i915_sched_engine object. v3: (Jason Ekstrand) Update comment next to intel_engine_cs.virtual Add kernel doc (Checkpatch) Fix double the in commit message v4: (Daniele) Update comment message. Add comment about subclass field Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-2-matthew.brost@intel.com
2021-03-24drm/i915: Remove I915_USER_PRIORITY_SHIFTChris Wilson1-1/+0
As we do not have any internal priority levels, the priority can be set directed from the user values. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210120121439.17600-2-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-03-24drm/i915: Strip out internal prioritiesChris Wilson1-45/+6
Since we are not using any internal priority levels, and in the next few patches will introduce a new index for which the optimisation is not so lear cut, discard the small table within the priolist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210120121439.17600-1-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2021-01-15drm/i915: Mark up protected uses of 'i915_request_completed'Chris Wilson1-1/+1
When we know that we are inside the timeline mutex, or inside the submission flow (under active.lock or the holder's rcu lock), we know that the rq->hwsp is stable and we can use the simpler direct version. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210114135612.13210-1-chris@chris-wilson.co.uk
2020-12-24drm/i915/gt: Replace direct submit with direct call to taskletChris Wilson1-4/+0
Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"). v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201224135544.1713-1-chris@chris-wilson.co.uk
2020-11-19drm/i915: Show timeline dependencies for debugTvrtko Ursulin1-0/+28
Include the signalers each request in the timeline is waiting on, as a means to try and identify the cause of a stall. This can be quite verbose, even as for now we only show each request in the timeline and its immediate antecedents. This generates output like: Timeline 886: { count 1, ready: 0, inflight: 0, seqno: { current: 664, last: 666 }, engine: rcs0 } U 886:29a- prio=0 @ 134ms: gem_exec_parall<4621> U bc1:27a- prio=0 @ 134ms: gem_exec_parall[4917] Timeline 825: { count 1, ready: 0, inflight: 0, seqno: { current: 802, last: 804 }, engine: vcs0 } U 825:324 prio=0 @ 107ms: gem_exec_parall<4518> U b75:140- prio=0 @ 110ms: gem_exec_parall<5486> Timeline b46: { count 1, ready: 0, inflight: 0, seqno: { current: 782, last: 784 }, engine: vcs0 } U b46:310- prio=0 @ 70ms: gem_exec_parall<5428> U c11:170- prio=0 @ 70ms: gem_exec_parall[5501] Timeline 96b: { count 1, ready: 0, inflight: 0, seqno: { current: 632, last: 634 }, engine: vcs0 } U 96b:27a- prio=0 @ 67ms: gem_exec_parall<4878> U b75:19e- prio=0 @ 67ms: gem_exec_parall<5486> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-6-chris@chris-wilson.co.uk
2020-05-25drm/i915: Don't set queue-priority hint when supressing the rescheduleChris Wilson1-8/+8
We recorded the execlists->queue_priority_hint update for the inflight request without kicking the tasklet. The next submitted request then failed to be scheduled as it had a lower priority than the hint, leaving the HW running with only the inflight request. Fixes: 6cebcf746f3f ("drm/i915: Tweak scheduler's kick_submission()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200519063123.20673-1-chris@chris-wilson.co.uk (cherry picked from commit b86fc6e5e89e5645b43f57171c26740ef38f9f4a) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2020-05-14drm/i915: Drop no-semaphore boostingChris Wilson1-7/+4
Now that we have fast timeslicing on semaphores, we no longer need to prioritise none-semaphore work as we will yield any work blocked on a semaphore to the next in the queue. Previously with no timeslicing, blocking on the semaphore caused extremely bad scheduling with multiple clients utilising multiple rings. Now, there is no impact and we can remove the complication. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200513173504.28322-1-chris@chris-wilson.co.uk
2020-05-07drm/i915: Remove wait priority boostingChris Wilson1-11/+1
Upon waiting a request (when asked), we gave that request a small priority boost, not enough for it to cause preemption, but enough for it to be scheduled next before all equals. We also used that bit to give new clients a small priority boost, similar to FQ_CODEL, such that we favoured short interactive tasks ahead of long running streams. However, this is causing lots of complications with timeslicing where we both want to honour the boost and yet ignore it. Those complications cause unexpected user behaviour (tasks not being timesliced and run concurrently as epxected), and the easiest way to resolve that is to remove the boost. Hopefully, we can find a compromise again if we need to, but in theory timeslicing itself and future more advanced schedulers should give us the interactivity boost we seek. Testcase: igt/gem_exec_schedule/lateslice Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
2020-05-07drm/i915: Mark concurrent submissions with a weak-dependencyChris Wilson1-3/+3
We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could correctly perform priority inheritance from the parallel branches to the common trunk. However, for the purpose of timeslicing and reset handling, the dependency is weak -- as we the pair of requests are allowed to run in parallel and not in strict succession. The real significance though is that this allows us to rearrange groups of WAIT_FOR_SUBMIT linked requests along the single engine, and so can resolve user level inter-batch scheduling dependencies from user semaphores. Fixes: c81471f5e95c ("drm/i915: Copy across scheduler behaviour flags across submit fences") Testcase: igt/gem_exec_fence/submit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
2020-03-31drm/i915/gt: Include a few tracek for timeslicingChris Wilson1-0/+6
Add a few telltales to see when timeslicing is being enabled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331120502.14713-1-chris@chris-wilson.co.uk
2020-03-31drm/i915: Defer kicking the tasklet until all rescheduling is completeChris Wilson1-0/+4
Since we may kick more than engine, and may kick each one a couple of times, coalesce the tasklet execution to the end. This also ensures that we have the chance to run the tasklet immediately after priority bumping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200331114852.11583-1-chris@chris-wilson.co.uk
2020-03-10drm/i915: Tweak scheduler's kick_submission()Chris Wilson1-1/+2
Skip useless priority bumping on adding a new dependency by making sure that we do update the priority if we would have rescheduled the active cotnext. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200310115947.6482-1-chris@chris-wilson.co.uk
2020-03-09drm/i915/execlsts: Mark up racy inspection of current i915_request priorityChris Wilson1-1/+1
[ 120.176548] BUG: KCSAN: data-race in __i915_schedule [i915] / effective_prio [i915] [ 120.176566] [ 120.176577] write to 0xffff8881e35e6540 of 4 bytes by task 730 on cpu 3: [ 120.176792] __i915_schedule+0x63e/0x920 [i915] [ 120.177007] __bump_priority+0x63/0x80 [i915] [ 120.177220] __i915_sched_node_add_dependency+0x258/0x300 [i915] [ 120.177438] i915_sched_node_add_dependency+0x50/0xa0 [i915] [ 120.177654] i915_request_await_dma_fence+0x1da/0x530 [i915] [ 120.177867] i915_request_await_object+0x2fe/0x470 [i915] [ 120.178081] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 120.178292] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 120.178309] drm_ioctl_kernel+0xe4/0x120 [ 120.178322] drm_ioctl+0x297/0x4c7 [ 120.178335] ksys_ioctl+0x89/0xb0 [ 120.178348] __x64_sys_ioctl+0x42/0x60 [ 120.178361] do_syscall_64+0x6e/0x2c0 [ 120.178375] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 120.178387] [ 120.178397] read to 0xffff8881e35e6540 of 4 bytes by interrupt on cpu 2: [ 120.178606] effective_prio+0x25/0xc0 [i915] [ 120.178812] process_csb+0xe8b/0x10a0 [i915] [ 120.179021] execlists_submission_tasklet+0x30/0x170 [i915] [ 120.179038] tasklet_action_common.isra.0+0x42/0xa0 [ 120.179053] __do_softirq+0xd7/0x2cd [ 120.179066] irq_exit+0xbe/0xe0 [ 120.179078] do_IRQ+0x51/0x100 [ 120.179090] ret_from_intr+0x0/0x1c [ 120.179104] cpuidle_enter_state+0x1b8/0x5d0 [ 120.179117] cpuidle_enter+0x50/0x90 [ 120.179131] do_idle+0x1a1/0x1f0 [ 120.179145] cpu_startup_entry+0x14/0x16 [ 120.179158] start_secondary+0x120/0x180 [ 120.179172] secondary_startup_64+0xa4/0xb0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-5-chris@chris-wilson.co.uk
2020-03-06drm/i915: Always propagate the invocation to i915_scheduleChris Wilson1-4/+1
We only call i915_schedule() when we know we have changed the priority on a request and so require to propagate any change in priority to its signalers (for PI). By unconditionally checking all of our signalers, we avoid skipping changes made prior to construction of the request (as the request may be waited upon before submission when used in parallel). References: https://gitlab.freedesktop.org/drm/intel/issues/1318 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-2-chris@chris-wilson.co.uk
2020-02-20drm/i915: Double check bumping after the spinlockChris Wilson1-0/+3
In preparation for making GEM execbuf parallel, we need to be prepared to handle very early declaration of dependencies -- even before our signaler has itself been submitted. References: a79ca656b648 ("drm/i915: Push the wakeref->count deferral to the backend") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200220123608.1666271-1-chris@chris-wilson.co.uk
2020-02-20drm/i915/gt: Protect signaler walk with RCUChris Wilson1-3/+4
While we know that the waiters cannot disappear as we walk our list (only that they might be added), the same cannot be said for our signalers as they may be completed by the HW and retired as we process this request. Ergo we need to use rcu to protect the list iteration and remember to mark up the list_del_rcu. v2: Mark the deps as safe-for-rcu Fixes: 793c22617367 ("drm/i915/gt: Protect execlists_hold/unhold from new waiters") Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200220075025.1539375-1-chris@chris-wilson.co.uk
2020-02-07drm/i915/gt: Protect execlists_hold/unhold from new waitersChris Wilson1-1/+1
As we may add new waiters to a request as it is being run, we need to mark the list iteration as being safe for concurrent addition. v2: Mika spotted that we used the same trick for signalers_list, so warn the compiler about the lockless walk there as well. Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200207110213.2734386-1-chris@chris-wilson.co.uk
2020-02-07drm/i915/gt: Protect defer_request() from new waitersChris Wilson1-2/+4
Mika spotted <4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI <4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1 <4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018 <4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915] <4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05 <4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083 <4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122 <4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48 <4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe <4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8 <4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80 <4>[17436.705532] FS: 0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000 <4>[17436.705534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0 <4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 <4>[17436.705542] Call Trace: <4>[17436.705545] <IRQ> <4>[17436.705603] execlists_submission_tasklet+0xc0/0x130 [i915] which is us consuming a partially initialised new waiter in defer_requests(). We can prevent this by initialising the i915_dependency prior to making it visible, and since we are using a concurrent list_add/iterator mark them up to the compiler. Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk
2020-01-16drm/i915: Keep track of request among the scheduling listsChris Wilson1-12/+10
If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk
2019-12-20drm/i915: Drop GEM context as a direct link from i915_requestChris Wilson1-1/+1
Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk
2019-12-11drm/i915: Copy across scheduler behaviour flags across submit fencesChris Wilson1-1/+0
We want the bonded request to have the same scheduler properties as its master so that it is placed at the same depth in the queue. For example, consider we have requests A, B and B', where B & B' are a bonded pair to run in parallel on two engines. A -> B \- B' B will run after A and so may be scheduled on an idle engine and wait on A using a semaphore. B' sees B being executed and so enters the queue on the same engine as A. As B' did not inherit the semaphore-chain from B, it may have higher precedence than A and so preempts execution. However, B' then sits on a semaphore waiting for B, who is waiting for A, who is blocked by B. Ergo B' needs to inherit the scheduler properties from B (i.e. the semaphore chain) so that it is scheduled with the same priority as B and will not be executed ahead of Bs dependencies. Furthermore, to prevent the priorities changing via the expose fence on B', we need to couple in the dependencies for PI. This requires us to relax our sanity-checks that dependencies are strictly in order. v2: Synchronise (B, B') execution on all platforms, regardless of using a scheduler, any no-op syncs should be elided. Fixes: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") Closes: https://gitlab.freedesktop.org/drm/intel/issues/464 Testcase: igt/gem_exec_balancer/bonded-chain Testcase: igt/gem_exec_balancer/bonded-semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
2019-11-22drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_requestChris Wilson1-0/+12
As we start peeking into requests for longer and longer, e.g. incorporating use of spinlocks when only protected by an rcu_read_lock(), we need to be careful in how we reset the request when recycling and need to preserve any barriers that may still be in use as the request is reset for reuse. Quoting Linus Torvalds: > If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU? .. because the object can be accessed (by RCU) after the refcount has gone down to zero, and the thing has been released. That's the whole and only point of SLAB_TYPESAFE_BY_RCU. That flag basically says: "I may end up accessing this object *after* it has been free'd, because there may be RCU lookups in flight" This has nothing to do with constructors. It's ok if the object gets reused as an object of the same type and does *not* get re-initialized, because we're perfectly fine seeing old stale data. What it guarantees is that the slab isn't shared with any other kind of object, _and_ that the underlying pages are free'd after an RCU quiescent period (so the pages aren't shared with another kind of object either during an RCU walk). And it doesn't necessarily have to have a constructor, because the thing that a RCU walk will care about is (a) guaranteed to be an object that *has* been on some RCU list (so it's not a "new" object) (b) the RCU walk needs to have logic to verify that it's still the *same* object and hasn't been re-used as something else. In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used immediately, but because it gets reused as the same kind of object, the RCU walker can "know" what parts have meaning for re-use, in a way it couidn't if the re-use was random. That said, it *is* subtle, and people should be careful. > So the re-use might initialize the fields lazily, not necessarily using a ctor. If you have a well-defined refcount, and use "atomic_inc_not_zero()" to guard the speculative RCU access section, and use "atomic_dec_and_test()" in the freeing section, then you should be safe wrt new allocations. If you have a completely new allocation that has "random stale content", you know that it cannot be on the RCU list, so there is no speculative access that can ever see that random content. So the only case you need to worry about is a re-use allocation, and you know that the refcount will start out as zero even if you don't have a constructor. So you can think of the refcount itself as always having a zero constructor, *BUT* you need to be careful with ordering. In particular, whoever does the allocation needs to then set the refcount to a non-zero value *after* it has initialized all the other fields. And in particular, it needs to make sure that it uses the proper memory ordering to do so. NOTE! One thing to be very worried about is that re-initializing whatever RCU lists means that now the RCU walker may be walking on the wrong list so the walker may do the right thing for this particular entry, but it may miss walking *other* entries. So then you can get spurious lookup failures, because the RCU walker never walked all the way to the end of the right list. That ends up being a much more subtle bug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
2019-11-04drm/i915: Protect request peeking with RCUChris Wilson1-2/+7
Since the execlists_active() is no longer protected by the engine->active.lock, we need to protect the request pointer with RCU to prevent it being freed as we evaluate whether or not we need to preempt. Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Fixes: 13ed13a4dcbf ("drm/i915: Don't set queue_priority_hint if we don't kick the submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191104090158.2959-2-chris@chris-wilson.co.uk
2019-10-21drm/i915: Don't set queue_priority_hint if we don't kick the submissionChris Wilson1-15/+22
If we change the priority of the active context, then it has no impact on the decision of whether to preempt the active context -- we don't preempt the context with itself. In this situation, we elide the tasklet rescheduling and should *not* be marking up the queue_priority_hint as that may mask a later submission where we decide we don't have to kick the tasklet as a higher priority submission is pending (spoiler alert, it was not). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191021080226.537-1-chris@chris-wilson.co.uk
2019-10-18drm/i915/execlists: Don't merely skip submission if maybe timeslicingChris Wilson1-2/+15
Normally, we try and skip submission if ELSP[1] is filled. However, we may desire to enable timeslicing due to the queue priority, even if ELSP[1] itself does not require timeslicing. That is the queue is equal priority to ELSP[0] and higher priority then ELSP[1]. Previously, we would wait until the context switch to preempt the current ELSP[1], but with timeslicing, we want to preempt ELSP[0] and replace it with the queue. In writing the test case, it become quickly apparent that we were also suppressing the tasklet during promotion and so failing to notice when the queue started requiring timeslicing. Fixes: 2229adc81380 ("drm/i915/execlist: Trim immediate timeslice expiry") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk
2019-08-13drm/i915: Push the wakeref->count deferral to the backendChris Wilson1-2/+1
If the backend wishes to defer the wakeref parking, make it responsible for unlocking the wakeref (i.e. bumping the counter). This allows it to time the unlock much more carefully in case it happens to needs the wakeref to be active during its deferral. For instance, during engine parking we may choose to emit an idle barrier (a request). To do so, we borrow the engine->kernel_context timeline and to ensure exclusive access we keep the engine->wakeref.count as 0. However, to submit that request to HW may require a intel_engine_pm_get() (e.g. to keep the submission tasklet alive) and before we allow that we have to rewake our wakeref to avoid a recursive deadlock. <4> [257.742916] IRQs not enabled as expected <4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100 <4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid <4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G U W 5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1 <4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015 <4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100 <4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3 <4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086 <4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302 <4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001 <4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008 <4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430 <4> [257.743074] FS: 0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000 <4> [257.743082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0 <4> [257.743096] Call Trace: <4> [257.743104] <IRQ> <4> [257.743265] __i915_request_commit+0x240/0x5d0 [i915] <4> [257.743427] ? __i915_request_create+0x228/0x4c0 [i915] <4> [257.743584] __engine_park+0x64/0x250 [i915] <4> [257.743730] ____intel_wakeref_put_last+0x1c/0x70 [i915] <4> [257.743878] i915_sample+0x2ee/0x310 [i915] <4> [257.744030] ? i915_pmu_cpu_offline+0xb0/0xb0 [i915] <4> [257.744040] __hrtimer_run_queues+0x11e/0x4b0 <4> [257.744068] hrtimer_interrupt+0xea/0x250 <4> [257.744079] ? lockdep_hardirqs_off+0x79/0xd0 <4> [257.744101] smp_apic_timer_interrupt+0x96/0x280 <4> [257.744114] apic_timer_interrupt+0xf/0x20 <4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae v2: Keep the priority_hint assert v3: That assert was desperately trying to point out my bug. Sorry, little assert. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
2019-06-20drm/i915/execlists: Minimalistic timeslicingChris Wilson1-0/+1
If we have multiple contexts of equal priority pending execution, activate a timer to demote the currently executing context in favour of the next in the queue when that timeslice expires. This enforces fairness between contexts (so long as they allow preemption -- forced preemption, in the future, will kick those who do not obey) and allows us to avoid userspace blocking forward progress with e.g. unbounded MI_SEMAPHORE_WAIT. For the starting point here, we use the jiffie as our timeslice so that we should be reasonably efficient wrt frequent CPU wakeups. Testcase: igt/gem_exec_scheduler/semaphore-resolve Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk
2019-06-20drm/i915/execlists: Preempt-to-busyChris Wilson1-2/+1
When using a global seqno, we required a precise stop-the-workd event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. v2: Be a tinge paranoid and ensure we flush the write into the HWS page for the GPU semaphore to pick in a timely fashion. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
2019-06-14drm/i915: Replace engine->timeline with a plain listChris Wilson1-19/+19
To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
2019-05-22drm/i915: Load balancing across a virtual engineChris Wilson1-3/+16
Having allowed the user to define a set of engines that they will want to only use, we go one step further and allow them to bind those engines into a single virtual instance. Submitting a batch to the virtual engine will then forward it to any one of the set in a manner as best to distribute load. The virtual engine has a single timeline across all engines (it operates as a single queue), so it is not able to concurrently run batches across multiple engines by itself; that is left up to the user to submit multiple concurrent batches to multiple queues. Multiple users will be load balanced across the system. The mechanism used for load balancing in this patch is a late greedy balancer. When a request is ready for execution, it is added to each engine's queue, and when an engine is ready for its next request it claims it from the virtual engine. The first engine to do so, wins, i.e. the request is executed at the earliest opportunity (idle moment) in the system. As not all HW is created equal, the user is still able to skip the virtual engine and execute the batch on a specific engine, all within the same queue. It will then be executed in order on the correct engine, with execution on other virtual engines being moved away due to the load detection. A couple of areas for potential improvement left! - The virtual engine always take priority over equal-priority tasks. Mostly broken up by applying FQ_CODEL rules for prioritising new clients, and hopefully the virtual and real engines are not then congested (i.e. all work is via virtual engines, or all work is to the real engine). - We require the breadcrumb irq around every virtual engine request. For normal engines, we eliminate the need for the slow round trip via interrupt by using the submit fence and queueing in order. For virtual engines, we have to allow any job to transfer to a new ring, and cannot coalesce the submissions, so require the completion fence instead, forcing the persistent use of interrupts. - We only drip feed single requests through each virtual engine and onto the physical engines, even if there was enough work to fill all ELSP, leaving small stalls with an idle CS event at the end of every request. Could we be greedy and fill both slots? Being lazy is virtuous for load distribution on less-than-full workloads though. Other areas of improvement are more general, such as reducing lock contention, reducing dispatch overhead, looking at direct submission rather than bouncing around tasklets etc. sseu: Lift the restriction to allow sseu to be reconfigured on virtual engines composed of RENDER_CLASS (rcs). v2: macroize check_user_mbz() v3: Cancel virtual engines on wedging v4: Commence commenting v5: Replace 64b sibling_mask with a list of class:instance v6: Drop the one-element array in the uabi v7: Assert it is an virtual engine in to_virtual_engine() v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2) Link: https://github.com/intel/media-driver/pull/283 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
2019-05-17drm/i915: Bump signaler priority on adding a waiterChris Wilson1-0/+11
The handling of the no-preemption priority level imposes the restriction that we need to maintain the implied ordering even though preemption is disabled. Otherwise we may end up with an AB-BA deadlock across multiple engine due to a real preemption event reordering the no-preemption WAITs. To resolve this issue we currently promote all requests to WAIT on unsubmission, however this interferes with the timeslicing requirement that we do not apply any implicit promotion that will defeat the round-robin timeslice list. (If we automatically promote the active request it will go back to the head of the queue and not the tail!) So we need implicit promotion to prevent reordering around semaphores where we are not allowed to preempt, and we must avoid implicit promotion on unsubmission. So instead of at unsubmit, if we apply that implicit promotion on adding the dependency, we avoid the semaphore deadlock and we also reduce the gains made by the promotion for user space waiting. Furthermore, by keeping the earlier dependencies at a higher level, we reduce the search space for timeslicing without altering runtime scheduling too badly (no dependencies at all will be assigned a higher priority for rrul). v2: Limit the bump to external edges (as originally intended) i.e. between contexts and out to the user. Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
2019-05-13drm/i915: Check for no-op priority changes firstChris Wilson1-2/+2
In all likelihood, the priority and node are already in the CPU cache and by checking them first, we can avoid having to chase the *request->hwsp for the current breadcrumb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-3-chris@chris-wilson.co.uk
2019-05-13drm/i915: Pass i915_sched_node around internallyChris Wilson1-17/+19
To simplify the next patch, update bump_priority and schedule to accept the internal i915_sched_ndoe directly and not expect a request pointer. add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7) Function old new delta i915_schedule_bump_priority 109 113 +4 i915_schedule 50 54 +4 __i915_schedule 922 907 -15 v2: Adopt node for the old rq local, since it no longer is a request but the origin node. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-2-chris@chris-wilson.co.uk
2019-05-13drm/i915: Rearrange i915_scheduler.cChris Wilson1-105/+105
To avoid pulling in a forward declaration in the next patch, move the i915_sched_node handling to after the main dfs of the scheduler. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190513120102.29660-1-chris@chris-wilson.co.uk
2019-05-07drm/i915: Only reschedule the submission tasklet if preemption is possibleChris Wilson1-15/+19
If we couple the scheduler more tightly with the execlists policy, we can apply the preemption policy to the question of whether we need to kick the tasklet at all for this priority bump. v2: Rephrase it as a core i915 policy and not an execlists foible. v3: Pull the kick together. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
2019-04-11drm/i915: Bump ready tasks ahead of busywaitsChris Wilson1-10/+11
Consider two tasks that are running in parallel on a pair of engines (vcs0, vcs1), but then must complete on a shared engine (rcs0). To maximise throughput, we want to run the first ready task on rcs0 (i.e. the first task that completes on either of vcs0 or vcs1). When using semaphores, however, we will instead queue onto rcs in submission order. To resolve this incorrect ordering, we want to re-evaluate the priority queue when each of the request is ready. Normally this happens because we only insert into the priority queue requests that are ready, but with semaphores we are inserting ahead of their readiness and to compensate we penalize those tasks with reduced priority (so that tasks that do not need to busywait should naturally be run first). However, given a series of tasks that each use semaphores, the queue degrades into submission fifo rather than readiness fifo, and so to counter this we give a small boost to semaphore users as their dependent tasks are completed (and so we no longer require any busywait prior to running the user task as they are then ready themselves). v2: Fixup irqsave for schedule_lock (Tvrtko) Testcase: igt/gem_exec_schedule/semaphore-codependency Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk
2019-04-02drm/i915: Only emit one semaphore per requestChris Wilson1-2/+3
Ideally we only need one semaphore per ring to accommodate waiting on multiple engines in parallel. However, since we do not know which fences we will finally be waiting on, we emit a semaphore for every fence. It turns out to be quite easy to trick ourselves into exhausting our ringbuffer causing an error, just by feeding in a batch that depends on several thousand contexts. Since we never can be waiting on more than one semaphore in parallel (other than perhaps the desire to busywait on multiple engines), just pick the first fence for our semaphore. If we pick the wrong fence to busywait on, we just miss an opportunity to reduce latency. An adaption might be to use sched.flags as either a semaphore counter, or to track the first busywait on each engine, converting it back to a single use bit prior to closing the request. v2: Track first semaphore used per-engine (this caters for our basic igt that semaphores are working). Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Testcase: igt/gem_exec_fence/long-history Fixes: e88619646971 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-03-06drm/i915: Use i915_global_register()Chris Wilson1-12/+20
Rather than manually add every new global into each hook, use i915_global_register() function and keep a list of registered globals to invoke instead. However, I haven't found a way for random drivers to add an .init table to avoid having to manually add ourselves to i915_globals_init() each time. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-03-01drm/i915: Prioritise non-busywait semaphore workloadsChris Wilson1-0/+6
We don't want to busywait on the GPU if we have other work to do. If we give non-busywaiting workloads higher (initial) priority than workloads that require a busywait, we will prioritise work that is ready to run immediately. We then also have to be careful that we don't give earlier semaphores an accidental boost because later work doesn't wait on other rings, hence we keep a history of semaphore usage of the dependency chain. v2: Stop rolling the bits into a chain and just use a flag in case this request or any of our dependencies use a semaphore. The rolling around was contagious as Tvrtko was heard to fall off his chair. Testcase: igt/gem_exec_schedule/semaphore Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
2019-02-28drm/i915/execlists: Suppress mere WAIT preemptionChris Wilson1-1/+0
WAIT is occasionally suppressed by virtue of preempted requests being promoted to NEWCLIENT if they have not all ready received that boost. Make this consistent for all WAIT boosts that they are not allowed to preempt executing contexts and are merely granted the right to be at the front of the queue for the next execution slot. This is in keeping with the desire that the WAIT boost be a minor tweak that does not give excessive promotion to its user and open ourselves to trivial abuse. The problem with the inconsistent WAIT preemption becomes more apparent as the preemption is propagated across the engines, where one engine may preempt and the other not, and we be relying on the exact execution order being consistent across engines (e.g. using HW semaphores to coordinate parallel execution). v2: Also protect GuC submission from false preemption loops. v3: Build bug safeguards and better debug messages for st. v4: Do the priority bumping in unsubmit (i.e. on preemption/reset unwind), applying it earlier during submit causes out-of-order execution combined with execute fences. v5: Call sw_fence_fini for our dummy request (Matthew) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk