aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2021-11-02drm/i915: stop setting cache_dirty on discreteMatthew Auld4-3/+22
Should not be needed. Even with non-coherent display, we should be using device local-memory there, and not system memory. v2: also add a warning in i915_gem_clflush_object Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-4-matthew.auld@intel.com
2021-11-02drm/i915: move cpu_write_needs_clflushMatthew Auld3-14/+15
Move it next to its partner in crime; gpu_write_needs_clflush. For better readability lets keep gpu vs cpu at least in the same file. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-3-matthew.auld@intel.com
2021-11-02drm/i915/clflush: disallow on discreteMatthew Auld1-2/+3
We seem to have an unfortunate issue where we arrive from: i915_gem_object_flush_if_display+0x86/0xd0 [i915] intel_user_framebuffer_dirty+0x1a/0x50 [i915] drm_mode_dirtyfb_ioctl+0xfb/0x1b0 which can be before the pages are populated(and pinned for display), and so i915_gem_object_has_struct_page() might still return true, as per the ttm backend. We could re-order the later get_pages() call here, but since on discrete everything should already be coherent, with the exception of the display engine, and even there display surfaces must be allocated in device local-memory anyway, so there should in theory be no conceivable reason to ever call i915_gem_clflush_object() on discrete. References: https://gitlab.freedesktop.org/drm/intel/-/issues/4320 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-2-matthew.auld@intel.com
2021-11-02drm/i915/clflush: fixup handling of cache_dirtyMatthew Auld1-1/+9
In theory if clflush_work_create() somehow fails here, and we don't yet have mm.pages populated then we end up resetting cache_dirty, which is likely wrong, since that will potentially skip the flush-on-acquire, if it was needed. It looks like intel_user_framebuffer_dirty() can arrive here before the pages are populated. v2(Thomas): - Move setting cache_dirty out of the async portion, also add a comment for why that should still be safe. v3: - Add Thomas' irc r-b Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-1-matthew.auld@intel.com
2021-11-01drm/i915: Introduce refcounted sg-tablesThomas Hellström9-147/+277
As we start to introduce asynchronous failsafe object migration, where we update the object state and then submit asynchronous commands we need to record what memory resources are actually used by various part of the command stream. Initially for three purposes: 1) Error capture. 2) Asynchronous migration error recovery. 3) Asynchronous vma bind. At the time where these happens, the object state may have been updated to be several migrations ahead and object sg-tables discarded. In order to make it possible to keep sg-tables with memory resource information for these operations, introduce refcounted sg-tables that aren't freed until the last user is done with them. The alternative would be to reference information sitting on the corresponding ttm_resources which typically have the same lifetime as these refcountes sg_tables, but that leads to other awkward constructs: Due to the design direction chosen for ttm resource managers that would lead to diamond-style inheritance, the LMEM resources may sometimes be prematurely freed, and finally the subclassed struct ttm_resource would have to bleed into the asynchronous vma bind code. v3: - Address a number of style issues (Matthew Auld) v4: - Dont check for st->sgl being NULL in i915_ttm_tt__shmem_unpopulate(), that should never happen. (Matthew Auld) v5: - Fix a Potential double-free (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211101122444.114607-1-thomas.hellstrom@linux.intel.com
2021-11-01drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9Cooper Chiou1-0/+41
This implements WaProgramMgsrForCorrectSliceSpecificMmioReads which was omitted by mistake from Gen9 documentation, while it is actually applicable to fused off parts. Workaround consists of making sure MCR packet control register is programmed to point to enabled slice/subslice pair before doing any MMIO reads from the affected registers. Failure do to this can result in complete system hangs when running certain workloads. Two known cases which can cause system hangs are: 1. "test_basic progvar_prog_scope_uninit" test which is part of Khronos OpenCL conformance suite (https://github.com/KhronosGroup/OpenCL-CTS) with the Intel OpenCL driver (https://github.com/intel/compute-runtime). 2. VP8 media hardware encoding using the full-feature build of the Intel media-driver (https://github.com/intel/media-driver) and ffmpeg. For the former case patch was verified to fix the hard system hang when executing the OCL test on Intel Pentium CPU 6405U which contains fused off GT1 graphics. Reference: HSD#1508045018,1405586840, BSID#0575 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: William Tseng <william.tseng@intel.com> Cc: Shawn C Lee <shawn.c.lee@intel.com> Cc: Pawel Wilma <pawel.wilma@intel.com> Signed-off-by: Cooper Chiou <cooper.chiou@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211025042623.3876-1-cooper.chiou@intel.com
2021-10-29drm/i915: Remove some dead struct fwd decl from i915_drv.hDaniel Vetter1-2/+0
Gone with userptr rewrite by Maarten in ed29c2691188 ("drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211022082200.2684194-1-daniel.vetter@ffwll.ch
2021-10-29drm/i915/gtt: stop caching the scratch pageMatthew Auld1-2/+2
Normal users shouldn't be hitting this, likely this would indicate a userspace bug. So don't bother caching, which should be safe now that we manually flush the page. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-2-matthew.auld@intel.com
2021-10-29drm/i915/gtt: flush the scratch pageMatthew Auld1-0/+3
The scratch page is directly visible in the users address space, and while this is forced as CACHE_LLC, by the kernel, we still have to contend with things like "Bypass-LLC" MOCS. So just flush no matter what. v2(Thomas): - Make sure we use drm_clflush_virt_range here, in case clflush support is missing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-1-matthew.auld@intel.com
2021-10-28drm/i915/pmu: Connect engine busyness stats from GuC to pmuUmesh Nerlige Ramappa13-28/+453
With GuC handling scheduling, i915 is not aware of the time that a context is scheduled in and out of the engine. Since i915 pmu relies on this info to provide engine busyness to the user, GuC shares this info with i915 for all engines using shared memory. For each engine, this info contains: - total busyness: total time that the context was running (total) - id: id of the running context (id) - start timestamp: timestamp when the context started running (start) At the time (now) of sampling the engine busyness, if the id is valid (!= ~0), and start is non-zero, then the context is considered to be active and the engine busyness is calculated using the below equation engine busyness = total + (now - start) All times are obtained from the gt clock base. For inactive contexts, engine busyness is just equal to the total. The start and total values provided by GuC are 32 bits and wrap around in a few minutes. Since perf pmu provides busyness as 64 bit monotonically increasing values, there is a need for this implementation to account for overflows and extend the time to 64 bits before returning busyness to the user. In order to do that, a worker runs periodically at frequency = 1/8th the time it takes for the timestamp to wrap. As an example, that would be once in 27 seconds for a gt clock frequency of 19.2 MHz. Note: There might be an over-accounting of busyness due to the fact that GuC may be updating the total and start values while kmd is reading them. (i.e kmd may read the updated total and the stale start). In such a case, user may see higher busyness value followed by smaller ones which would eventually catch up to the higher value. v2: (Tvrtko) - Include details in commit message - Move intel engine busyness function into execlist code - Use union inside engine->stats - Use natural type for ping delay jiffies - Drop active_work condition checks - Use for_each_engine if iterating all engines - Drop seq locking, use spinlock at GuC level to update engine stats - Document worker specific details v3: (Tvrtko/Umesh) - Demarcate GuC and execlist stat objects with comments - Document known over-accounting issue in commit - Provide a consistent view of GuC state - Add hooks to gt park/unpark for GuC busyness - Stop/start worker in gt park/unpark path - Drop inline - Move spinlock and worker inits to GuC initialization - Drop helpers that are called only once v4: (Tvrtko/Matt/Umesh) - Drop addressed opens from commit message - Get runtime pm in ping, remove from the park path - Use cancel_delayed_work_sync in disable_submission path - Update stats during reset prepare - Skip ping if reset in progress - Explicitly name execlists and GuC stats objects - Since disable_submission is called from many places, move resetting stats to intel_guc_submission_reset_prepare v5: (Tvrtko) - Add a trylock helper that does not sleep and synchronize PMU event callbacks and worker with gt reset v6: (CI BAT failures) - DUTs using execlist submission failed to boot since __gt_unpark is called during i915 load. This ends up calling the GuC busyness unpark hook and results in kick-starting an uninitialized worker. Let park/unpark hooks check if GuC submission has been initialized. - drop cant_sleep() from trylock helper since rcu_read_lock takes care of that. v7: (CI) Fix igt@i915_selftest@live@gt_engines - For GuC mode of submission the engine busyness is derived from gt time domain. Use gt time elapsed as reference in the selftest. - Increase busyness calculation to 10ms duration to ensure batch runs longer and falls within the busyness tolerances in selftest. v8: - Use ktime_get in selftest as before - intel_reset_trylock_no_wait results in a lockdep splat that is not trivial to fix since the PMU callback runs in irq context and the reset paths are tightly knit into the driver. The test that uncovers this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait, instead use the reset_count to synchronize with gt reset during pmu callback. For the ping, continue to use intel_reset_trylock since ping is not run in irq context. - GuC PM timestamp does not tick when GuC is idle. This can potentially result in wrong busyness values when a context is active on the engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to process the GuC busyness stats. This works since both GuC timestamp and RING timestamp are synced with the same clock. - The busyness stats may get updated after the batch starts running. This delay causes the busyness reported for 100us duration to fall below 95% in the selftest. The only option at this time is to wait for GuC busyness to change from idle to active before we sample busyness over a 100us period. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
2021-10-28drm/i915/pmu: Add a name to the execlists statsUmesh Nerlige Ramappa3-46/+53
In preparation for GuC pmu stats, add a name to the execlists stats structure so that it can be differentiated from the GuC stats. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-1-umesh.nerlige.ramappa@intel.com
2021-10-27drm/i915: Revert 'guc_id' from i915_request tracepointJoonas Lahtinen1-5/+2
Avoid adding backend specific data to the tracepoints outside of the LOW_LEVEL_TRACEPOINTS kernel config protection. These bits of information are bound to change depending on the selected submission method per platform and are not necessarily possible to maintain in the future. Fixes: dbf9da8d55ef ("drm/i915/guc: Add trace point for GuC submit") Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027093255.66489-1-joonas.lahtinen@linux.intel.com
2021-10-27MAINTAINERS: Add Tvrtko as drm/i915 co-maintainerJoonas Lahtinen1-0/+1
Add Tvrtko Ursulin as a co-maintainer for drm/i915 driver. Tvrtko will bring added bandwidth and focus to the GT/GEM domain (drm-intel-gt-next). This will help with the increased driver maintenance efforts, allows alternating the drm-intel-gt-next pull requests and also should increase the coverage for the maintenance in general. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Acked-by: Jani Nikula <jani.nikula@linux.intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Sean Paul <sean@poorly.run> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: dri-devel@lists.freedesktop.org Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211025134907.20078-1-joonas.lahtinen@linux.intel.com
2021-10-25drm/i915: Use ERR_CAST instead of ERR_PTR(PTR_ERR())Wan Jiabing1-1/+1
Fix following coccicheck warning: ./drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3117:15-22: WARNING: ERR_CAST can be used with eb->requests[i]. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211025113316.24424-1-wanjiabing@vivo.com
2021-10-25drm/i915/selftests: Fix inconsistent IS_ERR and PTR_ERRKai Song1-2/+2
Fix inconsistent IS_ERR and PTR_ERR in i915_gem_dmabuf.c Signed-off-by: Kai Song <songkai01@inspur.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211022120655.22173-1-songkai01@inspur.com
2021-10-22drm/i915/guc: Fix recursive lock in GuC submissionMatthew Brost1-1/+2
Use __release_guc_id (lock held) rather than release_guc_id (acquires lock), add lockdep annotations. 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr [ 213.283459] ============================================ [ 213.283462] WARNING: possible recursive locking detected {{[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W }} [ 213.283470] -------------------------------------------- [ 213.283472] kworker/u24:0/8 is trying to acquire lock: [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915] {{[ 213.283618] }} {{ but task is already holding lock:}} [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283720] }} {{ other info that might help us debug this:}} [ 213.283724] Possible unsafe locking scenario:[ 213.283727] CPU0 [ 213.283728] ---- [ 213.283730] lock(&guc->submission_state.lock); [ 213.283734] lock(&guc->submission_state.lock); {{[ 213.283737] }} {{ *** DEADLOCK ***}}[ 213.283740] May be due to missing lock nesting notation[ 213.283744] 3 locks held by kworker/u24:0/8: [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283860] }} {{ stack backtrace:}} [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18 [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915] [ 213.283957] Call Trace: [ 213.283960] dump_stack_lvl+0x57/0x72 [ 213.283966] __lock_acquire.cold+0x191/0x2d3 [ 213.283972] lock_acquire+0xb5/0x2b0 [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915] [ 213.284139] ? lock_release+0xb9/0x280 [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60 [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915] [ 213.284310] process_one_work+0x270/0x550 [ 213.284315] worker_thread+0x52/0x3b0 [ 213.284319] ? process_one_work+0x550/0x550 [ 213.284322] kthread+0x135/0x160 [ 213.284326] ? set_kthread_struct+0x40/0x40 [ 213.284331] ret_from_fork+0x1f/0x30 and a bit later in the trace: {{ 227.499864] do_raw_spin_lock+0x94/0xa0}} [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60 [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915] [ 227.500209] ? mark_held_locks+0x49/0x70 [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915] [ 227.500320] reset_prepare+0x78/0x90 [i915] [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915] [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915] [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915] [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915] [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915] [ 227.500767] i915_pci_probe+0x84/0x170 [i915] [ 227.500838] local_pci_probe+0x42/0x80 [ 227.500842] pci_device_probe+0xd9/0x190 [ 227.500844] really_probe+0x1f2/0x3f0 [ 227.500847] __driver_probe_device+0xfe/0x180 [ 227.500848] driver_probe_device+0x1e/0x90 [ 227.500850] __driver_attach+0xc4/0x1d0 [ 227.500851] ? __device_attach_driver+0xe0/0xe0 [ 227.500853] ? __device_attach_driver+0xe0/0xe0 [ 227.500854] bus_for_each_dev+0x64/0x90 [ 227.500856] bus_add_driver+0x12e/0x1f0 [ 227.500857] driver_register+0x8f/0xe0 [ 227.500859] i915_init+0x1d/0x8f [i915] [ 227.500934] ? 0xffffffffc144a000 [ 227.500936] do_one_initcall+0x58/0x2d0 [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80 [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0 [ 227.500944] do_init_module+0x5c/0x270 [ 227.500946] __do_sys_finit_module+0x95/0xe0 [ 227.500949] do_syscall_64+0x38/0x90 [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 227.500953] RIP: 0033:0x7ffa59d2ae0d [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48 [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004 [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070 [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90 [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0 v2: (CI build) - Fix build error Fixes: 1a52faed31311 ("drm/i915/guc: Take GT PM ref when deregistering context") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
2021-10-22drm/i915/selftests: Update live.evict to wait on requests / idle GPU after each loopMatthew Brost1-0/+19
Update live.evict to wait on last request and idle GPU after each loop. This not only enhances the test to fill the GGTT on each engine class but also avoid timeouts from igt_flush_test when using GuC submission. igt_flush_test (idle GPU) can take a long time with GuC submission if losts of contexts are created due to H2G / G2H required to destroy contexts. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211021214040.33292-1-matthew.brost@intel.com
2021-10-22drm/i915/selftests: Increase timeout in requests perf selftestMatthew Brost1-2/+2
perf_parallel_engines is micro benchmark to test i915 request scheduling. The test creates a thread per physical engine and submits NOP requests and waits the requests to complete in a loop. In execlists mode this works perfectly fine as powerful CPU has enough cores to feed each engine and process the CSBs. With GuC submission the uC gets overwhelmed as all threads feed into a single CTB channel and the GuC gets bombarded with CSBs as contexts are immediately switched in and out on the engines due to the zero runtime of the requests. When the GuC is overwhelmed scheduling of contexts is unfair due to the nature of the GuC scheduling algorithm. This behavior is understood and deemed acceptable as this micro benchmark isn't close to real world use case. Increasing the timeout of wait period for requests to complete. This makes the test understand that is ok for contexts to get starved in this scenario. A future patch / cleanup may just delete these micro benchmark tests as they basically mean nothing. We care about real workloads not made up ones. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211011175704.28509-1-matthew.brost@intel.com
2021-10-22drm/i915/ttm: enable shmem tt backendMatthew Auld1-1/+2
Turn on the shmem tt backend, and enable shrinking. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-8-matthew.auld@intel.com
2021-10-22drm/i915/ttm: use cached system pages when evicting lmemMatthew Auld1-4/+4
This should let us do an accelerated copy directly to the shmem pages when temporarily moving lmem-only objects, where the i915-gem shrinker can later kick in to swap out the pages, if needed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-7-matthew.auld@intel.com
2021-10-22drm/i915/ttm: move shrinker management into adjust_lruMatthew Auld5-22/+137
We currently just evict lmem objects to system memory when under memory pressure. For this case we might lack the usual object mm.pages, which effectively hides the pages from the i915-gem shrinker, until we actually "attach" the TT to the object, or in the case of lmem-only objects it just gets migrated back to lmem when touched again. For all cases we can just adjust the i915 shrinker LRU each time we also adjust the TTM LRU. The two cases we care about are: 1) When something is moved by TTM, including when initially populating an object. Importantly this covers the case where TTM moves something from lmem <-> smem, outside of the normal get_pages() interface, which should still ensure the shmem pages underneath are reclaimable. 2) When calling into i915_gem_object_unlock(). The unlock should ensure the object is removed from the shinker LRU, if it was indeed swapped out, or just purged, when the shrinker drops the object lock. v2(Thomas): - Handle managing the shrinker LRU in adjust_lru, where it is always safe to touch the object. v3(Thomas): - Pretty much a re-write. This time piggy back off the shrink_pin stuff, which actually seems to fit quite well for what we want here. v4(Thomas): - Just use a simple boolean for tracking ttm_shrinkable. v5: - Ensure we call adjust_lru when faulting the object, to ensure the pages are visible to the shrinker, if needed. - Add back the adjust_lru when in i915_ttm_move (Thomas) v6(Reported-by: kernel test robot <lkp@intel.com>): - Remove unused i915_tt Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v4 Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-6-matthew.auld@intel.com
2021-10-22drm/i915: add some kernel-doc for shrink_pin and friendsMatthew Auld2-1/+54
Attempt to document shrink_pin and the other relevant interfaces that interact with it, before we start messing with it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-5-matthew.auld@intel.com
2021-10-22drm/i915: drop unneeded make_unshrinkable in free_objectMatthew Auld1-9/+0
The comment here is no longer accurate, since the current shrinker code requires a full ref before touching any objects. Also unset_pages() should already do the required make_unshrinkable() for us, if needed, which is also nicely balanced with set_pages(). Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-4-matthew.auld@intel.com
2021-10-22drm/i915/gtt: drop unneeded make_unshrinkableMatthew Auld2-2/+0
We already do this when mapping the pages. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-3-matthew.auld@intel.com
2021-10-22drm/i915/ttm: add tt shmem backendMatthew Auld6-42/+247
For cached objects we can allocate our pages directly in shmem. This should make it possible(in a later patch) to utilise the existing i915-gem shrinker code for such objects. For now this is still disabled. v2(Thomas): - Add optional try_to_writeback hook for objects. Importantly we need to check if the object is even still shrinkable; in between us dropping the shrinker LRU lock and acquiring the object lock it could for example have been moved. Also we need to differentiate between "lazy" shrinking and the immediate writeback mode. Also later we need to handle objects which don't even have mm.pages, so bundling this into put_pages() would require somehow handling that edge case, hence just letting the ttm backend handle everything in try_to_writeback doesn't seem too bad. v3(Thomas): - Likely a bad idea to touch the object from the unpopulate hook, since it's not possible to hold a reference, without also creating circular dependency, so likely this is too fragile. For now just ensure we at least mark the pages as dirty/accessed when called from the shrinker on WILLNEED objects. - s/try_to_writeback/shrinker_release_pages, since this can do more than just writeback. - Get rid of do_backup boolean and just set the SWAPPED flag prior to calling unpopulate. - Keep shmem_tt as lowest priority for the TTM LRU bo_swapout walk, since these just get skipped anyway. We can try to come up with something better later. v4(Thomas): - s/PCI_DMA/DMA/. Also drop NO_KERNEL_MAPPING and NO_WARN, which apparently doesn't do anything with streaming mappings. - Just pass along the error for ->truncate, and assume nothing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Oak Zeng <oak.zeng@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-2-matthew.auld@intel.com
2021-10-22drm/i915/gem: Break out some shmem backend utilsThomas Hellström1-75/+106
Break out some shmem backend utils for future reuse by the TTM backend: shmem_alloc_st(), shmem_free_st() and __shmem_writeback() which we can use to provide a shmem-backed TTM page pool for cached-only TTM buffer objects. Main functional change here is that we now compute the page sizes using the dma segments rather than using the physical page address segments. v2(Reported-by: kernel test robot <lkp@intel.com>) - Make sure we initialise the mapping on the error path in shmem_get_pages() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-1-matthew.auld@intel.com
2021-10-22drm/i915/dmabuf: fix broken buildMatthew Auld1-0/+7
wbinvd_on_all_cpus() is only defined on x86 it seems, plus we need to include asm/smp.h here. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211021125332.2455288-1-matthew.auld@intel.com
2021-10-20drm/i915/selftests: mark up hugepages object with start_cpu_writeMatthew Auld1-1/+6
Just like we do for internal objects. Also just use i915_gem_object_set_cache_coherency() here. No need for over-flushing on LLC platforms. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-9-matthew.auld@intel.com
2021-10-20drm/i915: mark up internal objects with start_cpu_writeMatthew Auld1-0/+2
While the pages can't be swapped out, they can be discarded by the shrinker. Normally such objects are marked with __I915_MADV_PURGED, which can't be unset, and therefore requires a new object. For kernel internal objects this is not true, since the madv hint is reset for our special volatile objects, such that we can re-acquire new pages, if so desired, without needing a new object. As a result we should probably be paranoid here and put the object back into the CPU domain when discarding the pages, and also correctly set cache_dirty, if required. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-8-matthew.auld@intel.com
2021-10-20drm/i915: expand on the kernel-doc for cache_dirtyMatthew Auld2-0/+38
Add some details around non-LLC platforms and cflushing, when dealing with the flush-on-acquire, which is potentially security sensitive. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-7-matthew.auld@intel.com
2021-10-20drm/i915/shmem: ensure flush during swap-in on non-LLCMatthew Auld1-0/+12
On non-LLC platforms, force the flush-on-acquire if this is ever swapped-in. Our async flush path is not trust worthy enough yet(and happens in the wrong order), and with some tricks it's conceivable for userspace to change the cache-level to I915_CACHE_NONE after the pages are swapped-in, and since execbuf binds the object before doing the async flush, there is a potential race window. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-6-matthew.auld@intel.com
2021-10-20drm/i915/userptr: add paranoid flush-on-acquireMatthew Auld1-1/+4
Even though userptr objects are always coherent with the GPU, with no way for userspace to change this with the set_caching ioctl, even on non-LLC platforms, there is still the 'Bypass LCC' mocs setting, which might permit reading the contents of main memory directly. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-5-matthew.auld@intel.com
2021-10-20drm/i915/dmabuf: add paranoid flush-on-acquireMatthew Auld1-1/+5
As pointed out by Thomas, we likely need to flush the pages here if the GPU can read the page contents directly from main memory. Underneath we don't know what the sg_table is pointing to, so just add a wbinvd_on_all_cpus() here, for now. Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-4-matthew.auld@intel.com
2021-10-20drm/i915: extract bypass-llc check into helperMatthew Auld3-16/+28
It looks like we will need this in some more places, so extract as a helper. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-3-matthew.auld@intel.com
2021-10-20drm/i915: mark userptr objects as ALLOC_USERMatthew Auld1-1/+2
These are userspace objects, so mark them as such. In a later patch it's useful to determine how paranoid we need to be when managing cache flushes. In theory no functional changes. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-2-matthew.auld@intel.com
2021-10-20drm/i915: mark dmabuf objects as ALLOC_USERMatthew Auld1-1/+2
These are userspace objects, so mark them as such. In a later patch it's useful to determine how paranoid we need to be when managing cache flushes. In theory no functional changes. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018174508.2137279-1-matthew.auld@intel.com
2021-10-20drm/i915/selftests: remove duplicate include in mock_region.cRan Jianping1-2/+0
'drm/ttm/ttm_placement.h' included in 'drivers/gpu/drm/i915/selftests/mock_region.c' is duplicated. It is also included on the 9 line. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Ran Jianping <ran.jianping@zte.com.cn> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211019090205.1003458-1-ran.jianping@zte.com.cn
2021-10-18drm/i915: Catch yet another unconditioal clflushVille Syrjälä1-1/+1
Replace the unconditional clflush() with drm_clflush_virt_range() which does the wbinvd() fallback when clflush is not available. This time no justification is given for the clflush in the offending commit. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: 2c8ab3339e39 ("drm/i915: Pin timeline map after first timeline pin, v4.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-4-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-10-18drm/i915: Convert unconditional clflush to drm_clflush_virt_range()Ville Syrjälä1-1/+1
This one is apparently a "clflush for good measure", so bit more justification (if you can call it that) than some of the others. Convert to drm_clflush_virt_range() again so that machines without clflush will survive the ordeal. Cc: stable@vger.kernel.org Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@intel.com> #v1 Fixes: 12ca695d2c1e ("drm/i915: Do not share hwsp across contexts any more, v8.") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-3-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-10-18drm/i915: Replace the unconditional clflush with drm_clflush_virt_range()Ville Syrjälä1-1/+1
Not all machines have clflush, so don't go assuming they do. Not really sure why the clflush is even here since hwsp is supposed to get snooped I thought. Although in my case we're talking about a i830 machine where render/blitter snooping is definitely busted. But it might work for the hswp perhaps. Haven't really reverse engineered that one fully. Cc: stable@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fixes: b436a5f8b6c8 ("drm/i915/gt: Track all timelines created using the HWSP") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-2-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com>
2021-10-15drm/i915: Clean up PXP Kconfig info.Rodrigo Vivi1-5/+5
During the review I focused on stop the using of the "+" to reference the newer platforms, but I forgot that we are in a process of making things more clear and differentiate graphics and display versions. So, let me to clean up this a bit. Also, we don't need any version mentioned in the config menu entry, only in the help. Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211015090916.82968-1-rodrigo.vivi@intel.com
2021-10-15drm/i915: Enable multi-bb execbufMatthew Brost1-3/+0
Enable multi-bb execbuf by enabling the set_parallel extension. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-25-matthew.brost@intel.com
2021-10-15drm/i915: Update I915_GEM_BUSY IOCTL to understand composite fencesMatthew Brost3-13/+55
Parallel submission create composite fences (dma_fence_array) for excl / shared slots in objects. The I915_GEM_BUSY IOCTL checks these slots to determine the busyness of the object. Prior to patch it only check if the fence in the slot was a i915_request. Update the check to understand composite fences and correctly report the busyness. v2: (Tvrtko) - Remove duplicate BUILD_BUG_ON Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-24-matthew.brost@intel.com
2021-10-15drm/i915: Make request conflict tracking understand parallel submitsMatthew Brost1-14/+29
If an object in the excl or shared slot is a composite fence from a parallel submit and the current request in the conflict tracking is from the same parallel context there is no need to enforce ordering as the ordering is already implicit. Make the request conflict tracking understand this by comparing a parallel submit's parent context and skipping conflict insertion if the values match. v2: (John Harrison) - Reword commit message Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-23-matthew.brost@intel.com
2021-10-15drm/i915/guc: Handle errors in multi-lrc requestsMatthew Brost1-3/+66
If an error occurs in the front end when multi-lrc requests are getting generated we need to skip these in the backend but we still need to emit the breadcrumbs seqno. An issues arises because with multi-lrc breadcrumbs there is a handshake between the parent and children to make forward progress. If all the requests are not present this handshake doesn't work. To work around this, if multi-lrc request has an error we skip the handshake but still emit the breadcrumbs seqno. v2: (John Harrison) - Add comment explaining the skipping of the handshake logic - Fix typos in the commit message v3: (John Harrison) - Fix up some comments about the math to NOP the ring Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-22-matthew.brost@intel.com
2021-10-15drm/i915: Multi-BB execbufMatthew Brost7-251/+595
Allow multiple batch buffers to be submitted in a single execbuf IOCTL after a context has been configured with the 'set_parallel' extension. The number batches is implicit based on the contexts configuration. This is implemented with a series of loops. First a loop is used to find all the batches, a loop to pin all the HW contexts, a loop to create all the requests, a loop to submit (emit BB start, etc...) all the requests, a loop to tie the requests to the VMAs they touch, and finally a loop to commit the requests to the backend. A composite fence is also created for the generated requests to return to the user and to stick in dma resv slots. No behavior from the existing IOCTL should be changed aside from when throttling because the ring for a context is full. In this situation, i915 will now wait while holding the object locks. This change was done because the code is much simpler to wait while holding the locks and we believe there isn't a huge benefit of dropping these locks. If this proves false we can restructure the code to drop the locks during the wait. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Matthew Brost) - Return proper error value if i915_request_create fails v3: (John Harrison) - Add comment explaining create / add order loops + locking - Update commit message explaining different in IOCTL behavior - Line wrap some comments - eb_add_request returns void - Return -EINVAL rather triggering BUG_ON if cmd parser used (Checkpatch) - Check eb->batch_len[*current_batch] v4: (CI) - Set batch len if passed if via execbuf args - Call __i915_request_skip after __i915_request_commit (Kernel test robot) - Initialize rq to NULL in eb_pin_timeline v5: (John Harrison) - Fix typo in comments near bb order loops Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-21-matthew.brost@intel.com
2021-10-15drm/i915/guc: Implement no mid batch preemption for multi-lrcMatthew Brost4-13/+326
For some users of multi-lrc, e.g. split frame, it isn't safe to preempt mid BB. To safely enable preemption at the BB boundary, a handshake between parent and child is needed, syncing the set of BBs at the beginning and end of each batch. This is implemented via custom emit_bb_start & emit_fini_breadcrumb functions and enabled by default if a context is configured by set parallel extension. Lastly, this patch updates the process descriptor to the correct size as the memory used in the handshake is directly after the process descriptor. v2: (John Harrison) - Fix a few comments wording - Add struture for parent page layout v3: (John Harrison) - A structure for sync semaphore - Use offsetof to calc address - Update commit message v4: (John Harrison) - Fix typos in comment explaining memory map of scratch page Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-20-matthew.brost@intel.com
2021-10-15drm/i915/guc: Add basic GuC multi-lrc selftestMatthew Brost3-0/+181
Add very basic (single submission) multi-lrc selftest. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-19-matthew.brost@intel.com
2021-10-15drm/i915/doc: Update parallel submit doc to point to i915_drm.hMatthew Brost2-124/+2
Update parallel submit doc to point to i915_drm.h Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-18-matthew.brost@intel.com
2021-10-15drm/i915/guc: Connect UAPI to GuC multi-lrc interfaceMatthew Brost9-31/+505
Introduce 'set parallel submit' extension to connect UAPI to GuC multi-lrc interface. Kernel doc in new uAPI should explain it all. IGT: https://patchwork.freedesktop.org/patch/447008/?series=93071&rev=1 media UMD: https://github.com/intel/media-driver/pull/1252 v2: (Daniel Vetter) - Add IGT link and placeholder for media UMD link v3: (Kernel test robot) - Fix warning in unpin engines call (John Harrison) - Reword a bunch of the kernel doc v4: (John Harrison) - Add comment why perma-pin is done after setting gem context - Update some comments / docs for proto contexts v5: (John Harrison) - Rework perma-pin comment - Add BUG_IN if context is pinned when setting gem context Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014172005.27155-17-matthew.brost@intel.com