linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-06-14	drm/i915: Make closing request flush mandatory	Chris Wilson	1	-16/+2
	For symmetry, simplicity and ensuring the request is always truly idle upon its completion, always emit the closing flush prior to emitting the request breadcrumb. Previously, we would only emit the flush if we had started a user batch, but this just leaves all the other paths open to speculation (do they affect the GPU caches or not?) With mm switching, a key requirement is that the GPU is flushed and invalidated before hand, so for absolute safety, we want that closing flush be mandatory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
2018-06-11	drm/i915/ringbuffer: Fix context restore upon reset	Chris Wilson	1	-0/+2
	The discovery with trying to enable full-ppgtt was that we were completely failing to the load both the mm and context following the reset. Although we were performing mmio to set the PP_DIR (per-process GTT) and CCID (context), these were taking no effect (the assumption was that this would trigger reload of the context and restore the page tables). It was not until we performed the LRI + MI_SET_CONTEXT in a following context switch would anything occur. Since we are then required to reset the context image and PP_DIR using CS commands, we place those commands into every batch. The hardware should recognise the no-ops and eliminate the expensive context loads, but we still have to pay the cost of using cross-powerwell register writes. In practice, this has no effect on actual context switch times, and only adds a few hundred nanoseconds to no-op switches. We can improve the latter by eliminating the w/a around known no-op switches, but there is an ulterior motive to keeping them. Always emitting the context switch at the beginning of the request (and relying on HW to skip unneeded switches) does have one key advantage. Should we implement request reordering on Haswell, we will not know in advance what the previous executing context was on the GPU and so we would not be able to elide the MI_SET_CONTEXT commands ourselves and always have to emit them. Having our hand forced now actually prepares us for later. Now since that context and mm follow the request, we no longer (and not for a long time since requests took over!) require a trace point to tell when we write the switch into the ring, since it is always. (This is even more important when you remember that simply writing into the ring bears no relation to the current mm.) v2: Sandybridge has to agree to use LRI as well. Testcase: igt/drv_selftests/live_hangcheck Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
2018-05-24	drm/i915: Look for an active kernel context before switching	Chris Wilson	1	-1/+4
	We were not very carefully checking to see if an older request on the engine was an earlier switch-to-kernel-context before deciding to emit a new switch. The end result would be that we could get into a permanent loop of trying to emit a new request to perform the switch simply to flush the existing switch. What we need is a means of tracking the completion of each timeline versus the kernel context, that is to detect if a more recent request has been submitted that would result in a switch away from the kernel context. To realise this, we need only to look in our syncmap on the kernel context and check that we have synchronized against all active rings. v2: Since all ringbuffer clients currently share the same timeline, we do have to use the gem_context to distinguish clients. As a bonus, include all the tracing used to debug the death inside suspend. v3: Test, test, test. Construct a selftest to exercise and assert the expected behaviour that multiple switch-to-contexts do not emit redundant requests. Reported-by: Mika Kuoppala <mika.kuoppala@intel.com> Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
2018-05-18	drm/i915: Store a pointer to intel_context in i915_request	Chris Wilson	1	-17/+17
	To ease the frequent and ugly pointer dance of &request->gem_context->engine[request->engine->id] during request submission, store that pointer as request->hw_context. One major advantage that we will exploit later is that this decouples the logical context state from the engine itself. v2: Set mock_context->ops so we don't crash and burn in selftests. Cleanups from Tvrtko. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
2018-05-18	drm/i915: Move request->ctx aside	Chris Wilson	1	-6/+6
	In the next patch, we want to store the intel_context pointer inside i915_request, as it is frequently access via a convoluted dance when submitting the request to hw. Having two context pointers inside i915_request leads to confusion so first rename the existing i915_gem_context pointer to i915_request.gem_context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk
2018-05-08	drm/i915: Annotate timeline lock nesting	Chris Wilson	1	-1/+1
	CI noticed <4>[ 23.430701] ============================================ <4>[ 23.430706] WARNING: possible recursive locking detected <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted <4>[ 23.430720] -------------------------------------------- <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock: <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915] <4>[ 23.430888] but task is already holding lock: <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.430995] other info that might help us debug this: <4>[ 23.431002] Possible unsafe locking scenario: <4>[ 23.431007] CPU0 <4>[ 23.431010] ---- <4>[ 23.431013] lock(&(&timeline->lock)->rlock); <4>[ 23.431021] lock(&(&timeline->lock)->rlock); <4>[ 23.431028] * DEADLOCK * <4>[ 23.431036] May be due to missing lock nesting notation <4>[ 23.431044] 5 locks held by systemd-udevd/169: <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0 <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0 <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915] <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915] <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915] <4>[ 23.431369] stack backtrace: <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1 <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005 <4>[ 23.431394] Call Trace: <4>[ 23.431403] dump_stack+0x67/0x9b <4>[ 23.431411] __lock_acquire+0xc67/0x1b50 <4>[ 23.431421] ? ring_buffer_lock_reserve+0x154/0x3f0 <4>[ 23.431429] ? lock_acquire+0xa6/0x210 <4>[ 23.431435] lock_acquire+0xa6/0x210 <4>[ 23.431530] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431540] _raw_spin_lock+0x2a/0x40 <4>[ 23.431634] ? move_to_timeline+0x48/0x12c [i915] <4>[ 23.431730] move_to_timeline+0x48/0x12c [i915] <4>[ 23.431826] __i915_request_submit+0xfa/0x280 [i915] <4>[ 23.431923] i915_request_submit+0x25/0x40 [i915] <4>[ 23.432024] i9xx_submit_request+0x11/0x140 [i915] <4>[ 23.432120] submit_notify+0x8d/0x124 [i915] <4>[ 23.432202] __i915_sw_fence_complete+0x81/0x250 [i915] <4>[ 23.432300] __i915_request_add+0x31c/0x7c0 [i915] <4>[ 23.432395] i915_gem_init+0x621/0x630 [i915] <4>[ 23.432476] i915_driver_load+0xbee/0x10b0 [i915] <4>[ 23.432485] ? trace_hardirqs_on_caller+0xe0/0x1b0 <4>[ 23.432566] i915_pci_probe+0x29/0x90 [i915] <4>[ 23.432574] pci_device_probe+0xa1/0x130 <4>[ 23.432582] driver_probe_device+0x306/0x480 <4>[ 23.432589] __driver_attach+0xb7/0xe0 <4>[ 23.432596] ? driver_probe_device+0x480/0x480 <4>[ 23.432602] ? driver_probe_device+0x480/0x480 <4>[ 23.432609] bus_for_each_dev+0x74/0xc0 <4>[ 23.432616] bus_add_driver+0x15f/0x250 <4>[ 23.432623] ? 0xffffffffa02d7000 <4>[ 23.432629] driver_register+0x52/0xc0 <4>[ 23.432635] ? 0xffffffffa02d7000 <4>[ 23.432642] do_one_initcall+0x58/0x370 <4>[ 23.432653] ? do_init_module+0x1d/0x1ea <4>[ 23.432660] ? rcu_read_lock_sched_held+0x6f/0x80 <4>[ 23.432667] ? kmem_cache_alloc_trace+0x282/0x2e0 <4>[ 23.432675] do_init_module+0x56/0x1ea <4>[ 23.432682] load_module+0x2435/0x2b20 <4>[ 23.432694] ? __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432701] __se_sys_finit_module+0xd3/0xf0 <4>[ 23.432710] do_syscall_64+0x55/0x190 <4>[ 23.432717] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 23.432724] RIP: 0033:0x7fa780782839 <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839 <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780 <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30 but did not report it as an issue as it only occurred during the first module on boot. This is due to the removal of the distinct global timeline, and its separate lock class. So instead mark up the expected nesting. An alternative would be to define a separate lock class for the engine, but since we only expect to have a single point of nesting, we can avoid having multiple lock classes for the struct. Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Tested-by: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180508153514.20251-1-chris@chris-wilson.co.uk
2018-05-08	drm/i915: Disable tasklet scheduling across initial scheduling	Chris Wilson	1	-3/+2
	During request submission, we call the engine->schedule() function so that we may reorder the active requests as required for inheriting the new request's priority. This may schedule several tasklets to run on the local CPU, but we will need to schedule the tasklets again for the new request. Delay all the local tasklets until the end, so that we only have to process the queue just once. v2: Beware PREEMPT_RCU, as then local_bh_disable() is then not a superset of rcu_read_lock(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-2-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2018-05-04	drm/i915: Remove assertion of active_rings must be non-empty if active_requests	Chris Wilson	1	-3/+0
	"An outstanding request must still be on an active ring somewhere" is only true if we haven't just been interrupted by the shrinker in the middle of allocating the request itself. (At the start of i915_request_alloc() we pin the context and prepare the GT for activity, marking it as active, and then try to allocate the request. If this allocation invokes the shrinker, we try to reclaim some space by calling i915_retire_requests() which may then be confused by the pre-reservation of active_requests.) <3>[ 125.472695] i915_retire_requests:1429 GEM_BUG_ON(list_empty(&i915->gt.active_rings)) <2>[ 125.472792] kernel BUG at drivers/gpu/drm/i915/i915_request.c:1429! <4>[ 125.472822] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI <4>[ 125.498764] Modules linked in: snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb btrtl btbcm btintel cdc_ether snd_hda_codec_realtek bluetooth i915 snd_hda_codec_generic usbnet r8152 mii ecdh_generic lpc_ich mei_me snd_hda_intel snd_hda_codec mei snd_hwdep snd_hda_core snd_pcm prime_numbers <4>[ 125.498923] CPU: 0 PID: 1115 Comm: gem_exec_create Tainted: G U 4.17.0-rc3-gc49cbe0d1eb8-kasan_32+ #1 <4>[ 125.498955] Hardware name: GOOGLE Peppy/Peppy, BIOS MrChromebox 02/04/2018 <4>[ 125.499074] RIP: 0010:i915_retire_requests+0x3f2/0x590 [i915] <4>[ 125.499095] RSP: 0018:ffff88004e5dec40 EFLAGS: 00010282 <4>[ 125.499117] RAX: 0000000000000010 RBX: ffff8800458f0000 RCX: 0000000000000000 <4>[ 125.499140] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff880060c2f6f0 <4>[ 125.499164] RBP: ffff88004e5dee30 R08: ffffed000c185ee6 R09: ffffed000c185ee6 <4>[ 125.499187] R10: 0000000000000001 R11: ffffed000c185ee5 R12: ffff8800553da160 <4>[ 125.499210] R13: dffffc0000000000 R14: 0000000000000000 R15: ffff8800458faed0 <4>[ 125.499235] FS: 00007fe18f052980(0000) GS:ffff880065400000(0000) knlGS:0000000000000000 <4>[ 125.499262] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 125.499282] CR2: 00007f01df11efb8 CR3: 00000000518d4001 CR4: 00000000000606f0 <4>[ 125.499304] Call Trace: <4>[ 125.499417] i915_gem_shrink+0x576/0xb50 [i915] <4>[ 125.499532] ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915] <4>[ 125.499561] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.499671] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915] <4>[ 125.499782] ? i915_gem_shrinker_scan+0xc4/0x320 [i915] <4>[ 125.499889] i915_gem_shrinker_scan+0xc4/0x320 [i915] <4>[ 125.499997] ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915] <4>[ 125.500021] ? do_raw_spin_unlock+0x4f/0x240 <4>[ 125.500042] ? _raw_spin_unlock+0x29/0x40 <4>[ 125.500149] ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915] <4>[ 125.500177] shrink_slab.part.18+0x23e/0x8f0 <4>[ 125.500202] ? unregister_shrinker+0x1f0/0x1f0 <4>[ 125.500226] ? mem_cgroup_iter+0x379/0xcc0 <4>[ 125.500249] shrink_node+0xa7e/0x1180 <4>[ 125.500276] ? shrink_node_memcg+0x11f0/0x11f0 <4>[ 125.500297] ? __delayacct_freepages_start+0x38/0x80 <4>[ 125.500319] ? __is_insn_slot_addr+0xe3/0x1a0 <4>[ 125.500342] ? recalibrate_cpu_khz+0x10/0x10 <4>[ 125.500361] ? ktime_get+0xb2/0x140 <4>[ 125.500382] do_try_to_free_pages+0x2d3/0xe40 <4>[ 125.500407] ? allow_direct_reclaim.part.23+0x1e0/0x1e0 <4>[ 125.500429] ? shrink_node+0x1180/0x1180 <4>[ 125.500450] ? __read_once_size_nocheck.constprop.4+0x10/0x10 <4>[ 125.500476] try_to_free_pages+0x1af/0x560 <4>[ 125.500497] ? do_try_to_free_pages+0xe40/0xe40 <4>[ 125.500525] __alloc_pages_nodemask+0xadc/0x2130 <4>[ 125.500553] ? gfp_pfmemalloc_allowed+0x150/0x150 <4>[ 125.500654] ? i915_gem_do_execbuffer+0x219d/0x32e0 [i915] <4>[ 125.500678] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.500701] ? __debug_object_init+0x322/0xd90 <4>[ 125.500722] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.500827] ? i915_gem_do_execbuffer+0xdc2/0x32e0 [i915] <4>[ 125.500942] ? i915_request_alloc+0x5b5/0x13f0 [i915] <4>[ 125.500964] ? page_frag_free+0x170/0x170 <4>[ 125.500984] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.501008] new_slab+0x21d/0x5c0 <4>[ 125.501029] ___slab_alloc.constprop.35+0x322/0x3e0 <4>[ 125.501052] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501074] ? __ww_mutex_lock.constprop.3+0x1104/0x2cf0 <4>[ 125.501097] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4>[ 125.501120] ? fs_reclaim_acquire+0x10/0x10 <4>[ 125.501138] ? lock_acquire+0x138/0x3c0 <4>[ 125.501156] ? lock_acquire+0x3c0/0x3c0 <4>[ 125.501176] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501198] ? __slab_alloc.isra.27.constprop.34+0x3d/0x70 <4>[ 125.501219] __slab_alloc.isra.27.constprop.34+0x3d/0x70 <4>[ 125.501243] ? reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501265] __kmalloc_track_caller+0x313/0x350 <4>[ 125.501287] krealloc+0x62/0xb0 <4>[ 125.501305] reservation_object_reserve_shared+0x10b/0x250 <4>[ 125.501411] i915_gem_do_execbuffer+0x2040/0x32e0 [i915] <4>[ 125.501522] ? eb_relocate_slow+0xad0/0xad0 [i915] <4>[ 125.501544] ? debug_check_no_locks_freed+0x2a0/0x2a0 <4>[ 125.501646] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.501755] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.501779] ? drm_dev_get+0x20/0x20 <4>[ 125.501803] ? __might_fault+0xea/0x1a0 <4>[ 125.501902] ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915] <4>[ 125.502012] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502116] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502218] i915_gem_execbuffer2_ioctl+0x3c5/0x770 [i915] <4>[ 125.502243] ? drm_dev_enter+0xe0/0xe0 <4>[ 125.502260] ? lock_acquire+0x138/0x3c0 <4>[ 125.502362] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502470] ? i915_gem_object_create.part.28+0x570/0x570 [i915] <4>[ 125.502575] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502680] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502702] drm_ioctl_kernel+0x151/0x200 <4>[ 125.502721] ? drm_ioctl_permit+0x2a0/0x2a0 <4>[ 125.502746] drm_ioctl+0x63a/0x920 <4>[ 125.502844] ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915] <4>[ 125.502868] ? drm_getstats+0x20/0x20 <4>[ 125.502886] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.502919] do_vfs_ioctl+0x173/0xe90 <4>[ 125.502936] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.502957] ? ioctl_preallocate+0x170/0x170 <4>[ 125.502978] ? trace_hardirqs_on_thunk+0x1a/0x1c <4>[ 125.503002] ? retint_kernel+0x2d/0x2d <4>[ 125.503024] ksys_ioctl+0x35/0x60 <4>[ 125.503043] __x64_sys_ioctl+0x6a/0xb0 <4>[ 125.503061] do_syscall_64+0x97/0x400 <4>[ 125.503081] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4>[ 125.503101] RIP: 0033:0x7fe18e4f65d7 <4>[ 125.503116] RSP: 002b:00007ffe2ffc06a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 <4>[ 125.503145] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe18e4f65d7 <4>[ 125.503168] RDX: 00007ffe2ffc07f0 RSI: 0000000040406469 RDI: 0000000000000003 <4>[ 125.503191] RBP: 00007ffe2ffc07f0 R08: 0000000000000004 R09: 00007ffe2ffcf080 <4>[ 125.503215] R10: 000000000002c7de R11: 0000000000000246 R12: 0000000040406469 <4>[ 125.503238] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 <4>[ 125.503268] Code: e8 18 a0 c9 da 48 8b 35 25 3a 47 00 49 c7 c0 a0 3b 88 c0 b9 95 05 00 00 48 c7 c2 e0 49 88 c0 48 c7 c7 8d 3b 5d c0 e8 ee 7e db da <0f> 0b 48 89 ef e8 a4 26 f5 da e9 51 fe ff ff e8 8a 26 f5 da e9 <1>[ 125.503548] RIP: i915_retire_requests+0x3f2/0x590 [i915] RSP: ffff88004e5dec40 Fixes: 643b450a594e ("drm/i915: Only track live rings for retiring") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180504101147.26286-1-chris@chris-wilson.co.uk
2018-05-04	drm/i915: Keep one request in our ring_list	Chris Wilson	1	-3/+3
	Don't pre-emptively retire the oldest request in our ring's list if it is the only request. We keep various bits of state alive using the active reference from the request and would rather transfer that state over to a new request rather than the more involved process of retiring and reacquiring it. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-2-chris@chris-wilson.co.uk
2018-05-03	drm/i915: Reset the hangcheck timestamp before repeating a seqno	Chris Wilson	1	-0/+1
	In the unusual circumstance where we reuse a seqno (for example, in igt), make sure that we reset the hangcheck timestamp before it sees the same seqno again. References: https://bugs.freedesktop.org/show_bug.cgi?id=106215 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502220313.6459-1-chris@chris-wilson.co.uk
2018-05-02	drm/i915: Split i915_gem_timeline into individual timelines	Chris Wilson	1	-35/+33
	We need to move to a more flexible timeline that doesn't assume one fence context per engine, and so allow for a single timeline to be used across a combination of engines. This means that preallocating a fence context per engine is now a hindrance, and so we want to introduce the singular timeline. From the code perspective, this has the notable advantage of clearing up a lot of mirky semantics and some clumsy pointer chasing. By splitting the timeline up into a single entity rather than an array of per-engine timelines, we can realise the goal of the previous patch of tracking the timeline alongside the ring. v2: Tweak wait_for_idle to stop the compiling thinking that ret may be uninitialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
2018-05-02	drm/i915: Move timeline from GTT to ring	Chris Wilson	1	-7/+6
	In the future, we want to move a request between engines. To achieve this, we first realise that we have two timelines in effect here. The first runs through the GTT is required for ordering vma access, which is tracked currently by engine. The second is implied by sequential execution of commands inside the ringbuffer. This timeline is one that maps to userspace's expectations when submitting requests (i.e. given the same context, batch A is executed before batch B). As the rings's timelines map to userspace and the GTT timeline an implementation detail, move the timeline from the GTT into the ring itself (per-context in logical-ring-contexts/execlists, or a global per-engine timeline for the shared ringbuffers in legacy submission. The two timelines are still assumed to be equivalent at the moment (no migrating requests between engines yet) and so we can simply move from one to the other without adding extra ordering. v2: Reinforce that one isn't allowed to mix the engine execution timeline with the client timeline from userspace (on the ring). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
2018-04-30	drm/i915: Only track live rings for retiring	Chris Wilson	1	-2/+8
	We don't need to track every ring for its lifetime as they are managed by the contexts/engines. What we do want to track are the live rings so that we can sporadically clean up requests if userspace falls behind. We can simply restrict the gt->rings list to being only gt->live_rings. v2: s/live/active/ for consistency with gt.active_requests Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
2018-04-30	drm/i915: Retire requests along rings	Chris Wilson	1	-56/+94
	In the next patch, rings are the central timeline as requests may jump between engines. Therefore in the future as we retire in order along the engine timeline, we may retire out-of-order within a ring (as the ring now occurs along multiple engines), leading to much hilarity in miscomputing the position of ring->head. As an added bonus, retiring along the ring reduces the penalty of having one execlists client do cleanup for another (old legacy submission shares a ring between all clients). The downside is that slow and irregular (off the critical path) process of cleaning up stale requests after userspace becomes a modicum less efficient. In the long run, it will become apparent that the ordered ring->request_list matches the ring->timeline, a fun challenge for the future will be unifying the two lists to avoid duplication! v2: We need both engine-order and ring-order processing to maintain our knowledge of where individual rings have completed upto as well as knowing what was last executing on any engine. And finally by decoupling retiring the contexts on the engine and the timelines along the rings, we do have to keep a reference to the context on each request (previously it was guaranteed by the context being pinned). v3: Not just a reference to the context, but we need to keep it pinned as we manipulate the rings; i.e. we need a pin for both the manipulation of the engine state during its retirements, and a separate pin for the manipulation of the ring state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
2018-04-30	drm/i915: Wrap engine->context_pin() and engine->context_unpin()	Chris Wilson	1	-3/+3
	Make life easier in upcoming patches by moving the context_pin and context_unpin vfuncs into inline helpers. v2: Fixup mock_engine to mark the context as pinned on use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
2018-04-30	drm/i915: Stop tracking timeline->inflight_seqnos	Chris Wilson	1	-16/+17
	In commit 9b6586ae9f6b ("drm/i915: Keep a global seqno per-engine"), we moved from a global inflight counter to per-engine counters in the hope that will be easy to run concurrently in future. However, with the advent of the desire to move requests between engines, we do need a global counter to preserve the semantics that no engine wraps in the middle of a submit. (Although this semantic is now only required for gen7 semaphore support, which only supports greater-then comparisons!) v2: Keep a global counter of all requests ever submitted and force the reset when it wraps. References: 9b6586ae9f6b ("drm/i915: Keep a global seqno per-engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-1-chris@chris-wilson.co.uk
2018-04-18	drm/i915: Pack params to engine->schedule() into a struct	Chris Wilson	1	-2/+2
	Today we only want to pass along the priority to engine->schedule(), but in the future we want to have much more control over the various aspects of the GPU during a context's execution, for example controlling the frequency allowed. As we need an ever growing number of parameters for scheduling, move those into a struct for convenience. v2: Move the anonymous struct into its own function for legibility and ye olde gcc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
2018-04-18	drm/i915: Rename priotree to sched	Chris Wilson	1	-32/+34
	Having moved the priotree struct into i915_scheduler.h, identify it as the scheduling element and rebrand into i915_sched. This becomes more useful as we start attaching more information we require to propagate through the scheduler. v2: Use i915_sched_node for future distinctiveness Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-2-chris@chris-wilson.co.uk
2018-04-09	drm/i915/execlists: Log fence context & seqno throughout GEM_TRACE	Tvrtko Ursulin	1	-3/+3
	Include fence context and seqno in low level tracing so it is easier to follow flows of individual requests when things go bad. Also added tracing on the reset side of things. v2: Chris Wilson: * Standardize global_seqno and seqno as global. * Include current hws seqno in execlists_cancel_port_requests. v3: * Fix port printk format for all builds. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v2 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180406123514.5809-1-tvrtko.ursulin@linux.intel.com
2018-04-06	drm/i915: Pass the set of guilty engines to i915_reset()	Chris Wilson	1	-2/+4
	Currently, we rely on inspecting the hangcheck state from within the i915_reset() routines to determine which engines were guilty of the hang. This is problematic for cases where we want to run i915_handle_error() and call i915_reset() independently of hangcheck. Instead of relying on the indirect parameter passing, turn it into an explicit parameter providing the set of stalled engines which then are treated as guilty until proven innocent. While we are removing the implicit stalled parameter, also make the reason into an explicit parameter to i915_reset(). We still need a back-channel for i915_handle_error() to hand over the task to the locked waiter, but let's keep that its own channel rather than incriminate another. This leaves stalled/seqno as being private to hangcheck, with no more nefarious snooping by reset, be it whole-device or per-engine. \o/ The only real issue now is that this makes it crystal clear that we don't actually do any testing of hangcheck per se in drv_selftest/live_hangcheck, merely of resets! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk
2018-04-06	drm/i915: Split out parking from the idle worker for reuse	Chris Wilson	1	-49/+3
	We will want to park GEM before disengaging the drive^W^W^W unwedging. Since we already do the work for idling, expose the guts as a new function that we can then reuse. v2: Just skip if already parked; makes it more forgiving to use by future callers. v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to i915_gem_park so that we can evaluate it for symmetry more easily. Calling GEM from inside i915_request looks to be a bit of a layering violation, for the moment I am imaging them as being notify_cb. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
2018-03-29	drm/i915: Include the HW breadcrumb whenever we trace the global_seqno	Chris Wilson	1	-11/+17
	When we include a request's global_seqno in a GEM_TRACE it often helps to know how that relates to the current breadcrumb as seen by the hardware. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180327210157.16896-3-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2018-03-22	drm/i915: Remove local timeline var from submit/unsubmit	Chris Wilson	1	-15/+15
	Both request_submit and request_unsubmit deal with transferring the request from the client's timeline onto the execution timeline and back again. As both functions deal with a pair of timeline's, using a shorthand for just one of them is slightly confusing, especially as the different functions use the shorthand for the alternate timeline. Instead, use the full version of each timeline so it should be easier to keep track of the transfer between the request/client and the engine. v2: Refactor the common lock+list_move v3: Be clear we require the other timeline list to be locked as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180322131034.6036-1-chris@chris-wilson.co.uk
2018-03-22	drm/i915: Fix tracing of submit seqno	Chris Wilson	1	-1/+1
	We pre-increment the timeline->seqno when handing it to the request, make sure the GEM_TRACE takes this into account. Otherwise, it appears that we go backwards over a preemption point: 1d..1 157681077us : __i915_request_unsubmit: vcs0 fence 75e:3 <- global_seqno 17 0d.s1 157681113us : __i915_request_submit: vcs0 fence 75e:3 -> global_seqno 16 Fixes: d9b13c4dde6c ("drm/i915: Trace GEM steps between submit and wedging") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180322110059.4467-1-chris@chris-wilson.co.uk
2018-03-20	drm/i915: Add control flags to i915_handle_error()	Chris Wilson	1	-1/+1
	Not all callers want the GPU error to handled in the same way, so expose a control parameter. In the first instance, some callers do not want the heavyweight error capture so add a bit to request the state to be captured and saved. v2: Pass msg down to i915_reset/i915_reset_engine so that we include the reason for the reset in the dev_notice(), superseding the earlier option to not print that notice. v3: Stash the reason inside the i915->gpu_error to handover to the direct reset from the blocking waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jeff McGee <jeff.mcgee@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180320100449.1360-2-chris@chris-wilson.co.uk
2018-03-16	drm/i915: Trace GEM steps between submit and wedging	Chris Wilson	1	-0/+23
	We still have an odd race with wedging/unwedging as shown by igt/gem_eio that defies expectations. Add some more trace_printks to try and visualize the flow over the precipice. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180315131451.4060-1-chris@chris-wilson.co.uk
2018-03-12	drm/i915: Remove the impedance mismatch around intel_engine_enable_signaling	Chris Wilson	1	-5/+1
	There is some redundancy between dma_fence->ops->enable_signaling (via i915_fence_enable_signaling) and our backend, intel_engine_enable_signaling() in that both levels recheck the fence status multiple times. If we convert intel_engine_enable_signaling() to return the information desired by dma_fence->ops->enable_signaling, we can reduce i915_fence_enable_signaling to a simple stub and avoid trying to reinterpret the same information. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180308140732.25090-1-chris@chris-wilson.co.uk
2018-03-09	drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection	Chris Wilson	1	-0/+2
	Similar to the staging around handling of engine->submit_request, we need to stop adding to the execlists->queue prior to calling engine->cancel_requests. cancel_requests will move requests from the queue onto the timeline, so if we add a request onto the queue after that point, it will be lost. Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk
2018-03-09	drm/i915: Update ring position from request on retiring	Chris Wilson	1	-1/+1
	When wedged, we do not update the ring->tail as we submit the requests causing us to leak the ring->space upon cleaning up the wedged driver. We can just use the value stored in rq->tail, and keep the submission backend details away from set-wedge. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-3-chris@chris-wilson.co.uk
2018-03-06	drm/i915: Flush waiters on seqno wraparound	Chris Wilson	1	-4/+2
	Previously, we would spin waiting for all waiters to wake up and notice their request had completed before we would reset the seqno upon wraparound. However, we can mark their waits as complete and wake them up directly using the existing machinery for handling the flushing of missed wakeups when idling. Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-2-chris@chris-wilson.co.uk
2018-03-06	drm/i915: Stop kicking the signaling thread on seqno wraparound	Chris Wilson	1	-0/+2
	Since commit fd10e2ce9905 ("drm/i915/breadcrumbs: Ignore unsubmitted signalers"), we cancel the signaler when retiring the request and so upon wraparound, where we wait for all requests to be retired, we no longer need to spin waiting for the signaling thread to release its references to the in-flight requests, and so we can assert that the signaler is idle. References: fd10e2ce9905 ("drm/i915/breadcrumbs: Ignore unsubmitted signalers") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-1-chris@chris-wilson.co.uk
2018-02-23	drm/i915: Update missing parts after the rename to i915_request	Michel Thierry	1	-2/+2
	Mostly doc/print messages that were not updated after commit e61e0f51ba79 ("drm/i915: Rename drm_i915_gem_request to i915_request"). Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180222172405.11386-1-michel.thierry@intel.com
2018-02-21	drm/i915: Rename drm_i915_gem_request to i915_request	Chris Wilson	1	-0/+1411
	We want to de-emphasize the link between the request (dependency, execution and fence tracking) from GEM and so rename the struct from drm_i915_gem_request to i915_request. That is we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.) In short, the spatch: @@ @@ - struct drm_i915_gem_request + struct i915_request A corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space if of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>