aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/gpu/drm/i915/i915_gem.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-08-18drm/i915: Replace execbuf vma ht with an idrChris Wilson1-15/+31
This was the competing idea long ago, but it was only with the rewrite of the idr as an radixtree and using the radixtree directly ourselves, along with the realisation that we can store the vma directly in the radixtree and only need a list for the reverse mapping, that made the patch performant enough to displace using a hashtable. Though the vma ht is fast and doesn't require any extra allocation (as we can embed the node inside the vma), it does require a thread for resizing and serialization and will have the occasional slow lookup. That is hairy enough to investigate alternatives and favour them if equivalent in peak performance. One advantage of allocating an indirection entry is that we can support a single shared bo between many clients, something that was done on a first-come first-serve basis for shared GGTT vma previously. To offset the extra allocations, we create yet another kmem_cache for them. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
2017-08-15drm/i915: Handle full s64 precision for wait-ioctlChris Wilson1-1/+5
The wait-ioctl is optionally supplied a timeout with nanosecond precision in a s64 field. We use nsecs_to_jiffies64() to convert that into the jiffies consumed by the scheduler, but internally nsecs_to_jiffies64() does not guard against overflow (as it's purpose is for use by the scheduler and not drivers!). So we must guard against the overflow ourselves, and in the process note that we may then return much earlier than the timeout selected by the user, so don't report ETIME unless we do hit the timeout. (Woe betold us though if the user waits for a year (32bit) and the request is still not complete!) v2: Refine overflow detection (to not include an overffow itself) Reported-by: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20170811105731.9482-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-08-15drm/i915: Split obj->cache_coherent to track r/wChris Wilson1-12/+13
Another month, another story in the cache coherency saga. This time, we come to the realisation that i915_gem_object_is_coherent() has been reporting whether we can read from the target without requiring a cache invalidate; but we were using it in places for testing whether we could write into the object without requiring a cache flush. So split the tracking into two, one to decide before reads, one after writes. See commit e27ab73d17ef ("drm/i915: Mark CPU cache as dirty on every transition for CPU writes") for the previous entry in this saga. v2: Be verbose v3: Remove unused function (i915_gem_object_is_coherent) v4: Fix inverted coherency check prior to execbuf (from v2) v5: Add comment for nasty code where we are optimising on gcc's behalf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555 Testcase: igt/kms_mmap_write_crc Testcase: igt/kms_pwrite_crc Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-07-27drm/i915: Call the unlocked version of i915_gem_object_get_pages()Chris Wilson1-1/+1
When we hold for the lock for swapping out the shmem pages for the physically contiguous pages, we have to call the unlocked version of get_pages! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934 Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Move i915_gem_object_phys_attach()Chris Wilson1-60/+58
Prevent a forward declaration in the next patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-1-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriatelyChris Wilson1-15/+35
Actually transferring from shmemfs to the physically contiguous set of pages should be wholly guarded by its obj->mm.lock! v2: Remember to free the old pages. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170726160038.29487-2-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Don't touch fence->error when resetting an innocent requestChris Wilson1-19/+34
If the request has been completed before the reset took effect, we don't need to mark it up as being a victim. Touching fence->error after the fence has been signaled is detected by dma_fence_set_error() and triggers a BUG: [ 231.743133] kernel BUG at ./include/linux/dma-fence.h:434! [ 231.743156] invalid opcode: 0000 [#1] SMP KASAN [ 231.743172] Modules linked in: i915 drm_kms_helper drm iptable_nat nf_nat_ipv4 nf_nat x86_pkg_temp_thermal iosf_mbi i2c_algo_bit cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev [last unloaded: drm] [ 231.743221] CPU: 2 PID: 20 Comm: kworker/2:0 Tainted: G U 4.13.0-rc1+ #52 [ 231.743236] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011 [ 231.743363] Workqueue: events_long i915_hangcheck_elapsed [i915] [ 231.743382] task: ffff8801f42e9780 task.stack: ffff8801f42f8000 [ 231.743489] RIP: 0010:i915_gem_reset_engine+0x45a/0x460 [i915] [ 231.743505] RSP: 0018:ffff8801f42ff770 EFLAGS: 00010202 [ 231.743521] RAX: 0000000000000007 RBX: ffff8801bf6b1880 RCX: ffffffffa02881a6 [ 231.743537] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8801bf6b18c8 [ 231.743551] RBP: ffff8801f42ff7c8 R08: 0000000000000001 R09: 0000000000000000 [ 231.743566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801edb02d00 [ 231.743581] R13: ffff8801e19d4200 R14: 000000000000001d R15: ffff8801ce2a4000 [ 231.743599] FS: 0000000000000000(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000 [ 231.743614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 231.743629] CR2: 00007f0ebd1add10 CR3: 0000000002621000 CR4: 00000000000406e0 [ 231.743643] Call Trace: [ 231.743752] i915_gem_reset+0x6c/0x150 [i915] [ 231.743853] i915_reset+0x175/0x210 [i915] [ 231.743958] i915_reset_device+0x33b/0x350 [i915] [ 231.744061] ? valleyview_pipestat_irq_handler+0xe0/0xe0 [i915] [ 231.744081] ? trace_hardirqs_off_caller+0x70/0x110 [ 231.744102] ? _raw_spin_unlock_irqrestore+0x46/0x50 [ 231.744120] ? find_held_lock+0x119/0x150 [ 231.744138] ? mark_lock+0x6d/0x850 [ 231.744241] ? gen8_gt_irq_ack+0x1f0/0x1f0 [i915] [ 231.744262] ? work_on_cpu_safe+0x60/0x60 [ 231.744284] ? rcu_read_lock_sched_held+0x57/0xa0 [ 231.744400] ? gen6_read32+0x2ba/0x320 [i915] [ 231.744506] i915_handle_error+0x382/0x5f0 [i915] [ 231.744611] ? gen6_rps_reset_ei+0x20/0x20 [i915] [ 231.744630] ? vsnprintf+0x128/0x8e0 [ 231.744649] ? pointer+0x6b0/0x6b0 [ 231.744667] ? debug_check_no_locks_freed+0x1a0/0x1a0 [ 231.744688] ? scnprintf+0x92/0xe0 [ 231.744706] ? snprintf+0xb0/0xb0 [ 231.744820] hangcheck_declare_hang+0x15a/0x1a0 [i915] [ 231.744932] ? engine_stuck+0x440/0x440 [i915] [ 231.744951] ? rcu_read_lock_sched_held+0x57/0xa0 [ 231.745062] ? gen6_read32+0x2ba/0x320 [i915] [ 231.745173] ? gen6_read16+0x320/0x320 [i915] [ 231.745284] ? intel_engine_get_active_head+0x91/0x170 [i915] [ 231.745401] i915_hangcheck_elapsed+0x3d8/0x400 [i915] [ 231.745424] process_one_work+0x3e8/0xac0 [ 231.745444] ? pwq_dec_nr_in_flight+0x110/0x110 [ 231.745464] ? do_raw_spin_lock+0x8e/0x120 [ 231.745484] worker_thread+0x8d/0x720 [ 231.745506] kthread+0x19e/0x1f0 [ 231.745524] ? process_one_work+0xac0/0xac0 [ 231.745541] ? kthread_create_on_node+0xa0/0xa0 [ 231.745560] ret_from_fork+0x27/0x40 [ 231.745581] Code: 8b 7d c8 e8 49 0d 02 e1 49 8b 7f 38 48 8b 75 b8 48 83 c7 10 e8 b8 89 be e1 e9 95 fc ff ff 4c 89 e7 e8 4b b9 ff ff e9 30 ff ff ff <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe [ 231.745767] RIP: i915_gem_reset_engine+0x45a/0x460 [i915] RSP: ffff8801f42ff770 At first glance this looks to be related to commit c64992e035d7 ("drm/i915: Look for active requests earlier in the reset path"), but it could easily happen before as well. On the other hand, we no longer logged victims due to the active_request being dropped earlier. v2: Be trickier to unwind the incomplete request as we cannot rely on request retirement for the lockless per-engine reset. v3: Reprobe the active request at the time of the reset. Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch> Fixes: c64992e035d7 ("drm/i915: Look for active requests earlier in the reset path") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-15-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1 Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Make i915_gem_context_mark_guilty() safe for unlocked updatesChris Wilson1-14/+18
Since we make call i915_gem_context_mark_guilty() concurrently when resetting different engines in parallel, we need to make sure that our updates are safe for the unlocked access. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-12-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Clear engine irq posted following a resetChris Wilson1-0/+2
When the GPU is reset, we want to discard all pending notifications as either we have manually completed them, or they are no longer applicable. Make sure we do reset the engine->irq_posted prior to re-enabling the engine (e.g. the interrupt tasklets) in i915_gem_reset_finish_engine(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-11-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Assert that machine is wedged for nop_submit_requestChris Wilson1-0/+1
We should only ever do nop_submit_request when the machine is wedged, so assert it is so. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-10-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Wake up waiters after setting the WEDGED bitChris Wilson1-1/+3
After setting the WEDGED bit, make sure that we do wake up waiters as they may not be waiting for a request completion yet, just for its execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-9-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27drm/i915: Clear execlist port[] before updating seqno on wedgingChris Wilson1-7/+7
When we wedge the device, we clear out the in-flight requests and advance the breadcrumb to indicate they are complete. However, the breadcrumb advance includes an assert that the engine is idle, so that advancement needs to be the last step to ensure we pass our own sanity checks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-7-chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27Merge airlied/drm-next into drm-intel-next-queuedDaniel Vetter1-1/+2
Resync with upstream to avoid git getting too badly confused. Also, we have a conflict with the drm_vblank_cleanup removal, which cannot be resolved by simply taking our side. Bake that in properly. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-20drm/i915: Remove intel_flip_work infrastructureDaniel Vetter1-2/+0
This gets rid of all the interactions between the legacy flip code and the modeset code. Yay! This highlights an ommission in the atomic paths, where we fail to apply a boost to the pending rendering when we miss the target vblank. But the existing code is still dead and can be removed. v2: Note that the boosting doesn't work in atomic (Chris). Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170720175754.30751-7-daniel.vetter@ffwll.ch
2017-07-20Merge tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel into drm-nextDave Airlie1-86/+79
2nd round of 4.14 features: - prep for deferred fbdev setup - refactor fixed 16.16 computations and skl+ wm code (Mahesh Kumar) - more cnl paches (Rodrigo, Imre et al) - tighten context cleanup and handling (Chris Wilson) - fix interlaced handling on skl+ (Mahesh Kumar) - small bits as usual * tag 'drm-intel-next-2017-07-17' of git://anongit.freedesktop.org/git/drm-intel: (84 commits) drm/i915: Update DRIVER_DATE to 20170717 drm/i915: Protect against deferred fbdev setup drm/i915/fbdev: Always forward hotplug events drm/i915/skl+: unify cpp value in WM calculation drm/i915/skl+: WM calculation don't require height drm/i915: Addition wrapper for fixed16.16 operation drm/i915: cleanup fixed-point wrappers naming drm/i915: Always perform internal fixed16 division in 64 bits drm/i915: take-out common clamping code of fixed16 wrappers drm/i915/cnl: Add missing type case. drm/i915/cnl: Add max allowed Cannonlake DC. drm/i915: Make DP-MST connector info work drm/i915/cnl: Get DDI clock based on PLLs. drm/i915/cnl: Inherit RPS stuff from previous platforms. drm/i915/cnl: Gen10 render context size. drm/i915/cnl: Don't trust VBT's alternate pin for port D for now. drm/i915: Fix the kernel panic when using aliasing ppgtt drm/i915/cnl: Cannonlake color init. drm/i915/cnl: Add force wake for gen10+. x86/gpu: CNL uses the same GMS values as SKL ...
2017-07-12drm/i915: use __GFP_RETRY_MAYFAILMichal Hocko1-1/+2
Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") has tried to remove disruptive OOM killer because the userspace should be able to cope with allocation failures. At the time only __GFP_NORETRY could achieve that and it turned out that this would fail the allocations just too easily. So "drm/i915: Remove __GFP_NORETRY from our buffer allocator" removed it and hoped for a better solution. __GFP_RETRY_MAYFAIL is that solution. It will keep retrying the allocation until there is no more progress and we would go OOM. Instead we fail the allocation and let the caller to deal with it. Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Alex Belits <alex.belits@cavium.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David Daney <david.daney@cavium.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-28drm/i915: Avoid keeping waitboost active for signaling threadsChris Wilson1-24/+1
Once a client has requested a waitboost, we keep that waitboost active until all clients are no longer waiting. This is because we don't distinguish which waiter deserves the boost. However, with the advent of fence signaling, the signaler threads appear as waiters to the RPS interrupt handler. So instead of using a single boolean to track when to keep the waitboost active, use a counter of all outstanding waitboosted requests. At this point, I have removed all vestiges of the rate limiting on clients. Whilst this means that compositors should remain more fluid, it also means that boosts are more prevalent. See commit b29c19b64528 ("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion on the pros and cons of both approaches. A drawback of this implementation is that it requires constant request submission to keep the waitboost trimmed (as it is now cancelled when the request is completed). This will be fine for a busy system, but near idle the boosts may be kept for longer than desired (effectively tens of vblanks worstcase) and there is a reliance on rc6 instead. v2: Remove defunct rps.client_lock Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
2017-06-28drm/i915: Drop flushing of the object free list/worker from i915_gem_suspendChris Wilson1-2/+0
i915_gem_suspend() is called from all of our finalization paths (suspend, hibernate, unload). i915_gem_drain_freed_objects() adds an arbitrary delay as it uses an rcu_barrier() to ensure that there are no more freed objects in flight, and this delay causes a large amount of variability in suspend timings. For S3 suspend, we do not need to free pages as doing so does not impact at all upon the system in its suspended state, unlike S4 hibernation where we do want the hibernation image to be as small as possible. Therefore we can forgo waiting inside i915_gem_suspend(), so long as we ensure that we do cleanup before unload (see i915_gem_load_cleanup()) and prefer to reap our objects prior to hibernation (see i915_gem_freeze()). Removing the rcu_barrier() from i915_gem_suspend() improves S3 latency by about 30ms on Skylake (ymmv). Reported-by: David Weinehall <david.weinehall@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: David Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170627173731.11566-1-chris@chris-wilson.co.uk Tested-by: David Weinehall <david.weinehall@linux.intel.com> Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
2017-06-23drm/i915: Break modeset deadlocks on resetChris Wilson1-14/+4
Trying to do a modeset from within a reset is fraught with danger. We can fall into a cyclic deadlock where the modeset is waiting on a previous modeset that is waiting on a request, and since the GPU hung that request completion is waiting on the reset. As modesetting doesn't allow its locks to be broken and restarted, or for its *own* reset mechanism to take over the display, we have to do something very evil instead. If we detect that we are stuck waiting to prepare the display reset (by using a very simple timeout), resort to cancelling all in-flight requests and throwing the user data into /dev/null, which is marginally better than the driver locking up and keeping that data to itself. This is not a fix; this is just a workaround that unbreaks machines until we can resolve the deadlock in a way that doesn't lose data! v2: Move the retirement from set-wegded to the i915_reset() error path, after which we no longer any delayed worker cleanup for i915_handle_error() v3: C abuse for syntactic sugar v4: Cover all waits with the timeout to catch more driver breakage References: https://bugs.freedesktop.org/show_bug.cgi?id=99093 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-21drm/i915: Cancel pending execlist tasklet upon wedgingChris Wilson1-0/+7
Highly unlikely, but if the stop_machine() did suspend the tasklet, we want to make sure that when it wakes it finds there is nothing to do. Otherwise, it will loudly complain that the ELSP port tracking no longer matches the hardware, and we will be mightly confused. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170621124804.4529-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-20drm/i915: Add support for per engine reset recoveryMichel Thierry1-36/+57
This change implements support for per-engine reset as an initial, less intrusive hang recovery option to be attempted before falling back to the legacy full GPU reset recovery mode if necessary. This is only supported from Gen8 onwards. Hangchecker determines which engines are hung and invokes error handler to recover from it. Error handler schedules recovery for each of those engines that are hung. The recovery procedure is as follows, - identifies the request that caused the hang and it is dropped - force engine to idle: this is done by issuing a reset request - reset the engine - re-init the engine to resume submissions. If engine reset fails then we fall back to heavy weight full gpu reset which resets all engines and reinitiazes complete state of HW and SW. v2: Rebase. v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before calling i915_gem_reset_engine (Chris). v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and reuse the function for reset_engine. v5: intel_reset_engine_start/cancel instead of request/unrequest_reset. v6: Clean up reset_engine function to not require mutex, i.e. no need to call revoke/restore_fences and _retire_requests (Chris). v7: Remove leftovers from v5, i.e. no need to disable irq, hold forcewake or wakeup the handoff bit (Chris). v8: engine_retire_requests should be (and it was) static; explain that we have to re-init the engine after reset, which is why the init_hw call is needed; check reset-in-progress flag (Chris). v9: Rebase, include code to pass the active request to gem_reset_engine (as it is already done in full reset). Remove unnecessary intel_reset_engine_start/cancel, these are executed as part of the reset. v10: Rebase, use the right I915_RESET_ENGINE flag. v11: Fixup to call reset_finish_engine even on error. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Tomas Elf <tomas.elf@intel.com> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk
2017-06-20drm/i915: Look for active requests earlier in the reset pathMichel Thierry1-7/+8
And store the active request so that we only search for it once. v2: Check for request completion inside _prepare_engine, don't use ECANCELED, remove unnecessary null checks (Chris). v3: Capture active requests during reset_prepare and store it the engine hangcheck obj. v4: Rename commit, change i915_gem_reset_request to just confirm the active_request is still incomplete, instead of engine_stalled (Chris). v5: With style; pass the active request to gem_reset_engine, keep single return in reset_prepare_engine (Chris). v6: Moved before reset-engine code appears (Chris) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Signed-off-by: Michel Thierry <michel.thierry@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-2-michel.thierry@intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-3-chris@chris-wilson.co.uk
2017-06-20drm/i915: Group all the global context information togetherChris Wilson1-7/+6
Create a substruct to hold all the global context state under drm_i915_private. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
2017-06-16drm/i915: Async GPU relocation processingChris Wilson1-1/+0
If the user requires patching of their batch or auxiliary buffers, we currently make the alterations on the cpu. If they are active on the GPU at the time, we wait under the struct_mutex for them to finish executing before we rewrite the contents. This happens if shared relocation trees are used between different contexts with separate address space (and the buffers then have different addresses in each), the 3D state will need to be adjusted between execution on each context. However, we don't need to use the CPU to do the relocation patching, as we could queue commands to the GPU to perform it and use fences to serialise the operation with the current activity and future - so the operation on the GPU appears just as atomic as performing it immediately. Performing the relocation rewrites on the GPU is not free, in terms of pure throughput, the number of relocations/s is about halved - but more importantly so is the time under the struct_mutex. v2: Break out the request/batch allocation for clearer error flow. v3: A few asserts to ensure rq ordering is maintained Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16drm/i915: Wait upon userptr get-user-pages within execbufferChris Wilson1-1/+3
This simply hides the EAGAIN caused by userptr when userspace causes resource contention. However, it is quite beneficial with highly contended userptr users as we avoid repeating the setup costs and kernel-user context switches. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-06-16drm/i915: Store a direct lookup from object handle to vmaChris Wilson1-1/+4
The advent of full-ppgtt lead to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hastable on demand in a background task and serialize it before iterating. In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients. v2: Prettier names, more magic. v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-16drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirtyChris Wilson1-7/+7
For ease of use (i.e. avoiding a few checks and function calls), store the object's cache coherency next to the cache is dirty bit. Specifically this patch aims to reduce the frequency of no-op calls to i915_gem_object_clflush() to counter-act the increase of such calls for GPU only objects in the previous patch. v2: Replace cache_dirty & ~cache_coherent with cache_dirty && !cache_coherent as gcc generates much better code for the latter (Tvrtko) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-16drm/i915: Mark CPU cache as dirty on every transition for CPU writesChris Wilson1-30/+46
Currently, we only mark the CPU cache as dirty if we skip a clflush. This leads to some confusion where we have to ask if the object is in the write domain or missed a clflush. If we always mark the cache as dirty, this becomes a much simply question to answer. The goal remains to do as few clflushes as required and to do them as late as possible, in the hope of deferring the work to a kthread and not block the caller (e.g. execbuf, flips). v2: Always call clflush before GPU execution when the cache_dirty flag is set. This may cause some extra work on llc systems that migrate dirty buffers back and forth - but we do try to limit that by only setting cache_dirty at the end of the gpu sequence. v3: Always mark the cache as dirty upon a level change, as we need to invalidate any stale cachelines due to external writes. Reported-by: Dongwon Kim <dongwon.kim@intel.com> Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Dongwon Kim <dongwon.kim@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Tested-by: Dongwon Kim <dongwon.kim@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-06-14drm/i915: Only restrict noreclaim in the early shrink passesChris Wilson1-2/+1
In our first pass, we do not want to use reclaim at all as we want to solely reap the i915 buffer caches (its purgeable pages). But we don't mind it initiates IO or pulls via the FS (but it shouldn't anyway as we say no to reclaim!). Just drop the GFP_IO constraint for simplicity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-3-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-14drm/i915: Remove __GFP_NORETRY from our buffer allocatorChris Wilson1-1/+14
I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It struggles with handling reclaim of our dirty buffers and relies on reclaim via kswapd. As a result, a single pass of direct reclaim is unreliable when i915 occupies the majority of available memory, and the only means of effectively waiting on kswapd to amke progress is by not setting the __GFP_NORETRY flag and lopping. That leaves us with the dilemma of invoking the oomkiller instead of propagating the allocation failure back to userspace where it can be handled more gracefully (one hopes). In the future we may have __GFP_MAYFAIL to allow repeats up until we genuinely run out of memory and the oomkiller would have been invoked. Until then, let the oomkiller wreck havoc. v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL v3: Update comments that direct reclaim only appears to be ignoring our dirty buffers! Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") Testcase: igt/gem_tiled_swapping Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Michal Hocko <mhocko@suse.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-06-14drm/i915: Encourage our shrinker more when our shmemfs allocations failsChris Wilson1-21/+29
Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") made the bold decision to try and avoid the oomkiller by reporting -ENOMEM to userspace if our allocation failed after attempting to free enough buffer objects. In short, it appears we were giving up too easily (even before we start wondering if one pass of reclaim is as strong as we would like). Part of the problem is that if we only shrink just enough pages for our expected allocation, the likelihood of those pages becoming available to us is less than 100% To counter-act that we ask for twice the number of pages to be made available. Furthermore, we allow the shrinker to pull pages from the active list in later passes. v2: Be a little more cautious in paging out gfx buffers, and leave that to a more balanced approach from shrink_slab(). Important when combined with "drm/i915: Start writeback from the shrinker" as anything shrunk is immediately swapped out and so should be more conservative. Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
2017-06-02drm/i915: return the correct usable aperture size under gvt environmentWeinan Li1-2/+2
I915_GEM_GET_APERTURE ioctl is used to probe aperture size from userspace. In gvt environment, each vm only use the ballooned part of aperture, so we should return the correct available aperture size exclude the reserved part by balloon. v2: add 'reserved' in struct i915_address_space to record the reserved size in ggtt (Chris) v3: remain aper_size as total, adjust aper_available_size exclude reserved and pinned. UMD driver need to adjust the max allocation size according to the available aperture size but not total size. KMD return the correct usable aperture size any time (Chris, Joonas) v4: decrease reserved in deballoon (Joonas) v5: add onion teardown in balloon, add vgt_deballoon_space (Joonas) v6: change title name (Zhenyu) v7: code style refine (Joonas) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Weinan Li <weinan.z.li@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1496198152-14175-1-git-send-email-weinan.z.li@intel.com Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-05-30drm/i915: Short-circuit i915_gem_wait_for_idle() if already idleChris Wilson1-0/+4
If the device is asleep (no GT wakeref), we know the GPU is already idle. If we add an early return, we can avoid touching registers and checking hw state outside of the assumed GT wakelock. This prevents causing such errors whilst debugging: [ 2613.401647] RPM wakelock ref not held during HW access [ 2613.401684] ------------[ cut here ]------------ [ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915] [ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915] [ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G U 4.12.0-rc2-CI-CI_DRM_421+ #1 [ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016 [ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000 [ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915] [ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292 [ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006 [ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7 [ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000 [ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c [ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150 [ 2613.401882] FS: 00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000 [ 2613.401885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0 [ 2613.401889] Call Trace: [ 2613.401912] intel_engine_is_idle+0x76/0x90 [i915] [ 2613.401931] i915_gem_wait_for_idle+0xe6/0x1e0 [i915] [ 2613.401951] fault_irq_set+0x40/0x90 [i915] [ 2613.401970] i915_ring_test_irq_set+0x42/0x50 [i915] [ 2613.401976] simple_attr_write+0xc7/0xe0 [ 2613.401981] full_proxy_write+0x4f/0x70 [ 2613.401987] __vfs_write+0x23/0x120 [ 2613.401992] ? rcu_read_lock_sched_held+0x75/0x80 [ 2613.401996] ? rcu_sync_lockdep_assert+0x2a/0x50 [ 2613.401999] ? __sb_start_write+0xfa/0x1f0 [ 2613.402004] vfs_write+0xc5/0x1d0 [ 2613.402008] ? trace_hardirqs_on_caller+0xe7/0x1c0 [ 2613.402013] SyS_write+0x44/0xb0 [ 2613.402020] entry_SYSCALL_64_fastpath+0x1c/0xb1 [ 2613.402022] RIP: 0033:0x7f39eded6670 [ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670 [ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006 [ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001 [ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33 [ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000 [ 2613.402046] ? __this_cpu_preempt_check+0x13/0x20 [ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48 [ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]--- Fixes: 25112b64b3d2 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-05-30Merge tag 'drm-intel-next-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel into drm-nextDave Airlie1-96/+168
More stuff for 4.13: - skl+ wm fixes from Mahesh Kumar - some refactor and tests for i915_sw_fence (Chris) - tune execlist/scheduler code (Chris) - g4x,g33 gpu reset improvements (Chris, Mika) - guc code cleanup (Michal Wajdeczko, Michał Winiarski) - dp aux backlight improvements (Puthikorn Voravootivat) - buffer based guc/host communication (Michal Wajdeczko) * tag 'drm-intel-next-2017-05-29' of git://anongit.freedesktop.org/git/drm-intel: (253 commits) drm/i915: Update DRIVER_DATE to 20170529 drm/i915: Keep the forcewake timer alive for 1ms past the most recent use drm/i915/guc: capture GuC logs if FW fails to load drm/i915/guc: Introduce buffer based cmd transport drm/i915/guc: Disable send function on fini drm: Add definition for eDP backlight frequency drm/i915: Drop AUX backlight enable check for backlight control drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMU drm/i915: Only GGTT vma may be pinned and prevent shrinking drm/i915: Serialize GTT/Aperture accesses on BXT drm/i915: Convert i915_gem_object_ops->flags values to use BIT() drm/i915/selftests: Silence compiler warning in igt_ctx_exec drm/i915/guc: Skip port assign on first iteration of GuC dequeue drm/i915: Remove misleading comment in request_alloc drm/i915/g33: Improve reset reliability Revert "drm/i915: Restore lost "Initialized i915" welcome message" drm/i915/huc: Update GLK HuC version drm/i915: Check for allocation failure drm/i915/guc: Remove action status and statistics from debugfs drm/i915/g4x: Improve gpu reset reliability ...
2017-05-25drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMUChris Wilson1-3/+1
We depend on intel_iommu_gfx_mapped for various workarounds, but that is only available under an #ifdef CONFIG_INTEL_IOMMU. Refactor all the cut-and-paste ifdefs to a common routine. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170525121612.2190-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-05-18drm: drop drm_[cm]alloc* helpersMichal Hocko1-2/+2
Now that drm_[cm]alloc* helpers are simple one line wrappers around kvmalloc_array and drm_free_large is just kvfree alias we can drop them and replace by their native forms. This shouldn't introduce any functional change. Changes since v1 - fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day build robot Suggested-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Michal Hocko <mhocko@suse.com>drm: drop drm_[cm]alloc* helpers [danvet: Fixup vgem which grew another user very recently.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Christian König <christian.koenig@amd.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
2017-05-17drm/i915: Create a kmem_cache to allocate struct i915_priolist fromChris Wilson1-1/+8
The i915_priolist are allocated within an atomic context on a path where we wish to minimise latency. If we use a dedicated kmem_cache, we have the advantage of a local freelist from which to service new requests that should keep the latency impact of an allocation small. Though currently we expect the majority of requests to be at default priority (and so hit the preallocate priolist), once userspace starts using priorities they are likely to use many fine grained policies improving the utilisation of a private slab. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
2017-05-17drm/i915: Split execlist priority queue into rbtree + linked listChris Wilson1-6/+1
All the requests at the same priority are executed in FIFO order. They do not need to be stored in the rbtree themselves, as they are a simple list within a level. If we move the requests at one priority into a list, we can then reduce the rbtree to the set of priorities. This should keep the height of the rbtree small, as the number of active priorities can not exceed the number of active requests and should be typically only a few. Currently, we have ~2k possible different priority levels, that may increase to allow even more fine grained selection. Allocating those in advance seems a waste (and may be impossible), so we opt for allocating upon first use, and freeing after its requests are depleted. To avoid the possibility of an allocation failure causing us to lose a request, we preallocate the default priority (0) and bump any request to that priority if we fail to allocate it the appropriate plist. Having a request (that is ready to run, so not leading to corruption) execute out-of-order is better than leaking the request (and its dependency tree) entirely. There should be a benefit to reducing execlists_dequeue() to principally using a simple list (and reducing the frequency of both rbtree iteration and balancing on erase) but for typical workloads, request coalescing should be small enough that we don't notice any change. The main gain is from improving PI calls to schedule, and the explicit list within a level should make request unwinding simpler (we just need to insert at the head of the list rather than the tail and not have to make the rbtree search more complicated). v2: Avoid use-after-free when deleting a depleted priolist v3: Michał found the solution to handling the allocation failure gracefully. If we disable all priority scheduling following the allocation failure, those requests will be executed in fifo and we will ensure that this request and its dependencies are in strict fifo (even when it doesn't realise it is only a single list). Normal scheduling is restored once we know the device is idle, until the next failure! Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
2017-05-17drm/i915/execlists: Pack the count into the low bits of the port.requestChris Wilson1-2/+4
add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187) function old new delta execlists_submit_ports 262 471 +209 port_assign.isra - 136 +136 capture 6344 6359 +15 reset_common_ring 438 452 +14 execlists_submit_request 228 238 +10 gen8_init_common_ring 334 341 +7 intel_engine_is_idle 106 105 -1 i915_engine_info 2314 2290 -24 __i915_gem_set_wedged_BKL 485 411 -74 intel_lrc_irq_handler 1789 1604 -185 execlists_update_context 294 - -294 The most important change there is the improve to the intel_lrc_irq_handler and excclist_submit_ports (net improvement since execlists_update_context is now inlined). v2: Use the port_api() for guc as well (even though currently we do not pack any counters in there, yet) and hide all port->request_count inside the helpers. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
2017-05-17drm/i915: Redefine ptr_pack_bits() and friendsChris Wilson1-3/+3
Rebrand the current (pointer | bits) pack/unpack utility macros as explicit bit twiddling for PAGE_SIZE so that we can use the more flexible underlying macros for different bits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
2017-05-17drm/i915: Make ptr_unpack_bits() more function-likeChris Wilson1-1/+1
ptr_unpack_bits() is a function-like macro, as such it is meant to be replaceable by a function. In this case, we should be passing in the out-param as a pointer. Bizarrely this does affect code generation: function old new delta i915_gem_object_pin_map 409 389 -20 An improvement(?) in this case, but one can't help wonder what strict-aliasing optimisations we are preventing. The generated code looks identical in using ptr_unpack_bits (no extra motions to stack, the pointer and bits appear to be kept in registers), the difference appears to be code ordering and with a reorder it is able to use smaller forward jumps. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
2017-05-10Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull RCU updates from Ingo Molnar: "The main changes are: - Debloat RCU headers - Parallelize SRCU callback handling (plus overlapping patches) - Improve the performance of Tree SRCU on a CPU-hotplug stress test - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Open-code the rcu_cblist_n_lazy_cbs() function rcu: Open-code the rcu_cblist_n_cbs() function rcu: Open-code the rcu_cblist_empty() function rcu: Separately compile large rcu_segcblist functions srcu: Debloat the <linux/rcu_segcblist.h> header srcu: Adjust default auto-expediting holdoff srcu: Specify auto-expedite holdoff time srcu: Expedite first synchronize_srcu() when idle srcu: Expedited grace periods with reduced memory contention srcu: Make rcutorture writer stalls print SRCU GP state srcu: Exact tracking of srcu_data structures containing callbacks srcu: Make SRCU be built by default srcu: Fix Kconfig botch when SRCU not selected rcu: Make non-preemptive schedule be Tasks RCU quiescent state srcu: Expedite srcu_schedule_cbs_snp() callback invocation srcu: Parallelize callback handling kvm: Move srcu_struct fields to end of struct kvm rcu: Fix typo in PER_RCU_NODE_PERIOD header comment rcu: Use true/false in assignment to bool rcu: Use bool value directly ...
2017-05-03drm/i915: Squash repeated awaits on the same fenceChris Wilson1-0/+1
Track the latest fence waited upon on each context, and only add a new asynchronous wait if the new fence is more recent than the recorded fence for that context. This requires us to filter out unordered timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the absence of a universal identifier, we have to use our own i915->mm.unordered_timeline token. v2: Throw around the debug crutches v3: Inline the likely case of the pre-allocation cache being full. v4: Drop the pre-allocation support, we can lose the most recent fence in case of allocation failure -- it just means we may emit more awaits than strictly necessary but will not break. v5: Trim allocation size for leaf nodes, they only need an array of u32 not pointers. v6: Create mock_timeline to tidy selftest writing v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko) v8: Prune the stale sync points when we idle. v9: Include a small benchmark in the kselftests v10: Separate the idr implementation into its own compartment. (Tvrkto) v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto) v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves v13: kselftests to investigate struct i915_syncmap itself (Tvrtko) v14: Foray into ascii art graphs v15: Take into account that the random lookup/insert does 2 prng calls, not 1, when benchmarking, and use for_each_set_bit() (Tvrtko) v16: Improved ascii art Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
2017-05-03drm/i915: Mark up clflushes as belonging to an unordered timelineChris Wilson1-1/+1
2 clflushes on two different objects are not ordered, and so do not belong to the same timeline (context). Either we use a unique context for each, or we reserve a special global context to mean unordered. Ideally, we would reserve 0 to mean unordered (DMA_FENCE_NO_CONTEXT) to have the same semantics everywhere. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-1-chris@chris-wilson.co.uk
2017-04-29Merge tag 'drm-intel-next-fixes-2017-04-27' of git://anongit.freedesktop.org/git/drm-intel into drm-nextDave Airlie1-1/+1
drm/i915 and gvt fixes for drm-next/v4.12 * tag 'drm-intel-next-fixes-2017-04-27' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: Confirm the request is still active before adding it to the await drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio drm/i915/selftests: Allocate inode/file dynamically drm/i915: Fix system hang with EI UP masked on Haswell drm/i915: checking for NULL instead of IS_ERR() in mock selftests drm/i915: Perform link quality check unconditionally during long pulse drm/i915: Fix use after free in lpe_audio_platdev_destroy() drm/i915: Use the right mapping_gfp_mask for final shmem allocation drm/i915: Make legacy cursor updates more unsynced drm/i915: Apply a cond_resched() to the saturated signaler drm/i915: Park the signaler before sleeping drm/i915/gvt: fix a bounds check in ring_id_to_context_switch_event() drm/i915/gvt: Fix PTE write flush for taking runtime pm properly drm/i915/gvt: remove some debug messages in scheduler timer handler drm/i915/gvt: add mmio init for virtual display drm/i915/gvt: use directly assignment for structure copying drm/i915/gvt: remove redundant ring id check which cause significant CPU misprediction drm/i915/gvt: remove redundant platform check for mocs load/restore drm/i915/gvt: Align render mmio list to cacheline drm/i915/gvt: cleanup some too chatty scheduler message
2017-04-28drm/i915: Reset ILK during GEM sanitizationJoonas Lahtinen1-3/+2
ILK should survive a reset without display corruption. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-28drm/i915: Eliminate HAS_HW_CONTEXTSJoonas Lahtinen1-1/+1
HAS_HW_CONTEXTS is misleading condition for GPU reset and CCID, replace it with Gen specific (to be updated in next patches). HAS_HW_CONTEXTS in i915_l3_write is bogus because each HAS_L3_DPF match also has .has_hw_contexts = 1 set. This leads to us being able to get rid of the property completely. v2: - Keep the checks at Gen6 for no functional change (Ville) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
2017-04-26drm/i915: Use the right mapping_gfp_mask for final shmem allocationChris Wilson1-1/+1
Many sightings report the greater prevalence of allocation failures. This is all due to the incorrect use of mapping_gfp_constraint(), so remove it in favour of just querying the mapping_gfp_mask() which are the exact gfp_t we wanted in the first place. We still do expect a higher chance of reporting ENOMEM, as that is the intention of using __GFP_NORETRY -- to fail rather than oom after having reclaimed from our bo caches, and having done a direct|kswapd reclaim pass. Reported-by: Jason Ekstrand <jason.ekstrand@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594 Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit b268d9fe0f10544f5f7a1b7015e2b97075e6215d) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-04-23Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcuIngo Molnar1-1/+1
Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Parallelize SRCU callback handling (plus overlapping patches). Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-19Merge tag 'v4.11-rc7' into drm-nextDave Airlie1-0/+2
Backmerge Linux 4.11-rc7 from Linus tree, to fix some conflicts that were causing problems with the rerere cache in drm-tip.