linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-11-19	drm/i915: Create a dummy object for gen6 ppgtt	Maarten Lankhorst	4	-72/+100
	We currently have to special case vma->obj being NULL because of gen6 ppgtt and mock_engine. Fix gen6 ppgtt, so we may soon be able to remove a few checks. As the object only exists as a fake object pointing to ggtt, we have no backing storage, so no real object is created. It just has to look real enough. Also kill pin_mutex, it's not compatible with ww locking, and we can use the vm lock instead. v2: - Drop IS_SHRINKABLE and shorten overly long line v3: - Checkpatch fix for alignment Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-2-matthew.auld@intel.com
2021-11-19	drm/i915: move the pre_pin earlier	Matthew Auld	1	-6/+6
	In intel_context_do_pin_ww, when calling into the pre_pin hook(which is passed the ww context) it could in theory return -EDEADLK(which is very likely with debug kernels), once we start adding more ww locking in there, like in the next patch. If so then we need to be mindful of having to restart the do_pin at this point. If this is the kernel_context, or some other early in-kernel context where we have yet to setup the default_state, then we always inhibit the context restore, and instead rely on the delayed active_release to set the CONTEXT_VALID_BIT for us(if we even care), which should indicate that we have context switched away, and that our newly saved context state should now be valid. However, since we currently grab the active reference before the potential ww dance, we can end up setting the CONTEXT_VALID_BIT much too early, if we need to backoff, and then upon re-trying the do_pin, we could potentially cause the hardware to incorrectly load some garbage context state when later context switching to that context, but at the very least this will trigger the GEM_BUG_ON() in __engine_unpark. For now let's just move any ww dance stuff prior to arming the active reference. For normal user contexts this shouldn't be a concern, since we should already have the default_state ready when initialising the lrc state, and so there should be no concern with active_release somehow prematurely setting the CONTEXT_VALID_BIT. v2(Thomas): - Also re-order the onion unwind Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211117142024.1043017-1-matthew.auld@intel.com
2021-11-16	drm/i915/guc: fix NULL vs IS_ERR() checking	Dan Carpenter	1	-2/+2
	The intel_engine_create_virtual() function does not return NULL. It returns error pointers. Fixes: e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211116114916.GB11936@kili
2021-11-16	drm/i915: Skip error capture when wedged on init	Tvrtko Ursulin	2	-0/+4
	Trying to capture uninitialised engines when we wedged on init ends in tears. Skip that together with uC capture, since failure to initialise the latter can actually be one of the reasons for wedging on init. v2: * Use i915_disable_error_state when wedging on init/fini. v3: * Handle mock tests. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> # v1 Link: https://patchwork.freedesktop.org/patch/msgid/20211111130634.266098-1-tvrtko.ursulin@linux.intel.com
2021-11-15	drm/i915: Don't read query SSEU for non-existent slice 0 on old platforms	Matt Roper	1	-2/+9
	Pre-HSW platforms don't use the gt SSEU structures; this means that calling intel_sseu_get_subslices() on slice 0 for these platforms will trip a GEM_BUG_ON(slice >= sseu->max_slices) warning. Let's move the DSS lookup for a DG2 workaround into a helper function that will only get called after we've already decided that we're on a DG2 platform. Fixes: 645cc0b9d972 ("drm/i915/dg2: Add initial gt/ctx/engine workarounds") Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211112160107.1593906-1-matthew.d.roper@intel.com
2021-11-12	drm/i915/guc/slpc: Check GuC status before freq boost	Vinay Belgaumkar	1	-0/+4
	It's possible that i915 might get wedged between a boost and un-boost. Validate the i915-GuC connection before trying to send a H2G to change the min frequency. Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/4464 Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211112071016.9640-1-vinay.belgaumkar@intel.com
2021-11-11	drm/i915/dg2: Program recommended HW settings	Matt Roper	2	-1/+34
	The bspec's performance guide suggests programming specific values into a few registers for optimal performance. Although these aren't workarounds, it's easiest to handle them inside the GT workaround functions (which will also ensure that the values set here are properly melded with other bits in the same registers that _are_ set by workarounds). Bspec: 68331, 45395 Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Siddiqui Ayaz A <ayaz.siddiqui@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102222511.534310-4-matthew.d.roper@intel.com
2021-11-11	drm/i915/dg2: Add initial gt/ctx/engine workarounds	Matt Roper	3	-21/+372
	Bspec: 54077,68173,54833 Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102222511.534310-3-matthew.d.roper@intel.com
2021-11-11	drm/i915/xehpsdv: Add initial workarounds	Stuart Summers	3	-13/+146
	Add the initial set of workarounds for Xe_HP SDV. There are some additional workarounds specific to the compute engines that we're holding back for now. Those will be added later, after general compute engine support lands. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102222511.534310-2-matthew.d.roper@intel.com
2021-11-11	drm/i915/ttm: Fix illegal addition to shrinker list	Thomas Hellström	1	-10/+21
	There's a small window of opportunity during which the adjust_lru() function can be called with a GEM refcount of zero from the TTM eviction code. This results in a kernel BUG(). Ensure that we don't attempt to modify the GEM shrinker lists unless we have a GEM refcount. Fixes: ebd4a8ec7799 ("drm/i915/ttm: move shrinker management into adjust_lru") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211110085527.1033475-1-thomas.hellstrom@linux.intel.com
2021-11-10	drm/i915: split general MMIO setup from per-GT uncore init	Daniele Ceraolo Spurio	3	-15/+13
	In coming patches we'll be doing the actual tile initialization between these two uncore init phases. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211029032817.3747750-3-matthew.d.roper@intel.com
2021-11-10	drm/i915: rework some irq functions to take intel_gt as argument	Paulo Zanoni	1	-11/+15
	We'll be adding multi-tile support soon; on multi-tile platforms interrupts are per-tile and every tile has the full set of interrupt registers. In this commit we start passing intel_gt instead of dev_priv for the functions that are related to Xe_HP irq handling. Right now we're still passing tile 0 everywhere, but in later patches we'll start actually passing the correct tile. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Co-authored-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211029032817.3747750-2-matthew.d.roper@intel.com
2021-11-10	drm/i915/selftests: Use clear_and_wake_up_bit() for the per-engine reset bitlocks	Thomas Hellström	2	-9/+13
	Some selftests assume that nothing will attempt to grab these bitlocks while they are held by the selftests. With GuC, for example, that is not true because the hanging workloads may cause the GuC code to attempt to grab them for a global reset, and that may cause it to end up sleeping on the bit never waking up. Regardless whether that will be the final solution for GuC, use clear_and_wake_up_bit() pending a more thorough investigation on how this should be handled moving forward. To be clear this needs to be a temporary solution. If we can't find an in-kernel locking primitive to use here, we should at the very least add lockdep annotation to these bitlocks with a thorough explanation as to why we need to use bits. v3: - Use GEM_BUG_ON(test_and_set_bit()) rather than set_bit() to verify the assumption that nothing is holding the reset locks when we attempt to grab them. (Chris Wilson) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211105150146.834052-1-thomas.hellstrom@linux.intel.com
2021-11-10	drm/i915/gem: Fix gem_madvise for ttm+shmem objects	Thomas Hellström	1	-1/+2
	Gem-TTM objects that are backed by shmem might have populated page-vectors without having the GEM pages set. Those objects aren't moved to the correct shrinker / purge list by gem_madvise. For such objects, identified by having the _SELF_MANAGED_SHRINK_LIST set, make sure they end up on the correct list. v2: - Revert a change that made swapped-out objects inaccessible for truncating. (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108123637.929617-1-thomas.hellstrom@linux.intel.com
2021-11-09	drm/i915/guc: Refcount context during error capture	John Harrison	1	-0/+14
	When i915 receives a context reset notification from GuC, it triggers an error capture before resetting any outstanding requsts of that context. Unfortunately, the error capture is not a time bound operation. In certain situations it can take a long time, particularly when multiple large LMEM buffers must be read back and eoncoded. If this delay is longer than other timeouts (heartbeat, test recovery, etc.) then a full GT reset can be triggered in the middle. That can result in the context being reset by GuC actually being destroyed before the error capture completes and the GuC submission code resumes. Thus, the GuC side can start dereferencing stale pointers and Bad Things ensue. So add a refcount get of the context during the entire reset operation. That way, the context can't be destroyed part way through no matter what other resets or user interactions occur. v2: (Matthew Brost) - Update patch to work with async error capture v3: (Matthew Brost) - Drop async capture support as that hasn't landed yet Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108164054.23588-1-matthew.brost@intel.com
2021-11-09	drm/i915/resets: Don't set / test for per-engine reset bits with GuC submission	Matthew Brost	1	-10/+17
	Don't set, test for, or clear per-engine reset bits with GuC submission as the GuC owns the per engine resets not the i915. Setting, testing for, and clearing these bits is causing issues with the hangcheck selftest. Rather than change to test to not use these bits, rip the use of these bits out from the reset code. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028224224.32693-1-matthew.brost@intel.com
2021-11-05	drm/i915/selftests: Rework context handling in hugepages selftests	Maarten Lankhorst	1	-48/+80
	In the next commit, we don't evict when refcount = 0, so we need to call drain freed objects, because we want to pin new bo's in the same place, causing a test failure. Furthermore, since each subtest is separated, it's a lot better to use i915_live_selftests, so each subtest starts with a clean slate, and a clean address space. v2(Reported-by: kernel test robot <lkp@intel.com>): - Make hugepage_ctx static. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028125855.3281674-9-matthew.auld@intel.com
2021-11-05	drm/i915: Remove gen6_ppgtt_unpin_all	Maarten Lankhorst	2	-12/+0
	gen6_ppgtt_unpin_all is unused, kill it. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028125855.3281674-3-matthew.auld@intel.com
2021-11-05	drm/i915/ttm: Failsafe migration blits	Thomas Hellström	3	-56/+298
	If the initial fill blit or copy blit of an object fails, the old content of the data might be exposed and read as soon as either CPU- or GPU PTEs are set up to point at the pages. Intercept the blit fence with an async callback that checks the blit fence for errors and if there are errors performs an async cpu blit instead. If there is a failure to allocate the async dma_fence_work, allocate it on the stack and sync wait for the blit to complete. Add selftests that simulate gpu blit failures and failure to allocate the async dma_fence_work. A previous version of this pach used dma_fence_work, now that's opencoded which adds more code but might lower the latency somewhat in the common non-error case. v3: - Style fixes (Matthew Auld) v4: - Use "#if IS_ENABLED()" instead of #ifdef (Matthew Auld) v5: - Fix an issue where we, if the dependency was already signaled, might end up waiting for a memcpy fence that would never signal. v6: - Add a missing i915_ttm_memcpy_release() (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211104110718.688420-3-thomas.hellstrom@linux.intel.com
2021-11-05	drm/i915/ttm: Reorganize the ttm move code	Thomas Hellström	5	-280/+430
	We are about to introduce failsafe- and asynchronous migration and ttm moves. This will add complexity and code to the TTM move code so it makes sense to split it out to a separate file to make the i915 TTM code easer to digest. Split the i915 TTM move code out and since we will have to change the name of the gpu_binds_iomem() and cpu_maps_iomem() functions anyway, we alter the name of gpu_binds_iomem() to i915_ttm_gtt_binds_lmem() which is more reflecting what it is used for. With this we also add some more documentation. Otherwise there should be no functional change. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211104110718.688420-2-thomas.hellstrom@linux.intel.com
2021-11-04	drm/i915: fixup dma_fence_wait usage	Matthew Auld	1	-1/+1
	dma_fence_wait expects a boolean for whether it should be interruptible, not a timeout value. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102155055.100138-1-matthew.auld@intel.com
2021-11-03	drm/i915/guc/slpc: Update boost sysfs hooks for SLPC	Vinay Belgaumkar	5	-16/+82
	Add a helper to sort through the SLPC/RPS paths of get/set methods. Boost frequency will be modified as long as it is within the constraints of RP0 and if it is different from the existing one. We will set min freq to boost only if there is at least one active waiter. v2: Add num_boosts to guc_slpc_info and changes for worker function v3: Review comments (Ashutosh) Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-4-vinay.belgaumkar@intel.com
2021-11-03	drm/i915/guc/slpc: Add waitboost functionality for SLPC	Vinay Belgaumkar	5	-1/+48
	Add helper in RPS code for handling SLPC and non-SLPC paths. When boost is requested in the SLPC path, we can ask GuC to ramp up the frequency req by setting the minimum frequency to boost freq. Reset freq back to the min softlimit when there are no more waiters. v2: Schedule a worker thread which can boost freq from within an interrupt context as well. v3: No need to check against requested freq before scheduling boost work (Ashutosh) Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-3-vinay.belgaumkar@intel.com
2021-11-03	drm/i915/guc/slpc: Define and initialize boost frequency	Vinay Belgaumkar	3	-23/+93
	Define helpers and struct members required to record boost info. Boost frequency is initialized to RP0 at SLPC init. Also define num_waiters which can track the pending boost requests. Boost will be done by scheduling a worker thread. This will avoid the need to make H2G calls inside an interrupt context. Initialize the worker function during SLPC init as well. Had to move intel_guc_slpc_init a few lines below to accommodate this. v2: Add a workqueue to handle waitboost v3: Code review comments (Ashutosh) Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211102012608.8609-2-vinay.belgaumkar@intel.com
2021-11-02	drm/i915: Rename GT_STEP to GRAPHICS_STEP	José Roberto de Souza	8	-52/+51
	As now graphics and media can have different steppings this patch is renaming all _GT_STEP macros to _GRAPHICS_STEP. Future platforms will properly choose between _MEDIA_STEP and _GRAPHICS_STEP for each new workaround. Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Caz Yokoyama <caz.yokoyama@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020002353.193893-3-jose.souza@intel.com
2021-11-02	drm/i915: Track media IP stepping separated from GT	José Roberto de Souza	3	-34/+41
	Graphics and media IPs can have different stepping so a new field is needed in intel_step_info. The next patch will take care of rename gt_step to graphics_step. Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020002353.193893-2-jose.souza@intel.com
2021-11-02	drm/i915: Add struct to hold IP version	José Roberto de Souza	6	-28/+37
	Adding a structure to standardize access to IP versioning as future platforms will have this information populated at runtime. The constant platform display version is not using this new struct but the runtime variant will definitely use it. Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Caz Yokoyama <caz.yokoyama@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020002353.193893-1-jose.souza@intel.com
2021-11-02	drm/i915/dmabuf: drop the flush on discrete	Matthew Auld	1	-2/+13
	We were overzealous here; even though discrete is non-LLC, it should still be always coherent. v2(Thomas & Daniel) - Be extra cautious and limit to DG1 - Add some more commentary Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211029122137.3484203-1-matthew.auld@intel.com
2021-11-02	drm/i915: stop setting cache_dirty on discrete	Matthew Auld	4	-3/+22
	Should not be needed. Even with non-coherent display, we should be using device local-memory there, and not system memory. v2: also add a warning in i915_gem_clflush_object Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-4-matthew.auld@intel.com
2021-11-02	drm/i915: move cpu_write_needs_clflush	Matthew Auld	3	-14/+15
	Move it next to its partner in crime; gpu_write_needs_clflush. For better readability lets keep gpu vs cpu at least in the same file. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-3-matthew.auld@intel.com
2021-11-02	drm/i915/clflush: disallow on discrete	Matthew Auld	1	-2/+3
	We seem to have an unfortunate issue where we arrive from: i915_gem_object_flush_if_display+0x86/0xd0 [i915] intel_user_framebuffer_dirty+0x1a/0x50 [i915] drm_mode_dirtyfb_ioctl+0xfb/0x1b0 which can be before the pages are populated(and pinned for display), and so i915_gem_object_has_struct_page() might still return true, as per the ttm backend. We could re-order the later get_pages() call here, but since on discrete everything should already be coherent, with the exception of the display engine, and even there display surfaces must be allocated in device local-memory anyway, so there should in theory be no conceivable reason to ever call i915_gem_clflush_object() on discrete. References: https://gitlab.freedesktop.org/drm/intel/-/issues/4320 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-2-matthew.auld@intel.com
2021-11-02	drm/i915/clflush: fixup handling of cache_dirty	Matthew Auld	1	-1/+9
	In theory if clflush_work_create() somehow fails here, and we don't yet have mm.pages populated then we end up resetting cache_dirty, which is likely wrong, since that will potentially skip the flush-on-acquire, if it was needed. It looks like intel_user_framebuffer_dirty() can arrive here before the pages are populated. v2(Thomas): - Move setting cache_dirty out of the async portion, also add a comment for why that should still be safe. v3: - Add Thomas' irc r-b Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-1-matthew.auld@intel.com
2021-11-01	drm/i915: Introduce refcounted sg-tables	Thomas Hellström	9	-147/+277
	As we start to introduce asynchronous failsafe object migration, where we update the object state and then submit asynchronous commands we need to record what memory resources are actually used by various part of the command stream. Initially for three purposes: 1) Error capture. 2) Asynchronous migration error recovery. 3) Asynchronous vma bind. At the time where these happens, the object state may have been updated to be several migrations ahead and object sg-tables discarded. In order to make it possible to keep sg-tables with memory resource information for these operations, introduce refcounted sg-tables that aren't freed until the last user is done with them. The alternative would be to reference information sitting on the corresponding ttm_resources which typically have the same lifetime as these refcountes sg_tables, but that leads to other awkward constructs: Due to the design direction chosen for ttm resource managers that would lead to diamond-style inheritance, the LMEM resources may sometimes be prematurely freed, and finally the subclassed struct ttm_resource would have to bleed into the asynchronous vma bind code. v3: - Address a number of style issues (Matthew Auld) v4: - Dont check for st->sgl being NULL in i915_ttm_tt__shmem_unpopulate(), that should never happen. (Matthew Auld) v5: - Fix a Potential double-free (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211101122444.114607-1-thomas.hellstrom@linux.intel.com
2021-11-01	drm/i915: Enable WaProgramMgsrForCorrectSliceSpecificMmioReads for Gen9	Cooper Chiou	1	-0/+41
	This implements WaProgramMgsrForCorrectSliceSpecificMmioReads which was omitted by mistake from Gen9 documentation, while it is actually applicable to fused off parts. Workaround consists of making sure MCR packet control register is programmed to point to enabled slice/subslice pair before doing any MMIO reads from the affected registers. Failure do to this can result in complete system hangs when running certain workloads. Two known cases which can cause system hangs are: 1. "test_basic progvar_prog_scope_uninit" test which is part of Khronos OpenCL conformance suite (https://github.com/KhronosGroup/OpenCL-CTS) with the Intel OpenCL driver (https://github.com/intel/compute-runtime). 2. VP8 media hardware encoding using the full-feature build of the Intel media-driver (https://github.com/intel/media-driver) and ffmpeg. For the former case patch was verified to fix the hard system hang when executing the OCL test on Intel Pentium CPU 6405U which contains fused off GT1 graphics. Reference: HSD#1508045018,1405586840, BSID#0575 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: William Tseng <william.tseng@intel.com> Cc: Shawn C Lee <shawn.c.lee@intel.com> Cc: Pawel Wilma <pawel.wilma@intel.com> Signed-off-by: Cooper Chiou <cooper.chiou@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211025042623.3876-1-cooper.chiou@intel.com
2021-10-29	drm/i915: Remove some dead struct fwd decl from i915_drv.h	Daniel Vetter	1	-2/+0
	Gone with userptr rewrite by Maarten in ed29c2691188 ("drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211022082200.2684194-1-daniel.vetter@ffwll.ch
2021-10-29	drm/i915/gtt: stop caching the scratch page	Matthew Auld	1	-2/+2
	Normal users shouldn't be hitting this, likely this would indicate a userspace bug. So don't bother caching, which should be safe now that we manually flush the page. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-2-matthew.auld@intel.com
2021-10-29	drm/i915/gtt: flush the scratch page	Matthew Auld	1	-0/+3
	The scratch page is directly visible in the users address space, and while this is forced as CACHE_LLC, by the kernel, we still have to contend with things like "Bypass-LLC" MOCS. So just flush no matter what. v2(Thomas): - Make sure we use drm_clflush_virt_range here, in case clflush support is missing. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-1-matthew.auld@intel.com
2021-10-28	drm/i915/pmu: Connect engine busyness stats from GuC to pmu	Umesh Nerlige Ramappa	13	-28/+453
	With GuC handling scheduling, i915 is not aware of the time that a context is scheduled in and out of the engine. Since i915 pmu relies on this info to provide engine busyness to the user, GuC shares this info with i915 for all engines using shared memory. For each engine, this info contains: - total busyness: total time that the context was running (total) - id: id of the running context (id) - start timestamp: timestamp when the context started running (start) At the time (now) of sampling the engine busyness, if the id is valid (!= ~0), and start is non-zero, then the context is considered to be active and the engine busyness is calculated using the below equation engine busyness = total + (now - start) All times are obtained from the gt clock base. For inactive contexts, engine busyness is just equal to the total. The start and total values provided by GuC are 32 bits and wrap around in a few minutes. Since perf pmu provides busyness as 64 bit monotonically increasing values, there is a need for this implementation to account for overflows and extend the time to 64 bits before returning busyness to the user. In order to do that, a worker runs periodically at frequency = 1/8th the time it takes for the timestamp to wrap. As an example, that would be once in 27 seconds for a gt clock frequency of 19.2 MHz. Note: There might be an over-accounting of busyness due to the fact that GuC may be updating the total and start values while kmd is reading them. (i.e kmd may read the updated total and the stale start). In such a case, user may see higher busyness value followed by smaller ones which would eventually catch up to the higher value. v2: (Tvrtko) - Include details in commit message - Move intel engine busyness function into execlist code - Use union inside engine->stats - Use natural type for ping delay jiffies - Drop active_work condition checks - Use for_each_engine if iterating all engines - Drop seq locking, use spinlock at GuC level to update engine stats - Document worker specific details v3: (Tvrtko/Umesh) - Demarcate GuC and execlist stat objects with comments - Document known over-accounting issue in commit - Provide a consistent view of GuC state - Add hooks to gt park/unpark for GuC busyness - Stop/start worker in gt park/unpark path - Drop inline - Move spinlock and worker inits to GuC initialization - Drop helpers that are called only once v4: (Tvrtko/Matt/Umesh) - Drop addressed opens from commit message - Get runtime pm in ping, remove from the park path - Use cancel_delayed_work_sync in disable_submission path - Update stats during reset prepare - Skip ping if reset in progress - Explicitly name execlists and GuC stats objects - Since disable_submission is called from many places, move resetting stats to intel_guc_submission_reset_prepare v5: (Tvrtko) - Add a trylock helper that does not sleep and synchronize PMU event callbacks and worker with gt reset v6: (CI BAT failures) - DUTs using execlist submission failed to boot since __gt_unpark is called during i915 load. This ends up calling the GuC busyness unpark hook and results in kick-starting an uninitialized worker. Let park/unpark hooks check if GuC submission has been initialized. - drop cant_sleep() from trylock helper since rcu_read_lock takes care of that. v7: (CI) Fix igt@i915_selftest@live@gt_engines - For GuC mode of submission the engine busyness is derived from gt time domain. Use gt time elapsed as reference in the selftest. - Increase busyness calculation to 10ms duration to ensure batch runs longer and falls within the busyness tolerances in selftest. v8: - Use ktime_get in selftest as before - intel_reset_trylock_no_wait results in a lockdep splat that is not trivial to fix since the PMU callback runs in irq context and the reset paths are tightly knit into the driver. The test that uncovers this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait, instead use the reset_count to synchronize with gt reset during pmu callback. For the ping, continue to use intel_reset_trylock since ping is not run in irq context. - GuC PM timestamp does not tick when GuC is idle. This can potentially result in wrong busyness values when a context is active on the engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to process the GuC busyness stats. This works since both GuC timestamp and RING timestamp are synced with the same clock. - The busyness stats may get updated after the batch starts running. This delay causes the busyness reported for 100us duration to fall below 95% in the selftest. The only option at this time is to wait for GuC busyness to change from idle to active before we sample busyness over a 100us period. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
2021-10-28	drm/i915/pmu: Add a name to the execlists stats	Umesh Nerlige Ramappa	3	-46/+53
	In preparation for GuC pmu stats, add a name to the execlists stats structure so that it can be differentiated from the GuC stats. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-1-umesh.nerlige.ramappa@intel.com
2021-10-27	drm/i915: Revert 'guc_id' from i915_request tracepoint	Joonas Lahtinen	1	-5/+2
	Avoid adding backend specific data to the tracepoints outside of the LOW_LEVEL_TRACEPOINTS kernel config protection. These bits of information are bound to change depending on the selected submission method per platform and are not necessarily possible to maintain in the future. Fixes: dbf9da8d55ef ("drm/i915/guc: Add trace point for GuC submit") Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211027093255.66489-1-joonas.lahtinen@linux.intel.com
2021-10-25	drm/i915: Use ERR_CAST instead of ERR_PTR(PTR_ERR())	Wan Jiabing	1	-1/+1
	Fix following coccicheck warning: ./drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:3117:15-22: WARNING: ERR_CAST can be used with eb->requests[i]. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211025113316.24424-1-wanjiabing@vivo.com
2021-10-25	drm/i915/selftests: Fix inconsistent IS_ERR and PTR_ERR	Kai Song	1	-2/+2
	Fix inconsistent IS_ERR and PTR_ERR in i915_gem_dmabuf.c Signed-off-by: Kai Song <songkai01@inspur.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211022120655.22173-1-songkai01@inspur.com
2021-10-22	drm/i915/guc: Fix recursive lock in GuC submission	Matthew Brost	1	-1/+2
	Use __release_guc_id (lock held) rather than release_guc_id (acquires lock), add lockdep annotations. 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr [ 213.283459] ============================================ [ 213.283462] WARNING: possible recursive locking detected {{[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W }} [ 213.283470] -------------------------------------------- [ 213.283472] kworker/u24:0/8 is trying to acquire lock: [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915] {{[ 213.283618] }} {{ but task is already holding lock:}} [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283720] }} {{ other info that might help us debug this:}} [ 213.283724] Possible unsafe locking scenario:[ 213.283727] CPU0 [ 213.283728] ---- [ 213.283730] lock(&guc->submission_state.lock); [ 213.283734] lock(&guc->submission_state.lock); {{[ 213.283737] }} {{ * DEADLOCK *}}[ 213.283740] May be due to missing lock nesting notation[ 213.283744] 3 locks held by kworker/u24:0/8: [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550 [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915] {{[ 213.283860] }} {{ stack backtrace:}} [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18 [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915] [ 213.283957] Call Trace: [ 213.283960] dump_stack_lvl+0x57/0x72 [ 213.283966] __lock_acquire.cold+0x191/0x2d3 [ 213.283972] lock_acquire+0xb5/0x2b0 [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915] [ 213.284139] ? lock_release+0xb9/0x280 [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60 [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915] [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915] [ 213.284310] process_one_work+0x270/0x550 [ 213.284315] worker_thread+0x52/0x3b0 [ 213.284319] ? process_one_work+0x550/0x550 [ 213.284322] kthread+0x135/0x160 [ 213.284326] ? set_kthread_struct+0x40/0x40 [ 213.284331] ret_from_fork+0x1f/0x30 and a bit later in the trace: {{ 227.499864] do_raw_spin_lock+0x94/0xa0}} [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60 [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915] [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915] [ 227.500209] ? mark_held_locks+0x49/0x70 [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915] [ 227.500320] reset_prepare+0x78/0x90 [i915] [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915] [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915] [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915] [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915] [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915] [ 227.500767] i915_pci_probe+0x84/0x170 [i915] [ 227.500838] local_pci_probe+0x42/0x80 [ 227.500842] pci_device_probe+0xd9/0x190 [ 227.500844] really_probe+0x1f2/0x3f0 [ 227.500847] __driver_probe_device+0xfe/0x180 [ 227.500848] driver_probe_device+0x1e/0x90 [ 227.500850] __driver_attach+0xc4/0x1d0 [ 227.500851] ? __device_attach_driver+0xe0/0xe0 [ 227.500853] ? __device_attach_driver+0xe0/0xe0 [ 227.500854] bus_for_each_dev+0x64/0x90 [ 227.500856] bus_add_driver+0x12e/0x1f0 [ 227.500857] driver_register+0x8f/0xe0 [ 227.500859] i915_init+0x1d/0x8f [i915] [ 227.500934] ? 0xffffffffc144a000 [ 227.500936] do_one_initcall+0x58/0x2d0 [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80 [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0 [ 227.500944] do_init_module+0x5c/0x270 [ 227.500946] __do_sys_finit_module+0x95/0xe0 [ 227.500949] do_syscall_64+0x38/0x90 [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 227.500953] RIP: 0033:0x7ffa59d2ae0d [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48 [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004 [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070 [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90 [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0 v2: (CI build) - Fix build error Fixes: 1a52faed31311 ("drm/i915/guc: Take GT PM ref when deregistering context") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
2021-10-22	drm/i915/selftests: Update live.evict to wait on requests / idle GPU after each loop	Matthew Brost	1	-0/+19
	Update live.evict to wait on last request and idle GPU after each loop. This not only enhances the test to fill the GGTT on each engine class but also avoid timeouts from igt_flush_test when using GuC submission. igt_flush_test (idle GPU) can take a long time with GuC submission if losts of contexts are created due to H2G / G2H required to destroy contexts. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211021214040.33292-1-matthew.brost@intel.com
2021-10-22	drm/i915/selftests: Increase timeout in requests perf selftest	Matthew Brost	1	-2/+2
	perf_parallel_engines is micro benchmark to test i915 request scheduling. The test creates a thread per physical engine and submits NOP requests and waits the requests to complete in a loop. In execlists mode this works perfectly fine as powerful CPU has enough cores to feed each engine and process the CSBs. With GuC submission the uC gets overwhelmed as all threads feed into a single CTB channel and the GuC gets bombarded with CSBs as contexts are immediately switched in and out on the engines due to the zero runtime of the requests. When the GuC is overwhelmed scheduling of contexts is unfair due to the nature of the GuC scheduling algorithm. This behavior is understood and deemed acceptable as this micro benchmark isn't close to real world use case. Increasing the timeout of wait period for requests to complete. This makes the test understand that is ok for contexts to get starved in this scenario. A future patch / cleanup may just delete these micro benchmark tests as they basically mean nothing. We care about real workloads not made up ones. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211011175704.28509-1-matthew.brost@intel.com
2021-10-22	drm/i915/ttm: enable shmem tt backend	Matthew Auld	1	-1/+2
	Turn on the shmem tt backend, and enable shrinking. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-8-matthew.auld@intel.com
2021-10-22	drm/i915/ttm: use cached system pages when evicting lmem	Matthew Auld	1	-4/+4
	This should let us do an accelerated copy directly to the shmem pages when temporarily moving lmem-only objects, where the i915-gem shrinker can later kick in to swap out the pages, if needed. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-7-matthew.auld@intel.com
2021-10-22	drm/i915/ttm: move shrinker management into adjust_lru	Matthew Auld	5	-22/+137
	We currently just evict lmem objects to system memory when under memory pressure. For this case we might lack the usual object mm.pages, which effectively hides the pages from the i915-gem shrinker, until we actually "attach" the TT to the object, or in the case of lmem-only objects it just gets migrated back to lmem when touched again. For all cases we can just adjust the i915 shrinker LRU each time we also adjust the TTM LRU. The two cases we care about are: 1) When something is moved by TTM, including when initially populating an object. Importantly this covers the case where TTM moves something from lmem <-> smem, outside of the normal get_pages() interface, which should still ensure the shmem pages underneath are reclaimable. 2) When calling into i915_gem_object_unlock(). The unlock should ensure the object is removed from the shinker LRU, if it was indeed swapped out, or just purged, when the shrinker drops the object lock. v2(Thomas): - Handle managing the shrinker LRU in adjust_lru, where it is always safe to touch the object. v3(Thomas): - Pretty much a re-write. This time piggy back off the shrink_pin stuff, which actually seems to fit quite well for what we want here. v4(Thomas): - Just use a simple boolean for tracking ttm_shrinkable. v5: - Ensure we call adjust_lru when faulting the object, to ensure the pages are visible to the shrinker, if needed. - Add back the adjust_lru when in i915_ttm_move (Thomas) v6(Reported-by: kernel test robot <lkp@intel.com>): - Remove unused i915_tt Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #v4 Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-6-matthew.auld@intel.com
2021-10-22	drm/i915: add some kernel-doc for shrink_pin and friends	Matthew Auld	2	-1/+54
	Attempt to document shrink_pin and the other relevant interfaces that interact with it, before we start messing with it. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-5-matthew.auld@intel.com
2021-10-22	drm/i915: drop unneeded make_unshrinkable in free_object	Matthew Auld	1	-9/+0
	The comment here is no longer accurate, since the current shrinker code requires a full ref before touching any objects. Also unset_pages() should already do the required make_unshrinkable() for us, if needed, which is also nicely balanced with set_pages(). Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211018091055.1998191-4-matthew.auld@intel.com