Age | Commit message (Collapse) | Author | Files | Lines |
|
While the uabi_engines_llist is populated in intel_engines_init() during
driver load, the corresponding function intel_engines_release() does not
correctly get rid of it. This can lead to a UAF if, after failed
initialization (for example when gt is set wedged on init), we try to
access the engines.
Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Krzysztof Niemiec <krzysztof.niemiec@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240801154047.115176-2-krzysztof.niemiec@intel.com
|
|
Wa_14019789679 implementation for MTL, ARL and DG2.
v2: Corrected condition
v3:
- Fix indentation (Jani Nikula)
- dword size should be 0x1 and
initialize dword to 0 instead of MI_NOOP (Tejas)
- Use IS_GFX_GT_IP_RANGE() (Tejas)
v4:
- 3DSTATE_MESH_CONTROL instruction is 3 dwords long
Align with dword size. (Roper, Matthew D)
- Add RCS engine check. (Tejas)
Bspec: 47083
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240731155614.3460645-1-nitin.r.gote@intel.com
|
|
There is a new part to an existing workaround, so enable that piece as
well.
v2: Extend even further.
v3: Drop DG2 as there are CI failures still to resolve. Also re-order
the parameters to a function to reduce excessive line wrapping.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240622004636.662081-3-John.C.Harrison@Intel.com
|
|
The context switch out workaround also applies to ARL.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240622004636.662081-2-John.C.Harrison@Intel.com
|
|
Prevent a NULL pointer access in intel_memory_regions_hw_probe.
Fixes: 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup warning")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11704
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240712214156.3969584-1-jonathan.cavitt@intel.com
|
|
We're seeing a GPU hang issue on a CHV platform, which was caused by commit
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for
Gen8").
The Gen8 platform only supports timeslicing and doesn't have a preemption
mechanism, as its engines do not have a preemption timer.
Commit 751f82b353a6 ("drm/i915/gt: Only disable preemption on Gen8 render
engines") addressed this issue only for render engines. This patch extends
that fix by ensuring that preemption is not considered for all engines on
Gen8 platforms.
v4:
- Use the correct Fixes tag (Rodrigo Vivi)
- Reworded commit log (Andi Shyti)
v3:
- Inside need_preempt(), condition of can_preempt() is not required
as simplified can_preempt() is enough. (Chris Wilson)
v2: Simplify can_preempt() function (Tvrtko Ursulin)
Fixes: 751f82b353a6 ("drm/i915/gt: Only disable preemption on gen8 render engines")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
CC: <stable@vger.kernel.org> # v5.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240711163208.1355736-1-nitin.r.gote@intel.com
|
|
PWR_CLK_STATE only needs to be modified up until gen11. For gen12 this
code is not applicable. Remove code to update context image with
PWR_CLK_STATE for gen12.
Fixes: 00a7f0d7155c ("drm/i915/tgl: Add perf support on TGL")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240629005643.3050678-1-umesh.nerlige.ramappa@intel.com
|
|
We report object allocation failures to userspace with ENOMEM
so add __GFP_NOWARN to remove superfluous oom warnings.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/4936
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240626143318.11600-1-nirmoy.das@intel.com
|
|
Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
warning") adds a debug message where the "lmem_size" and
"dsm_base" variables are printed using the %lli identifier.
However, these variables are defined as resource_size_t, which
are unsigned long for 32-bit machines and unsigned long long for
64-bit machines.
The documentation (core-api/printk-formats.rst) recommends using
the %pa specifier for printing addresses and sizes of resources.
Replace %lli with %pa.
This patch also mutes the following sparse warning when compiling
with:
make W=1 ARCH=i386 drivers/gpu/drm/i915
>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c:941:5: error: format '%lli'
expects argument of type 'long long int', but argument 5 has type
'resource_size_t' {aka 'unsigned int'} [-Werror=format=]
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240617184243.330231-3-andi.shyti@linux.intel.com
|
|
Commit 05da7d9f717b ("drm/i915/gem: Downgrade stolen lmem setup
warning") returns '0' from i915_gem_stolen_lmem_setup(), but it's
supposed to return a pointer to the intel_memory_region
structure.
Sparse complains with the following message:
>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c:943:32: sparse: sparse:
Using plain integer as NULL pointer
Return NULL.
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240617184243.330231-2-andi.shyti@linux.intel.com
|
|
CI has been sporadically reporting the following issue triggered by
igt@i915_selftest@live@hangcheck on ADL-P and similar machines:
<6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
...
<6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
<6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
<3> [414.070354] Unable to pin Y-tiled fence; err:-4
<3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
...
<4>[ 609.603992] ------------[ cut here ]------------
<2>[ 609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
<4>[ 609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4>[ 609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G U W 6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
<4>[ 609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
<4>[ 609.604010] Workqueue: i915 __i915_gem_free_work [i915]
<4>[ 609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
...
<4>[ 609.604271] Call Trace:
<4>[ 609.604273] <TASK>
...
<4>[ 609.604716] __i915_vma_evict+0x2e9/0x550 [i915]
<4>[ 609.604852] __i915_vma_unbind+0x7c/0x160 [i915]
<4>[ 609.604977] force_unbind+0x24/0xa0 [i915]
<4>[ 609.605098] i915_vma_destroy+0x2f/0xa0 [i915]
<4>[ 609.605210] __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
<4>[ 609.605330] __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
<4>[ 609.605440] process_scheduled_works+0x351/0x690
...
In the past, there were similar failures reported by CI from other IGT
tests, observed on other platforms.
Before commit 63baf4f3d587 ("drm/i915/gt: Only wait for GPU activity
before unbinding a GGTT fence"), i915_vma_revoke_fence() was waiting for
idleness of vma->active via fence_update(). That commit introduced
vma->fence->active in order for the fence_update() to be able to wait
selectively on that one instead of vma->active since only idleness of
fence registers was needed. But then, another commit 0d86ee35097a
("drm/i915/gt: Make fence revocation unequivocal") replaced the call to
fence_update() in i915_vma_revoke_fence() with only fence_write(), and
also added that GEM_BUG_ON(!i915_active_is_idle(&fence->active)) in front.
No justification was provided on why we might then expect idleness of
vma->fence->active without first waiting on it.
The issue can be potentially caused by a race among revocation of fence
registers on one side and sequential execution of signal callbacks invoked
on completion of a request that was using them on the other, still
processed in parallel to revocation of those fence registers. Fix it by
waiting for idleness of vma->fence->active in i915_vma_revoke_fence().
Fixes: 0d86ee35097a ("drm/i915/gt: Make fence revocation unequivocal")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10021
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.8+
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240603195446.297690-2-janusz.krzysztofik@linux.intel.com
|
|
The ce->guc_state.lock was made to protect guc_prio, which
indicates the GuC priority level.
But at the begnning of the function we perform some sanity check
of guc_prio outside its protected section. Move them within the
locked region.
Use this occasion to expand the if statement to make it clearer.
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240613222402.551625-2-andi.shyti@linux.intel.com
|
|
Replace "dynmically" with "dynamically".
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240613222837.552277-1-andi.shyti@linux.intel.com
|
|
In the case where lmem_size < dsm_base, hardware is reporting that
stolen lmem is unusable. In this case, instead of throwing a warning,
we can continue execution as normal by disabling stolen LMEM support.
For example, this change will allow the following error report from
ATS-M to no longer apply:
<6> [144.859887] pcieport 0000:4b:00.0: bridge window [mem 0xb1000000-0xb11fffff]
<6> [144.859900] pcieport 0000:4b:00.0: bridge window [mem 0x3bbc00000000-0x3bbc17ffffff 64bit pref]
<6> [144.859917] pcieport 0000:4c:01.0: PCI bridge to [bus 4d-4e]
<6> [144.859932] pcieport 0000:4c:01.0: bridge window [mem 0xb1000000-0xb11fffff]
<6> [144.859945] pcieport 0000:4c:01.0: bridge window [mem 0x3bbc00000000-0x3bbc17ffffff 64bit pref]
<6> [144.859984] i915 0000:4d:00.0: [drm] BAR2 resized to 256M
<6> [144.860640] i915 0000:4d:00.0: [drm] Using a reduced BAR size of 256MiB. Consider enabling 'Resizable BAR' or similar, if available in the BIOS.
<4> [144.860719] -----------[ cut here ]-----------
<4> [144.860727] WARNING: CPU: 17 PID: 1815 at drivers/gpu/drm/i915/gem/i915_gem_stolen.c:939 i915_gem_stolen_lmem_setup+0x38c/0x430 [i915]
<4> [144.861430] Modules linked in: i915 snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem drm_shmem_helper prime_numbers i2c_algo_bit ttm video drm_display_helper drm_buddy fuse x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe mdio irqbypass ptp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pps_core i2c_i801 mei_me i2c_smbus mei wmi acpi_power_meter [last unloaded: i915]
<4> [144.861611] CPU: 17 PID: 1815 Comm: i915_module_loa Tainted: G U 6.8.0-rc5-drmtip_1515-g78f49af27723+ #1
<4> [144.861624] Hardware name: Intel Corporation WHITLEY/WHITLEY, BIOS SE5C6200.86B.0020.P41.2109300305 09/30/2021
<4> [144.861632] RIP: 0010:i915_gem_stolen_lmem_setup+0x38c/0x430 [i915]
<4> [144.862287] Code: ff 41 c1 e4 05 e9 ac fe ff ff 4d 63 e4 48 89 ef 48 85 ed 74 04 48 8b 7d 08 48 c7 c6 10 a3 7b a0 e8 e9 90 43 e1 e9 ee fd ff ff <0f> 0b 49 c7 c4 ed ff ff ff e9 e0 fd ff ff 0f b7 d2 48 c7 c6 00 d9
<4> [144.862299] RSP: 0018:ffffc90005607980 EFLAGS: 00010207
<4> [144.862315] RAX: fffffffffff00000 RBX: 0000000000000003 RCX: 0000000000000000
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10833
Suggested-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240422135959.4127003-1-jonathan.cavitt@intel.com
|
|
The forcewake count and domains listing is multi process critical
and the uncore provides a spinlock for such cases.
Lock the forcewake evaluation section in the fw_domains_show()
debugfs interface.
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240607145131.217251-1-andi.shyti@linux.intel.com
|
|
The WA should be extended to cover VDBOX engine. We found that
28-channels 1080p VP9 encoding may hit this issue.
v3: update the WA number and explain the reason why
this workaround is needed
v2: add WA number
v1: initial version
Signed-off-by: Angus Chen <angus.chen@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240426073638.45775-1-angus.chen@intel.com
|
|
Enable another workaround that is implemented inside the GuC.
v2: Use the correct Gen12 w/a id rather than the Xe version (review
feedback from Matthew R) also extend to include ARL.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240528230515.479395-1-John.C.Harrison@Intel.com
|
|
The test is trying to push the heartbeat frequency to the limit, which
might sometimes fail. Such a failure does not provide valuable
information, because it does not indicate that there is something
necessarily wrong with either the driver or the hardware.
Remove the test to prevent random, unnecessary failures from appearing
in CI.
Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Niemiec, Krzysztof <krzysztof.niemiec@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fe2vu5h7v7ooxbhwpbfsypxg5mjrnt56gc3cgrqpnhgrgce334@qfrv2skxrp47
|
|
Following the guidelines it takes 3 seconds to perform an FLR
reset. Let's give it a bit more slack because this time can
change depending on the platform and on the firmware
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240523235853.171796-1-andi.shyti@linux.intel.com
|
|
The whole point of the previous fixes has been to change the CCS
hardware configuration to generate only one stream available to
the compute users. We did this by changing the info.engine_mask
that is set during device probe, reset during the detection of
the fused engines, and finally reset again when choosing the CCS
mode.
We can't use the engine_mask variable anymore, as with the
current configuration, it imposes only one CCS no matter what the
hardware configuration is.
Before changing the engine_mask for the third time, save it and
use it for calculating the CCS mode.
After the previous changes, the user reported a performance drop
to around 1/4. We have tested that the compute operations, with
the current patch, have improved by the same factor.
Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Gnattu OC <gnattuoc@me.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Jian Ye <jian.ye@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Tested-by: Gnattu OC <gnattuoc@me.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240517090616.242529-1-andi.shyti@linux.intel.com
|
|
With gcc-7 and earlier, there are lots of warnings like
In file included from <command-line>:0:0:
In function '__guc_context_policy_add_priority.isra.66',
inlined from '__guc_context_set_prio.isra.67' at drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3292:3,
inlined from 'guc_context_set_prio' at drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3320:2:
include/linux/compiler_types.h:399:38: error: call to '__compiletime_assert_631' declared with attribute error: FIELD_PREP: mask is not constant
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
...
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:2422:3: note: in expansion of macro 'FIELD_PREP'
FIELD_PREP(GUC_KLV_0_KEY, GUC_CONTEXT_POLICIES_KLV_ID_##id) | \
^~~~~~~~~~
Make sure that GUC_KLV_0_KEY is an unsigned value to avoid the warning.
Fixes: 77b6f79df66e ("drm/i915/guc: Update to GuC version 69.0.3")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430164809.482131-1-julia.filipchuk@intel.com
|
|
When debugging GPU hangs Mesa developers are finding it useful to replay
the captured error state against the simulator. But due various simulator
limitations which prevent replicating all hangs, one step further is being
able to replay against a real GPU.
This is almost doable today with the missing part being able to upload the
captured context image into the driver state prior to executing the
uploaded hanging batch and all the buffers.
To enable this last part we add a new context parameter called
I915_CONTEXT_PARAM_CONTEXT_IMAGE. It follows the existing SSEU
configuration pattern of being able to select which context to apply
against, paired with the actual image and its size.
Since this is adding a new concept of debug only uapi, we hide it behind
a new kconfig option and also require activation with a module parameter.
Together with a warning banner printed at driver load, all those combined
should be sufficient to guard against inadvertently enabling the feature.
In terms of implementation we allow the legacy context set param to be
used since that removes the need to record the per context data in the
proto context, while still allowing flexibility of specifying context
images for any context.
Mesa MR using the uapi can be seen at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27594
v2:
* Fix whitespace alignment as per checkpatch.
* Added warning on userspace misuse.
* Rebase for extracting ce->default_state shadowing.
v3:
* Rebase for I915_CONTEXT_PARAM_LOW_LATENCY.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Carlos Santa <carlos.santa@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tested-by: Carlos Santa <carlos.santa@intel.com>
Signed-off-by: Tvrtko Ursulin <tursulin@igalia.com>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514145939.87427-2-tursulin@igalia.com
|
|
To enable adding override of the default engine context image let us start
shadowing the per engine state in the context.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Carlos Santa <carlos.santa@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Tvrtko Ursulin <tursulin@igalia.com>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514145939.87427-1-tursulin@igalia.com
|
|
Allocate cleared blocks in the bias range when the DRM
buddy's clear avail is zero. This will validate the bias
range allocation in scenarios like system boot when no
cleared blocks are available and exercise the fallback
path too. The resulting blocks should always be dirty.
v1:(Matthew)
- move the size to the variable declaration section.
- move the mm.clear_avail init to allocator init.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514145636.16253-2-Arunpravin.PaneerSelvam@amd.com
|
|
Problem statement: During the system boot time, an application request
for the bulk volume of cleared range bias memory when the clear_avail
is zero, we dont fallback into normal allocation method as we had an
unnecessary clear_avail check which prevents the fallback method leads
to fb allocation failure following system goes into unresponsive state.
Solution: Remove the unnecessary clear_avail check in the range bias
allocation function.
v2: add a kunit for this corner case (Daniel Vetter)
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature")
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240514145636.16253-1-Arunpravin.PaneerSelvam@amd.com
|
|
The breadcrumbs use a GT wakeref for guarding the interrupt, but are
disarmed during release of the engine wakeref. This leaves a hole where
we may attach a breadcrumb just as the engine is parking (after it has
parked its breadcrumbs), execute the irq worker with some signalers still
attached, but never be woken again.
That issue manifests itself in CI with IGT runner timeouts while tests
are waiting indefinitely for release of all GT wakerefs.
<6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats
<7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5
<7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4
<7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3
<7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2
<7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off
<7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02
<4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
...
<6> [299.356526] sysrq: Show State
...
<6> [299.373964] task:i915_selftest state:D stack:11784 pid:5578 tgid:5578 ppid:873 flags:0x00004002
<6> [299.373967] Call Trace:
<6> [299.373968] <TASK>
<6> [299.373970] __schedule+0x3bb/0xda0
<6> [299.373974] schedule+0x41/0x110
<6> [299.373976] intel_wakeref_wait_for_idle+0x82/0x100 [i915]
<6> [299.374083] ? __pfx_var_wake_function+0x10/0x10
<6> [299.374087] live_engine_busy_stats+0x9b/0x500 [i915]
<6> [299.374173] __i915_subtests+0xbe/0x240 [i915]
<6> [299.374277] ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
<6> [299.374369] ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
<6> [299.374456] intel_engine_live_selftests+0x1c/0x30 [i915]
<6> [299.374547] __run_selftests+0xbb/0x190 [i915]
<6> [299.374635] i915_live_selftests+0x4b/0x90 [i915]
<6> [299.374717] i915_pci_probe+0x10d/0x210 [i915]
At the end of the interrupt worker, if there are no more engines awake,
disarm the breadcrumb and go to sleep.
Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: <stable@vger.kernel.org> # v5.12+
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423165505.465734-2-janusz.krzysztofik@linux.intel.com
|
|
The mapings should be replaced by mappings.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240513061451.1627-1-wangdeming@inspur.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Zero-length arrays are deprecated and flexible arrays should be
used instead: https://www.kernel.org/doc/html/v6.9-rc7/process/deprecated.html#zero-length-and-one-element-arrays
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202405051824.AmjAI5Pg-lkp@intel.com/
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506141917.205714-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit ee7284230644e21fef0e38fc5bf8f907b6bb7f7c)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
System work queues are shared, use a dedicated work queue for G2H
processing to avoid G2H processing getting block behind system tasks.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506034758.3697397-1-matthew.brost@intel.com
(cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
|
This extra macro level between the region IDs and their bitmasks
just makes it harder to see what is used where. Get rid of the
wrappers.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502121423.1002-3-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The name 'HAS_REGION()' suggests we are checking for a single
region, so seem more sensible to pass in the region ID rather
than a bitmask.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502121423.1002-2-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
HAS_REGION() takes a bitmask, not the region ID. This causes the
GEM_BUG_ON() to assert that the SMEM region is available rather
than the intended LMEM region. No real harm since SMEM is always
available, but also not checking what was intended.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502121423.1002-1-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This reverts commit 1f33dc0c1189efb9ae19c6fc22b64dd3e26261fb.
There was a patch supposed to fix an issue of illegal attempts to free a
still active i915 VMA object when parking a GT believed to be idle,
reported by CI on 2-GT Meteor Lake. As a solution, an extra wakeref for
a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit
f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform").
However, that fix occurred insufficient -- the issue was still reported by
CI. That wakeref was released on exit from i915_gem_do_execbuffer(), then
potentially before completion of the request and deactivation of its
associated VMAs. Moreover, CI reports indicated that single-GT platforms
also suffered sporadically from the same race.
Since that issue was fixed by another commit f3c71b2ded5c ("drm/i915/vma:
Fix UAF on destroy against retire race"), the changes introduced by that
insufficient fix were dropped as no longer useful. However, that series
resulted in another VMA UAF scenario now being triggered in CI.
<4> [260.290809] ------------[ cut here ]------------
<4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510)
<4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0
..
<4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
<4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
<4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0
...
<4> [260.291087] Call Trace:
<4> [260.291089] <TASK>
<4> [260.291124] i915_vma_reopen+0x43/0x80 [i915]
<4> [260.291298] eb_lookup_vmas+0x9cb/0xcc0 [i915]
<4> [260.291579] i915_gem_do_execbuffer+0xc9a/0x26d0 [i915]
<4> [260.291883] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
...
<4> [260.292301] </TASK>
...
<4> [260.292506] ---[ end trace 0000000000000000 ]---
<4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI
<4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G W 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
<4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
<4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915]
...
<4> [260.428756] Call Trace:
<4> [260.431192] <TASK>
<4> [639.283393] i915_gem_do_execbuffer+0xd05/0x26d0 [i915]
<4> [639.305245] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
...
<4> [639.411134] </TASK>
...
<4> [639.449979] ---[ end trace 0000000000000000 ]---
We defer actually closing, unbinding and destroying a VMA until next idle
point, or until the object is freed in the meantime. By postponing the
unbind, we allow for the VMA to be reopened by the client, avoiding the
work required to rebind the VMA.
Starting from commit b0647a5e79b1 ("drm/i915: Avoid live-lock with
i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA
would be reopened while we destroy them. That assumption is no longer
true in multi-GT configurations, where a VMA we reopen may be handled by a
GT different from the one that we already keep active via its engine while
we set up an execbuf request.
Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer()
processing path seems to fix this issue.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@linux.intel.com>
Fixes: 1f33dc0c1189 ("drm/i915: Remove extra multi-gt pm-references")
Link: https://patchwork.freedesktop.org/patch/msgid/20240506180253.96858-2-janusz.krzysztofik@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
We don't need to run the validation of the XML files if we are just
compiling the kernel. Skip the validation unless the user enables
corresponding Kconfig option. This removes a warning from gen_header.py
about lxml being not installed.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20240409120108.2303d0bd@canb.auug.org.au/
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/592558/
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
These tables were made non-const in commit 3cba4a2cdff3 ("drm/msm/a6xx:
Update ROQ size in coredump") in order to avoid powering up the GPU when
reading back a devcoredump. Instead let's just stash the count that is
potentially read from hw in struct a6xx_gpu_state_obj, and make the
tables const again.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/592699/
|
|
Add an a750 case to the various places where we choose a list of
registers.
Patchwork: https://patchwork.freedesktop.org/patch/592519/
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592519
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Use the kgsl-style list of indices, because this is about to change for
a750 and we want to reuse the downstream header directly.
Patchwork: https://patchwork.freedesktop.org/patch/592520/
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592520
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Update to Mesa commit e82d70d472cc ("freedreno/a7xx: Add
A7XX_HLSQ_DP_STR location from kgsl").
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592518/
Signed-off-by: Rob Clark <robdclark@chromium.org>
|
|
Add A7XX prefixes necessary because we use the same code for dumping
a6xx and a7xx, fix register name prefixes for upstream, and use the
upstream header.
Patchwork: https://patchwork.freedesktop.org/patch/592517/
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/592517
Signed-off-by: Rob Clark <robdclark@chromium.org>
|