|
Add the callback implementation for swsmu.
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92e511d1cecc6a8fa7bdfc8657f16ece9ab4d456)
|
|
To be used for display idle optimizations when
we want to pause non-default profiles.
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6dafb5d4c7cdfc8f994e789d050e29e0d5ca6efd)
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert as much as possible of i9xx_wm.c to struct
intel_display.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/bbee93f837fe7fedfd1627ff6fa295da8881df8d.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The registers handled in i9xx_wm.c are mostly display registers. The
MCH_SSKPD and MLTR_ILK registers are not. Convert register access to
intel_de_*() interface where applicable.
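An illustrative sketch of the conversion pattern; the register name below is a
placeholder and these call sites are not the actual diff:
  /* before: display register access via struct drm_i915_private */
  u32 old_val = intel_uncore_read(&i915->uncore, EXAMPLE_WM_REG);

  /* after: display register access via struct intel_display */
  struct intel_display *display = &i915->display;
  u32 new_val = intel_de_read(display, EXAMPLE_WM_REG);
  intel_de_write(display, EXAMPLE_WM_REG, new_val);

  /* MCH_SSKPD and MLTR_ILK are not display registers and keep
   * using the intel_uncore_*() accessors. */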
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/68367382759570413669d5648895a1da8f6c68f7.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert the i9xx_wm.h interface to struct intel_display.
With this, we can make intel_wm.c independent of i915_drv.h.
v2: Also remove i915_drv.h, fix commit message
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/3e30634d85c0e0aac9c95f9a2f928131ba400271.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert as much as possible of skl_watermarks.c to struct
intel_display.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/61ae2013c5db962e90e072be7d37d630cb7dfc34.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert the skl_watermark.h interface to struct intel_display.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/cd2b1863dee25b69b4766090dd183a7467c4edea.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert as much as possible of intel_wm.c to struct
intel_display.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/6106c0313190ee904c7f7737d0b78b61983eed91.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Going forward, struct intel_display is the main display device data
pointer. Convert the intel_wm.h interface as well as the hooks in struct
intel_wm_funcs to struct intel_display.
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/1085900b4e46bbb514e6918c321639ac380331ce.1744119460.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Unify the naming of the data and clock lane timing parameters, and
simplify their bounds checks. Drop the debug messages on out of bounds
parameters as excessive.
Clarify the comment while at it.
Cc: William Tseng <william.tseng@intel.com>
Reviewed-by: William Tseng <william.tseng@intel.com>
Tested-by: William Tseng <william.tseng@intel.com>
Link: https://lore.kernel.org/r/d1a75ae7b9d93a0b50976b5de45ba2ca798991ad.1743682608.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
It is historically unknown why the max of the VBT clock and data lane
prepare timing parameters is used for both instead of each
individually. Separate them to follow what the Windows driver does.
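A rough before/after sketch; the variable names are hypothetical, only the
max-vs-separate logic comes from the description above:
  /* before: one shared prepare count derived from both VBT values */
  prepare_cnt = max(ths_prepare_cnt, tclk_prepare_cnt);

  /* after: data and clock lanes each use their own VBT value */
  data_prepare_cnt = ths_prepare_cnt;
  clk_prepare_cnt  = tclk_prepare_cnt;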
Cc: William Tseng <william.tseng@intel.com>
Reviewed-by: William Tseng <william.tseng@intel.com>
Tested-by: William Tseng <william.tseng@intel.com>
Link: https://lore.kernel.org/r/079a26d0aae79f299aee0397dad2d6519cd55071.1743682608.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
The HuC delayed loading fence, introduced with commit 27536e03271da
("drm/i915/huc: track delayed HuC load with a fence"), is registered with
the object tracker early on driver probe but unregistered only on driver
remove, which is not called on early probe errors. Since its memory is
allocated under devres, it is released anyway and may be allocated again
for the fence and reused on future driver probes, resulting
in kernel warnings that taint the kernel:
<4> [309.731371] ------------[ cut here ]------------
<3> [309.731373] ODEBUG: init destroyed (active state 0) object: ffff88813d7dd2e0 object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x20 [i915]
<4> [309.731575] WARNING: CPU: 2 PID: 3161 at lib/debugobjects.c:612 debug_print_object+0x93/0xf0
...
<4> [309.731693] CPU: 2 UID: 0 PID: 3161 Comm: i915_module_loa Tainted: G U 6.14.0-CI_DRM_16362-gf0fd77956987+ #1
...
<4> [309.731700] RIP: 0010:debug_print_object+0x93/0xf0
...
<4> [309.731728] Call Trace:
<4> [309.731730] <TASK>
...
<4> [309.731949] __debug_object_init+0x17b/0x1c0
<4> [309.731957] debug_object_init+0x34/0x50
<4> [309.732126] __i915_sw_fence_init+0x34/0x60 [i915]
<4> [309.732256] intel_huc_init_early+0x4b/0x1d0 [i915]
<4> [309.732468] intel_uc_init_early+0x61/0x680 [i915]
<4> [309.732667] intel_gt_common_init_early+0x105/0x130 [i915]
<4> [309.732804] intel_root_gt_init_early+0x63/0x80 [i915]
<4> [309.732938] i915_driver_probe+0x1fa/0xeb0 [i915]
<4> [309.733075] i915_pci_probe+0xe6/0x220 [i915]
<4> [309.733198] local_pci_probe+0x44/0xb0
<4> [309.733203] pci_device_probe+0xf4/0x270
<4> [309.733209] really_probe+0xee/0x3c0
<4> [309.733215] __driver_probe_device+0x8c/0x180
<4> [309.733219] driver_probe_device+0x24/0xd0
<4> [309.733223] __driver_attach+0x10f/0x220
<4> [309.733230] bus_for_each_dev+0x7d/0xe0
<4> [309.733236] driver_attach+0x1e/0x30
<4> [309.733239] bus_add_driver+0x151/0x290
<4> [309.733244] driver_register+0x5e/0x130
<4> [309.733247] __pci_register_driver+0x7d/0x90
<4> [309.733251] i915_pci_register_driver+0x23/0x30 [i915]
<4> [309.733413] i915_init+0x34/0x120 [i915]
<4> [309.733655] do_one_initcall+0x62/0x3f0
<4> [309.733667] do_init_module+0x97/0x2a0
<4> [309.733671] load_module+0x25ff/0x2890
<4> [309.733688] init_module_from_file+0x97/0xe0
<4> [309.733701] idempotent_init_module+0x118/0x330
<4> [309.733711] __x64_sys_finit_module+0x77/0x100
<4> [309.733715] x64_sys_call+0x1f37/0x2650
<4> [309.733719] do_syscall_64+0x91/0x180
<4> [309.733763] entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> [309.733792] </TASK>
...
<4> [309.733806] ---[ end trace 0000000000000000 ]---
That scenario is most easily reproducible with
igt@i915_module_load@reload-with-fault-injection.
Fix the issue by moving the cleanup step to the driver release path.
Fixes: 27536e03271da ("drm/i915/huc: track delayed HuC load with a fence")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13592
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250402172057.209924-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 795dbde92fe5c6996a02a5b579481de73035e7bf)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Add the missing vrr parameters to the vrr_params_changed() helper.
This ensures that changes in vrr.vsync_{start,end} trigger a call to
the appropriate helpers to update the VRR registers.
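A minimal sketch of the kind of check being extended; the helper shape and
the pre-existing fields are assumptions, only vrr.vsync_{start,end} come from
this change:
  static bool vrr_params_changed(const struct intel_crtc_state *old_crtc_state,
                                 const struct intel_crtc_state *new_crtc_state)
  {
          return old_crtc_state->vrr.vmin != new_crtc_state->vrr.vmin ||
                 old_crtc_state->vrr.vmax != new_crtc_state->vrr.vmax ||
                 /* newly added so vsync changes trigger a VRR register update */
                 old_crtc_state->vrr.vsync_start != new_crtc_state->vrr.vsync_start ||
                 old_crtc_state->vrr.vsync_end != new_crtc_state->vrr.vsync_end;
  }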
Fixes: e8cd188e91bb ("drm/i915/display: Compute vrr_vsync params")
Cc: Mitul Golani <mitulkumar.ajitkumar.golani@intel.com>
Cc: Arun R Murthy <arun.r.murthy@intel.com>
Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://lore.kernel.org/r/20250404080540.2059511-1-ankit.k.nautiyal@intel.com
(cherry picked from commit ced5e64f011cb5cd541988442997ceaa7385827e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Make CONFIG_DRM_EFIDRM a tristate to enable module builds.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://lore.kernel.org/r/20250408091837.407401-3-tzimmermann@suse.de
|
|
Enable survivability mode if it is supported and the configfs attribute is set.
Enabling survivability mode manually is useful in cases where pcode does
not detect the failure, for validation, and for IFR (in-field repair).
To set the configfs survivability mode attribute for a device:
echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode
The card then enters survivability mode if supported.
v2: add a log if survivability mode is enabled for unsupported
platforms (Rodrigo)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250407051414.1651616-4-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Add survivability mode document to pcode document as it is enabled
when pcode detects a failure.
v2: fix kernel-doc (Lucas)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250407051414.1651616-3-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Register a configfs subsystem called 'xe' that creates a
directory in the mounted configfs directory (/sys/kernel/config).
Userspace can then create the device that has to be configured
under the xe directory:
mkdir /sys/kernel/config/xe/0000:03:00.0
The device created will have the following attributes to be
configured:
/sys/kernel/config/xe/
.. 0000:03:00.0/
... survivability_mode
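For reference, a minimal sketch of wiring such a subsystem through the
standard kernel configfs API; the xe-specific struct and type names below are
illustrative assumptions, not the actual implementation:
  #include <linux/configfs.h>
  #include <linux/slab.h>

  /* hypothetical per-device state exposed under /sys/kernel/config/xe/<bdf>/ */
  struct xe_config_device {
          struct config_group group;
          bool survivability_mode;
  };

  static const struct config_item_type xe_config_device_type = {
          .ct_owner = THIS_MODULE,
          /* .ct_attrs would list the survivability_mode attribute */
  };

  static struct config_group *xe_config_make_device(struct config_group *group,
                                                    const char *name)
  {
          struct xe_config_device *dev = kzalloc(sizeof(*dev), GFP_KERNEL);

          if (!dev)
                  return ERR_PTR(-ENOMEM);

          config_group_init_type_name(&dev->group, name, &xe_config_device_type);
          return &dev->group;
  }

  static struct configfs_group_operations xe_config_group_ops = {
          .make_group = xe_config_make_device,
  };

  static const struct config_item_type xe_configfs_type = {
          .ct_group_ops = &xe_config_group_ops,
          .ct_owner     = THIS_MODULE,
  };

  static struct configfs_subsystem xe_configfs = {
          .su_group = {
                  .cg_item = {
                          .ci_namebuf = "xe",
                          .ci_type    = &xe_configfs_type,
                  },
          },
  };

  /* at module init */
  config_group_init(&xe_configfs.su_group);
  mutex_init(&xe_configfs.su_mutex);
  ret = configfs_register_subsystem(&xe_configfs);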
v2: fix kernel-doc
fix return value (Lucas)
v3: fix kernel-doc (Lucas)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250407051414.1651616-2-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When the parameter is set, disable user submissions
to kernel queues.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When the parameter is set, disable user submissions
to kernel queues.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For SDMA, we still need kernel queues for paging so
they need to be initialized, but we do not want to
accept submissions from userspace when disable_kq
is set.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Plumb in support for disabling kernel queues.
v2: use ring counts per Felix' suggestion
v3: fix stream fault handler, enable EOP interrupts
v4: fix MEC interrupt offset (Sunil)
v5: clean up after removing extra sched.ready settings
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Plumb in support for disabling kernel queues in
GFX11. We have to bring up a GFX queue briefly in
order to initialize the clear state. After that
we can disable it.
v2: use ring counts per Felix' suggestion
v3: fix stream fault handler, enable EOP interrupts
v4: fix MEC interrupt offset (Sunil)
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If we don't have kernel queues, the vmids can be used by
the MES for user queues.
Acked-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make all resources available to user queues.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add proper checks for disable_kq functionality in
gfx helper functions. Add special logic for families
that require the clear state setup.
v2: use ring count as per Felix suggestion
v3: fix num_gfx_rings handling in amdgpu_gfx_graphics_queue_acquire()
v4: fix error code (Alex)
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This would be set by IPs which only accept submissions
from the kernel, not userspace, such as when kernel
queues are disabled. Don't expose the rings to userspace
and reject any submissions in the CS IOCTL.
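A rough sketch of the intent; the flag name is hypothetical:
  /* set by IP blocks that only accept kernel submissions */
  if (ring->no_user_submission)
          return -EINVAL;  /* reject userspace CS on this ring */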
v2: fix error code (Alex)
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
On chips that support user queues, setting this option
disables kernel queues so that user queues can be validated
without kernel queues being present.
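A sketch of how such an option is typically declared as an amdgpu module
parameter; the exact name, default, and permissions are assumptions:
  int amdgpu_disable_kq = -1;
  module_param_named(disable_kq, amdgpu_disable_kq, int, 0444);
  MODULE_PARM_DESC(disable_kq,
          "Disable kernel queues on chips with user queue support (-1 = auto, 0 = off, 1 = on)");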
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Similar to KFD, prevent runtime pm while user queues are active.
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
So we can iterate across them when we need to manage
all user queues.
v2: add uq_mgr to adev list in amdgpu_userq_mgr_init
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add the user queue IP support query to the drm_amdgpu_info_device
query.
Cc: marek.olsak@amd.com
Cc: prike.liang@amd.com
Cc: sunil.khatri@amd.com
Cc: yogesh.mohanmarimuthu@amd.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add an INFO query to check if user queues are supported.
v2: switch to a mask of IPs (Marek)
v3: move to drm_amdgpu_info_device (Marek)
Cc: marek.olsak@amd.com
Cc: prike.liang@amd.com
Cc: sunil.khatri@amd.com
Cc: yogesh.mohanmarimuthu@amd.com
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a separate switch statement for the userq callback
assignment so that we can assign the callbacks for each
asic as the firmware becomes available.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
With the ME details fixed, we can now consolidate
this state. Also split out the userq setup into a separate
switch statement so that we can set them per IP version
when the firmwares are ready.
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The display freezes because amdgpu_userq_wait_ioctl()
is waiting for a non-user-queue fence (specifically, the PT update fence).
Root cause:
The resume_work is initiated by both amdgpu_userq_suspend and
amdgpu_userqueue_ensure_ev_fence at the same time. amdgpu_userq_suspend
signals a dma-fence and subsequently triggers the resume_work, which is
intended to replace the existing fence by creating a new dma-fence. However,
following this, the amdgpu_userqueue_ensure_ev_fence schedules another
resume_work that generates a new dma-fence, thereby replacing the one
created by amdgpu_userq_suspend. Consequently, the original fence will
never be signaled.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Warn if the number of pipes exceeds what the MES supports.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move it to amdgpu_mes to align with the compute and
sdma hqd masks. No functional change.
v2: rebase on new changes
v3: misc optimizations
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This was leftover from MES bring up when we had MES
user queues in the kernel. It's no longer used so
remove it.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Leftover from the MES self tests that were removed previously.
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make sure these are set properly to ensure compatibility if
we ever update the IOCTL interface.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
A taint issue is encountered during the unloading of gpu_sched
due to a fence not being released/put. In this context,
amdgpu_vm_clear_freed is responsible for creating a job to
update the page table (PT). It allocates a kmem_cache object for
drm_sched_fence and returns the finished fence associated
with job->base.s_fence. In the case of a user mode queue, this finished
fence is added to the timeline sync object through
amdgpu_gem_update_bo_mapping, which is used by user
space to ensure completion of the PT update.
[ 508.900587] =============================================================================
[ 508.900605] BUG drm_sched_fence (Tainted: G N): Objects remaining in drm_sched_fence on __kmem_cache_shutdown()
[ 508.900617] -----------------------------------------------------------------------------
[ 508.900627] Slab 0xffffe0cc04548780 objects=32 used=2 fp=0xffff8ea81521f000 flags=0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 508.900645] CPU: 3 UID: 0 PID: 2337 Comm: rmmod Tainted: G N 6.12.0+ #1
[ 508.900651] Tainted: [N]=TEST
[ 508.900653] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F34 06/10/2021
[ 508.900656] Call Trace:
[ 508.900659] <TASK>
[ 508.900665] dump_stack_lvl+0x70/0x90
[ 508.900674] dump_stack+0x14/0x20
[ 508.900678] slab_err+0xcb/0x110
[ 508.900687] ? srso_return_thunk+0x5/0x5f
[ 508.900692] ? try_to_grab_pending+0xd3/0x1d0
[ 508.900697] ? srso_return_thunk+0x5/0x5f
[ 508.900701] ? mutex_lock+0x17/0x50
[ 508.900708] __kmem_cache_shutdown+0x144/0x2d0
[ 508.900713] ? flush_rcu_work+0x50/0x60
[ 508.900719] kmem_cache_destroy+0x46/0x1f0
[ 508.900728] drm_sched_fence_slab_fini+0x19/0x970 [gpu_sched]
[ 508.900736] __do_sys_delete_module.constprop.0+0x184/0x320
[ 508.900744] ? srso_return_thunk+0x5/0x5f
[ 508.900747] ? debug_smp_processor_id+0x1b/0x30
[ 508.900754] __x64_sys_delete_module+0x16/0x20
[ 508.900758] x64_sys_call+0xdf/0x20d0
[ 508.900763] do_syscall_64+0x51/0x120
[ 508.900769] entry_SYSCALL_64_after_hwframe+0x76/0x7e
v2: call dma_fence_put in amdgpu_gem_va_update_vm
v3: Addressed review comments from Christian.
- calling amdgpu_gem_update_timeline_node before switch.
- putting a dma_fence in case of error or !timeline_syncobj.
v4: Addressed review comments from Christian.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Le Ma <le.ma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To align with other headers.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This can be enabled now. We have the firmware checks
in place.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Currently disabled until the firmwares are officially
released.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
s/CONFIG_DRM_AMD_USERQ_GFX/CONFIG_DRM_AMDGPU_NAVI3X_USERQ/
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The feature is not navi3x specific at this point.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
I'd swear this was already fixed, but I guess the patch never
landed. Add it now.
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Take a reference when we create a queue and drop it
when we destroy the queue. We need to keep the device
active while user queues are active.
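A minimal sketch of the reference pattern, assuming the standard runtime PM
helpers back it; the call sites are illustrative:
  /* on user queue creation: keep the device active */
  r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
  if (r < 0) {
          pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
          return r;
  }

  /* on user queue destruction: drop the reference again */
  pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
  pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);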
v2: squash in fix from Sunil
v3: squash in fix from Prike
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the IP type to look up the userq functions rather
than hardcoding it.
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
A deadlock situation has arisen between the userq
signal ioctl and the eviction fence. In this scenario,
the function amdgpu_userq_signal_ioctl() has acquired a reservation
lock on the read/write buffer object (BO) through drm_exec.
Subsequently, it calls amdgpu_userqueue_ensure_ev_fence(),
which waits for the userq resume work.
Meanwhile, the userq suspend worker has initiated the userq resume
work (amdgpu_userqueue_resume_worker). This userq resume work attempts
to validate the vm->done BO, leading to amdgpu_userqueue_validate_bos
also attempting to take the reservation lock on the same write BO that
is already locked by amdgpu_userq_signal_ioctl.
As a result, the resume work becomes stalled, causing
amdgpu_userqueue_ensure_ev_fence to remain in a waiting state.
Call Trace:
[ 242.836469] INFO: task gnome-shel:cs0:1288 blocked for more than 120 seconds.
[ 242.836486] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4
[ 242.836491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.836494] task:gnome-shel:cs0 state:D stack:0 pid:1288 tgid:1282 ppid:1180 flags:0x00000002
[ 242.836503] Call Trace:
[ 242.836508] <TASK>
[ 242.836517] __schedule+0x3e0/0xb10
[ 242.836530] ? srso_return_thunk+0x5/0x5f
[ 242.836541] schedule+0x31/0x120
[ 242.836546] schedule_timeout+0x150/0x160
[ 242.836551] ? srso_return_thunk+0x5/0x5f
[ 242.836555] ? sysvec_call_function+0x69/0xd0
[ 242.836562] ? srso_return_thunk+0x5/0x5f
[ 242.836567] ? preempt_count_add+0x7f/0xd0
[ 242.836577] __wait_for_common+0x91/0x180
[ 242.836582] ? __pfx_schedule_timeout+0x10/0x10
[ 242.836590] wait_for_completion+0x28/0x30
[ 242.836595] __flush_work+0x16c/0x290
[ 242.836602] ? __pfx_wq_barrier_func+0x10/0x10
[ 242.836611] flush_delayed_work+0x3a/0x60
[ 242.836621] amdgpu_userqueue_ensure_ev_fence+0x2d/0xb0 [amdgpu]
[ 242.836966] amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[ 242.837171] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.837365] drm_ioctl_kernel+0xae/0x100 [drm]
[ 242.837398] drm_ioctl+0x2a1/0x500 [drm]
[ 242.837420] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.837622] ? srso_return_thunk+0x5/0x5f
[ 242.837627] ? srso_return_thunk+0x5/0x5f
[ 242.837630] ? _raw_spin_unlock_irqrestore+0x2b/0x50
[ 242.837635] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 242.837811] __x64_sys_ioctl+0x99/0xd0
[ 242.837820] x64_sys_call+0x1209/0x20d0
[ 242.837825] do_syscall_64+0x51/0x120
[ 242.837830] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 242.837835] RIP: 0033:0x7f2f33f1a94f
[ 242.837838] RSP: 002b:00007f2f24ffea30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 242.837842] RAX: ffffffffffffffda RBX: 00007f2f24ffebd0 RCX: 00007f2f33f1a94f
[ 242.837845] RDX: 00007f2f24ffebd0 RSI: 00000000c0306457 RDI: 000000000000000d
[ 242.837847] RBP: 00007f2f24ffeab0 R08: 0000000000000000 R09: 0000000000000000
[ 242.837849] R10: 00007f2f24ffecd0 R11: 0000000000000246 R12: 00007f2f25000640
[ 242.837851] R13: 00000000c0306457 R14: 000000000000000d R15: 00007fff3b39c1e0
[ 242.837858] </TASK>
[ 242.837865] INFO: task Xwayland:cs0:1517 blocked for more than 120 seconds.
[ 242.837869] Tainted: G OE 6.12.0-rc2rebased-oct-24+ #4
[ 242.837872] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.837874] task:Xwayland:cs0 state:D stack:0 pid:1517 tgid:1338 ppid:1282 flags:0x00004002
[ 242.837878] Call Trace:
[ 242.837880] <TASK>
[ 242.837883] __schedule+0x3e0/0xb10
[ 242.837890] schedule+0x31/0x120
[ 242.837894] schedule_preempt_disabled+0x1c/0x30
[ 242.837897] __mutex_lock.constprop.0+0x386/0x6e0
[ 242.837902] ? srso_return_thunk+0x5/0x5f
[ 242.837905] ? __timer_delete_sync+0x81/0xe0
[ 242.837911] __mutex_lock_slowpath+0x13/0x20
[ 242.837915] mutex_lock+0x3b/0x50
[ 242.837919] amdgpu_userqueue_ensure_ev_fence+0x35/0xb0 [amdgpu]
[ 242.838138] amdgpu_userq_signal_ioctl+0x959/0xec0 [amdgpu]
[ 242.838340] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.838531] drm_ioctl_kernel+0xae/0x100 [drm]
[ 242.838559] drm_ioctl+0x2a1/0x500 [drm]
[ 242.838580] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.838778] ? srso_return_thunk+0x5/0x5f
[ 242.838783] ? srso_return_thunk+0x5/0x5f
[ 242.838786] ? _raw_spin_unlock_irqrestore+0x2b/0x50
[ 242.838791] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 242.838967] __x64_sys_ioctl+0x99/0xd0
[ 242.838972] x64_sys_call+0x1209/0x20d0
[ 242.838975] do_syscall_64+0x51/0x120
[ 242.838979] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 242.838982] RIP: 0033:0x7f9118b1a94f
[ 242.838985] RSP: 002b:00007f910cdff760 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 242.838989] RAX: ffffffffffffffda RBX: 00007f910cdff910 RCX: 00007f9118b1a94f
[ 242.838991] RDX: 00007f910cdff910 RSI: 00000000c0306457 RDI: 000000000000000c
[ 242.838993] RBP: 00007f910cdff7e0 R08: 0000000000000000 R09: 0000000000000001
[ 242.838995] R10: 00007f910cdff9d4 R11: 0000000000000246 R12: 00007f910ce00640
[ 242.838997] R13: 00000000c0306457 R14: 000000000000000c R15: 00007fff9dd11d10
[ 242.839004] </TASK>
v2: Addressed review comments from Christian.
v3/v4: Addressed review comments from Christian.
- Move the drm_exec loop after userq fence create.
- Clean up the newly created userq fence in case of error.
v5: Addressed review comments from Christian.
- Create a new amdgpu_userq_fence_alloc() function for allocation.
- Call dma_fence_put for the cleanup procedure.
- Make the amdgpu_userq_fence_create() function static.
- Call drm_exec_init after mutex_unlock.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The seq64 VM cache policy should be set to UC (Uncached) to
match the cache settings of the kernel-mapped memory used for the
userqueue fence address.
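A minimal sketch of an uncached seq64 mapping; the mapping flags are the
standard amdgpu_drm.h ones, the remaining variables in the call are assumed:
  flags = AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
          AMDGPU_VM_MTYPE_UC;  /* uncached, matching the kernel CPU mapping */
  r = amdgpu_vm_bo_map(adev, bo_va, seq64_addr, 0, seq64_size, flags);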
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|