Age | Commit message (Collapse) | Author | Files | Lines |
|
xe_exec_queue_create_ioctl() performs a lookup of the xe_gt for the GT
ID passed from userspace, but the result is never actually used. Since
there's already a separate (and earlier) check that the ID passed from
userspace is valid, the unnecessary lookup can be removed.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218200511.4050060-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
No functional change. This patch only splits the xe_display_pm
suspend/resume functions in the regular suspend/resume from the
runtime/d3cold ones.
v2: - Rename d3cold functions (Jonathan)
- Rebase
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250218010330.761340-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Move dsm register/unregister calls from the drivers to under
intel_display_driver register/unregister.
v2: Rebase only
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250217200133.741758-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add a convenience function for minimal initialization of struct xe_mmio.
This function also validates that the entirety of the provided mmio region
is usable with struct xe_reg.
v2: Modify commit message, add kernel doc, refactor assert (Michal)
v3: Fix off-by-one bug, add clarifying macro (Michal)
v4: Derive bitfield width from size (Michal)
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213093559.204652-1-ilia.levi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Rename so that xe_mmio_init() can be used in subsequent patches to
initialize an instance of struct xe_mmio.
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130105057.136586-1-ilia.levi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
This should make it easier to understand from userspace why importing BO
fails.
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250117115305.53113-1-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
|
|
%s/uinitialized/uninitialized/gc
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213060838.32493-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
|
|
If we try to manually provision VFs using debugfs and then we
try to unload the driver, we will see complains like:
[ ] Memory manager not clean during takedown.
[ ] RIP: 0010:drm_mm_takedown+0x3f/0x100
[ ] [drm:drm_mm_takedown] *ERROR* node [fedff000 + 00001000]: inserted at
drm_mm_insert_node_in_range+0x2bd/0x520
xe_ggtt_node_insert+0x52/0x90 [xe]
pf_provision_vf_ggtt+0x1fa/0xac0 [xe]
xe_gt_sriov_pf_config_set_ggtt+0x79/0x7a0 [xe]
ggtt_set+0x53/0x80 [xe]
simple_attr_write_xsigned.isra.0+0xd2/0x150
simple_attr_write+0x14/0x30
debugfs_attr_write+0x4e/0x80
[ ] xe 0000:00:02.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[ ] xe 0000:00:02.0: [drm] GT0: total 65535
[ ] xe 0000:00:02.0: [drm] GT0: used 1
[ ] xe 0000:00:02.0: [drm] GT0: range 65534..65534 (1)
[ ] xe 0000:00:02.0: [drm] *ERROR* GT0: GuC doorbells manager unclean (1/256)
[ ] xe 0000:00:02.0: [drm] GT0: count: 256
[ ] xe 0000:00:02.0: [drm] GT0: available range: 1..255 (255)
[ ] xe 0000:00:02.0: [drm] GT0: available total: 255
[ ] xe 0000:00:02.0: [drm] GT0: reserved range: 0..0 (1)
[ ] xe 0000:00:02.0: [drm] GT0: reserved total: 1
This could be easily fixed by adding config release action.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250211155034.1028-1-michal.wajdeczko@intel.com
|
|
Not registering hwmon because it's not available (SRIOV_VF and DGFX) is
different from failing the initialization. Handle the errors
appropriately.
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-13-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Now that previous callers in xe_device_probe() are handling the errors,
that can be done for xe_pmu_register() as well.
Cc: Riana Tauro <riana.tauro@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-12-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Let xe_oa_unregister() be handled by devm infra since it's only putting
the kobject. Also, since kobject_create_and_add may fail, handle the
error accordingly.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-11-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
This is not really display-related and needed for any sequence on driver
removal that has to interact with drm_dev_enter()/drm_dev_exit().
Just remove xe_device_remove_display() and inline it in the single
caller to make clear this is not done only for display.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-10-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Like done with other functions, cleanup the error handling in
xe_device_probe() by moving the OA fini to be handled by xe_oa
itself, which relies on devm to call the cleanup function.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-9-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Now that xe_gt_remove is handled entirely by xe_gt, it's clear there are
some extra calls to xe_hw_fence_irq_finish() that aren't necessary.
Neither all_fw_domain_init() or gt_fw_domain_init() need to do that
since it's handled by the caller on any error.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
The only thing in xe_gt_remove() that really needs to happen on the
device remove callback is the xe_uc_remove(). That's because of the
following call chain:
xe_gt_remove()
xe_uc_remove()
xe_gsc_remove()
xe_gsc_proxy_remove()
Move xe_gsc_proxy_remove() to be handled as a xe_device_remove_action,
so it's recorded when it should run during device removal. The rest can
be handled normally by devm infra.
Besides removing the deep call chain above, xe_device_probe() doesn't
have to unwind the gt loop and it's also more in line with the
xe_device_probe() style.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Not being able to initialize pxp is fatal if the platform is expected to
have it. Update comment after commit 9c9dc9ba4a00 ("drm/xe/pxp: Fail the
load if PXP fails to initialize").
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Make sure to differentiate normal behavior, e.g. there's no stolen, from
allocation errors or failure to initialize lower layers.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Propagate the error to the caller so initialization properly stops if
sysfs creation fails.
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
When devm_add_action_or_reset() fails, it already calls the function
passed as parameter and that function is already free'ing the irqs.
Drop the goto and just return.
The caller, xe_device_probe(), should also do the same thing instead of
wrongly doing `goto err` and calling the unrelated xe_display_fini()
function.
Fixes: 14d25d8d684d ("drm/xe: change old msi irq api to a new one")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
xe_display_fini() undoes things from xe_display_init() (technically from
intel_display_driver_probe()). Those `goto err` in xe_device_probe()
were wrong and being accumulated over time.
Commit 65e366ace5ee ("drm/xe/display: Use a single early init call for
display") made it easier to fix now that we don't have xe_display_* init
calls spread on xe_device_probe(). Change xe_display_init() to use
devm_add_action_or_reset() that will finalize display in the right
order.
While at it, also add a newline and comment about calling
xe_driver_flr_fini.
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
xe device probe uses devm cleanup in most places. However there are a
few cases where this is not possible: when the driver interacts with
component add/del. In that case, the resource group would be cleanup
while the entire device resources are in the process of cleanup. One
example is the xe_gsc_proxy and display using that to interact with mei
and audio.
Add a callback-based remove so the exception doesn't make the probe
use multiple error handling styles.
v2: Change internal API to mimic the devm API. This will make it easier
to migrate in future when devm can be used.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213192909.996148-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
It is generally expected that the write() function should return a
positive value indicating the number of bytes written or a negative
error code if an error occurs. Returning 0 is unusual and can lead
to unexpected behavior.
When the user program writes the same value to wedged_mode twice in
a row, a lockup will occur, because the value expected to be
returned by the write() function inside the program should be equal
to the actual written value instead of 0.
To reproduce the issue:
echo 1 > /sys/kernel/debug/dri/0/wedged_mode
echo 1 > /sys/kernel/debug/dri/0/wedged_mode <- lockup here
Signed-off-by: Xin Wang <x.wang@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213223615.2327367-1-x.wang@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
xe_pm_runtime_put is missed in the failure path.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250213230322.1180621-1-shuicheng.lin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This function will synchronously cancel and wait for many display
work queue items, which might try to take the runtime pm reference
causing a bad deadlock. So, remove it from the runtime_pm suspend patch.
Reported-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250212192447.402715-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Continuing the alignment with i915 runtime pm sequence. Add
this missing call.
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131115014.29625-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Currently xe_guc_log_print_dmesg() is unused, as it's expected
developers to add those calls when needed. However it makes it hard to
guarantee it's working as nothing is testing it. Add a node in debugfs
so it can be tested. This is purely for testing purposes since with the
device probed and working, the guc log can be obtained by the regular
debugfs file.
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131171716.3998432-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Explicitly using NULL is the correct approach.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502050322.VUBMyUHc-lkp@intel.com/
Fixes: dcdd6b84d9ac ("drm/xe/pxp: Allocate PXP execution resources")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250204200144.1445800-1-daniele.ceraolospurio@intel.com
|
|
The top of stolen memory is WOPCM, which shouldn't be accessed. Remove
this portion from the stolen memory region for discrete platforms.
This was already done for integrated, but was missing for discrete
platforms.
This also moves get_wopcm_size() so detect_bar2_dgfx() and
detect_bar2_integrated can use the same function.
v2: Improve commit message and suitable stable version tag(Lucas)
Fixes: d8b52a02cb40 ("drm/xe: Implement stolen memory.")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210143654.2076747-1-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
|
|
bos_lock is to protect list of bos used by client, it is
not required to protect bo->client so bring it outside of
bos_lock.
Fixes: b27970f3e11c ("drm/xe: Add tracking support for bos per client")
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250205051042.1991192-1-tejas.upadhyay@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
|
|
VRAM manager is related directly to struct xe_vram_region so it
should be inside this structure.
Let's move the VRAM to struct xe_vram_region.
v2:
- remove xe_vram_region pointer from xe_ttm_vram_mgr
- stop use dynamic alloaction for xe_ttm_vram_mgr in xe_vram_region
- rename struct xe_ttm_vram_mgr vram_mgr to ttm
v3:
- fix "'ttm' not described in 'xe_vram_region'"
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210081511.906452-3-piotr.piorkowski@intel.com
|
|
The xe_mem_region structure has so far been used only in the context
of VRAM regions. Also, the description of its fields clearly indicates
that it was designed for VRAM regions. This struct is strictly related
only to VRAM.
So let's be clear on this point and rename it to xe_vram_region.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250210081511.906452-2-piotr.piorkowski@intel.com
|
|
So far, the main condition for using LMTT has been to check that
the device is a discrete gfx.
Let's add a dedicated function to check if the device supports LMTT
as not all future discrete GPU platforms will require LMTT.
v2:
- use xe_has_device_lmtt only when necessary - leave IS_DGFX for other
things related to LMEM provisioning
v3:
- remove IS_SRIOV_PF condition from xe_device_has_lmtt (Michal
Wajdeczko)
- keep IS_SRIOV_PF asserts in LMTT-related code (Michal Wajdeczko)
v4:
- update commit description
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250207113111.853821-2-piotr.piorkowski@intel.com
|
|
We should now have sufficient changes in the driver to run it on
PTL platforms in the SR-IOV Physical Function (PF) mode, that would
allow us to enable SR-IOV Virtual Functions (VFs), and successfully
probe our driver in the VF mode on enabled VF devices.
To unblock SR-IOV PF mode you need to load xe with modparam:
xe.max_vfs=7
Then to enable VFs it is sufficient to use:
$ echo 7 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
Note that in default auto-provisioning all VFs are allocated with
some amount of shared resources (like unlimited GPU execution and
preemption times, fair GGTT space, fair GuC context IDs range, ...)
However with CONFIG_DEBUG_FS enabled it is possible to tweak most
of the SR-IOV configuration parameters using attributes like:
/sys/kernel/debug/dri/0000:00:02.0/gt0/
├── pf
│ ├── contexts_spare
│ ├── doorbells_spare
│ ├── exec_quantum_ms
│ ├── ggtt_spare
│ ├── preempt_timeout_us
│ ├── sched_priority
│ └── ...
├── vf1
│ ├── contexts_quota
│ ├── doorbells_quota
│ ├── exec_quantum_ms
│ ├── ggtt_quota
│ ├── preempt_timeout_us
│ ├── sched_priority
│ └── ...
├── vf2
│ └── ...
:
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250206214545.940-1-michal.wajdeczko@intel.com
|
|
Add new entries in stats for vma page faults. If CONFIG_DEBUG_FS is
enabled, the count and number of bytes can be viewed per GT in the
stat debugfs file. This helps when testing, to confirm page faults
have been triggered as expected. It also helps when looking at the
performance impact of page faults. Data is simply collected when
entering the page fault handler so there is no indication whether
it completed successfully, with or without retries, etc.
Example output:
cat /sys/kernel/debug/dri/0/gt0/stats
tlb_inval_count: 129
vma_pagefault_count: 12
vma_pagefault_bytes: 98304
v2: Rebase
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250206134551.1321265-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
|
|
Since commit a4d1c5d0b99b ("drm/xe/pf: Move VFs reprovisioning
to worker") and commit 78d5d1e20d1d ("drm/xe/relay: Don't use
GFP_KERNEL for new transactions") we should have no more lockdep
dependencies on the reclaim path when running in the SRIOV mode
so we believe that we can now mark SRIOV driver as reclaim safe.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250205120150.896-1-michal.wajdeczko@intel.com
|
|
A simple lazy buggy copy and paste of the PVC comment has brought
the attention to the incorrect masks of the PVC register for RPa
and RPe. So, let's fix them all.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250109195219.658557-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Add hwmon support for temp2_input and temp3_input attributes, which will
expose package and vram temperature in millidegree Celsius. With this in
place we can monitor temperature using lm-sensors tool.
v2: Reuse existing channels (Badal, Karthik)
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131054502.1528555-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
The PXP implementation mimics the i915 approach of allowing the load
to continue even if PXP init has failed. On Xe however we're taking an
harder stance on boot error and only allowing the load to complete if
everything is working, so update the code to fail if anything goes wrong
during PXP init.
While at it, update the return code in case of PXP not supported to be 0
instead of EOPNOTSUPP, to follow the standard of functions called by
xe_device_probe where every non-zero value means failure.
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250203234857.1419637-1-daniele.ceraolospurio@intel.com
|
|
VFs don't have access to the GDRST(0x941c) register that driver
uses to reset a GT. Attempt to trigger a reset using debugfs:
$ cat /sys/kernel/debug/dri/0000:00:02.1/gt0/force_reset
or due to a hang condition detected by the driver leads to:
[ ] xe 0000:00:02.1: [drm] GT0: trying reset from force_reset [xe]
[ ] xe 0000:00:02.1: [drm] GT0: reset queued
[ ] xe 0000:00:02.1: [drm] GT0: reset started
[ ] ------------[ cut here ]------------
[ ] xe 0000:00:02.1: [drm] GT0: VF is trying to write 0x1 to an inaccessible register 0x941c+0x0
[ ] WARNING: CPU: 3 PID: 3069 at drivers/gpu/drm/xe/xe_gt_sriov_vf.c:996 xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
[ ] RIP: 0010:xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
[ ] Call Trace:
[ ] <TASK>
[ ] ? show_regs+0x6c/0x80
[ ] ? __warn+0x93/0x1c0
[ ] ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
[ ] ? report_bug+0x182/0x1b0
[ ] ? handle_bug+0x6e/0xb0
[ ] ? exc_invalid_op+0x18/0x80
[ ] ? asm_exc_invalid_op+0x1b/0x20
[ ] ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
[ ] ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
[ ] ? xe_gt_tlb_invalidation_reset+0xef/0x110 [xe]
[ ] ? __mutex_unlock_slowpath+0x41/0x2e0
[ ] xe_mmio_write32+0x64/0x150 [xe]
[ ] do_gt_reset+0x2f/0xa0 [xe]
[ ] gt_reset_worker+0x14e/0x1e0 [xe]
[ ] process_one_work+0x21c/0x740
[ ] worker_thread+0x1db/0x3c0
Fix that by sending H2G VF_RESET(0x5507) action instead.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4078
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131182502.852-1-michal.wajdeczko@intel.com
|
|
VFs use a relay transaction during the resume/reset flow and use
of the GFP_KERNEL flag may conflict with the reclaim:
-> #0 (fs_reclaim){+.+.}-{0:0}:
[ ] __lock_acquire+0x1874/0x2bc0
[ ] lock_acquire+0xd2/0x310
[ ] fs_reclaim_acquire+0xc5/0x100
[ ] mempool_alloc_noprof+0x5c/0x1b0
[ ] __relay_get_transaction+0xdc/0xa10 [xe]
[ ] relay_send_to+0x251/0xe50 [xe]
[ ] xe_guc_relay_send_to_pf+0x79/0x3a0 [xe]
[ ] xe_gt_sriov_vf_connect+0x90/0x4d0 [xe]
[ ] xe_uc_init_hw+0x157/0x3b0 [xe]
[ ] do_gt_restart+0x1ae/0x650 [xe]
[ ] xe_gt_resume+0xb6/0x120 [xe]
[ ] xe_pm_runtime_resume+0x15b/0x370 [xe]
[ ] xe_pci_runtime_resume+0x73/0x90 [xe]
[ ] pci_pm_runtime_resume+0xa0/0x100
[ ] __rpm_callback+0x4d/0x170
[ ] rpm_callback+0x64/0x70
[ ] rpm_resume+0x594/0x790
[ ] __pm_runtime_resume+0x4e/0x90
[ ] xe_pm_runtime_get_ioctl+0x9c/0x160 [xe]
Since we have a preallocated pool of relay transactions, which
should cover all our normal relay use cases, we may use the
GFP_NOWAIT flag when allocating new outgoing transactions.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131153713.808-1-michal.wajdeczko@intel.com
|
|
max_remote_tiles is more related to the platform than the GT IP. Thus
move it to platform descriptor from graphics descriptor. Note that the
FIXME is no more required, thus it can be dropped.
v2: Rebase
v3: Change the position of comment (MattR)
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-3-sai.teja.pottumuttu@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
dma_mask_size is more related to the platform than the GT IP. Thus
move it to platform descriptors.
v2:
- Rebase
Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-2-sai.teja.pottumuttu@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
Now that are the pieces are there, we can turn the feature on.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-14-daniele.ceraolospurio@intel.com
|
|
This patch introduces 2 PXP debugfs entries:
- info: prints the current PXP status and key instance
- terminate: simulate a termination interrupt
The first one is useful for debug, while the second one can be used for
testing the termination flow.
v2: move the info prints inside the lock (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-13-daniele.ceraolospurio@intel.com
|
|
The HW suspend flow kills all PXP HWDRM sessions, so we need to mark all
the queues and BOs as invalid and do a full termination when PXP is next
used.
v2: rebase
v3: rebase on new status flow, defer termination to next PXP use as it
makes things much easier and allows us to use the same function for all
types of suspend.
v4: fix the documentation of the suspend function (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-12-daniele.ceraolospurio@intel.com
|
|
The driver needs to know if a BO is encrypted with PXP to enable the
display decryption at flip time.
Furthermore, we want to keep track of the status of the encryption and
reject any operation that involves a BO that is encrypted using an old
key. There are two points in time where such checks can kick in:
1 - at VM bind time, all operations except for unmapping will be
rejected if the key used to encrypt the BO is no longer valid. This
check is opt-in via a new VM_BIND flag, to avoid a scenario where a
malicious app purposely shares an invalid BO with a non-PXP aware
app (such as a compositor). If the VM_BIND was failed, the
compositor would be unable to display anything at all. Allowing the
bind to go through means that output still works, it just displays
garbage data within the bounds of the illegal BO.
2 - at job submission time, if the queue is marked as using PXP, all
objects bound to the VM will be checked and the submission will be
rejected if any of them was encrypted with a key that is no longer
valid.
Note that there is no risk of leaking the encrypted data if a user does
not opt-in to those checks; the only consequence is that the user will
not realize that the encryption key is changed and that the data is no
longer valid.
v2: Better commnnts and descriptions (John), rebase
v3: Properly return the result of key_assign up the stack, do not use
xe_bo in display headers (Jani)
v4: improve key_instance variable documentation (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-11-daniele.ceraolospurio@intel.com
|
|
PXP prerequisites (SW proxy and HuC auth via GSC) are completed
asynchronously from driver load, which means that userspace can start
submitting before we're ready to start a PXP session. Therefore, we need
a query that userspace can use to check not only if PXP is supported but
also to wait until the prerequisites are done.
v2: Improve doc, do not report TYPE_NONE as supported (José)
v3: Better comments, remove unneeded copy_from_user (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-10-daniele.ceraolospurio@intel.com
|
|
Userspace is required to mark a queue as using PXP to guarantee that the
PXP instructions will work. In addition to managing the PXP sessions,
when a PXP queue is created the driver will set the relevant bits in
its context control register.
On submission of a valid PXP queue, the driver will validate all
encrypted objects mapped to the VM to ensured they were encrypted with
the current key.
v2: Remove pxp_types include outside of PXP code (Jani), better comments
and code cleanup (John)
v3: split the internal PXP management to a separate patch for ease of
review. re-order ioctl checks to always return -EINVAL if parameters are
invalid, rebase on msix changes.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-9-daniele.ceraolospurio@intel.com
|
|
We expect every queue that uses PXP to be marked as doing so, to allow
the driver to correctly manage the encryption status. The API for doing
this from userspace is coming in the next patch, while this patch
implement the management side of things. When a PXP queue is created,
the driver will do the following:
- Start the default PXP session if it is not already running;
- assign an rpm ref to the queue to keep for its lifetime (this is
required because PXP HWDRM sessions are killed by the HW suspend flow).
Since PXP start and termination can race each other, this patch also
introduces locking and a state machine to keep track of the pending
operations. Note that since we'll need to take the lock from the
suspend/resume paths as well, we can't do submissions while holding it,
which means we need a slightly more complicated state machine to keep
track of intermediate steps.
v4: new patch in the series, split from the following interface patch to
keep review manageable. Lock and status rework to not do submissions
under lock.
v5: Improve comments and error logs (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-8-daniele.ceraolospurio@intel.com
|
|
A session is initialized (i.e. started) by sending a message to the GSC.
The initialization will be triggered when a user opts-in to using PXP;
the interface for that is coming in a follow-up patch in the series.
v2: clean up error messages, use new ARB define (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-7-daniele.ceraolospurio@intel.com
|