aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/Documentation/gpu
diff options
context:
space:
mode:
authorMaxime Ripard <mripard@kernel.org>2025-02-06 13:47:32 +0100
committerMaxime Ripard <mripard@kernel.org>2025-02-06 13:47:32 +0100
commit93c7dd1b39444ebd5a6a98e56a363d7a4e646775 (patch)
tree6e186e041d4253059a0e6471cb444ea35da5db09 /Documentation/gpu
parentdrm/rockchip: vop2: Improve display modes handling on RK3588 HDMI0 (diff)
parentLinux 6.14-rc1 (diff)
downloadwireguard-linux-93c7dd1b39444ebd5a6a98e56a363d7a4e646775.tar.xz
wireguard-linux-93c7dd1b39444ebd5a6a98e56a363d7a4e646775.zip
Merge drm/drm-next into drm-misc-next
Bring rc1 to start the new release dev. Signed-off-by: Maxime Ripard <mripard@kernel.org>
Diffstat (limited to 'Documentation/gpu')
-rw-r--r--Documentation/gpu/drm-compute.rst54
-rw-r--r--Documentation/gpu/index.rst1
-rw-r--r--Documentation/gpu/xe/index.rst1
-rw-r--r--Documentation/gpu/xe/xe_devcoredump.rst14
4 files changed, 70 insertions, 0 deletions
diff --git a/Documentation/gpu/drm-compute.rst b/Documentation/gpu/drm-compute.rst
new file mode 100644
index 000000000000..f90c3e63aa9e
--- /dev/null
+++ b/Documentation/gpu/drm-compute.rst
@@ -0,0 +1,54 @@
+==================================
+Long running workloads and compute
+==================================
+
+Long running workloads (compute) are workloads that will not complete in 10
+seconds. (The time let the user wait before he reaches for the power button).
+This means that other techniques need to be used to manage those workloads,
+that cannot use fences.
+
+Some hardware may schedule compute jobs, and have no way to pre-empt them, or
+have their memory swapped out from them. Or they simply want their workload
+not to be preempted or swapped out at all.
+
+This means that it differs from what is described in driver-api/dma-buf.rst.
+
+As with normal compute jobs, dma-fence may not be used at all. In this case,
+not even to force preemption. The driver with is simply forced to unmap a BO
+from the long compute job's address space on unbind immediately, not even
+waiting for the workload to complete. Effectively this terminates the workload
+when there is no hardware support to recover.
+
+Since this is undesirable, there need to be mitigations to prevent a workload
+from being terminated. There are several possible approach, all with their
+advantages and drawbacks.
+
+The first approach you will likely try is to pin all buffers used by compute.
+This guarantees that the job will run uninterrupted, but also allows a very
+denial of service attack by pinning as much memory as possible, hogging the
+all GPU memory, and possibly a huge chunk of CPU memory.
+
+A second approach that will work slightly better on its own is adding an option
+not to evict when creating a new job (any kind). If all of userspace opts in
+to this flag, it would prevent cooperating userspace from forced terminating
+older compute jobs to start a new one.
+
+If job preemption and recoverable pagefaults are not available, those are the
+only approaches possible. So even with those, you want a separate way of
+controlling resources. The standard kernel way of doing so is cgroups.
+
+This creates a third option, using cgroups to prevent eviction. Both GPU and
+driver-allocated CPU memory would be accounted to the correct cgroup, and
+eviction would be made cgroup aware. This allows the GPU to be partitioned
+into cgroups, that will allow jobs to run next to each other without
+interference.
+
+The interface to the cgroup would be similar to the current CPU memory
+interface, with similar semantics for min/low/high/max, if eviction can
+be made cgroup aware.
+
+What should be noted is that each memory region (tiled memory for example)
+should have its own accounting.
+
+The key is set to the regionid set by the driver, for example "tile0".
+For the value of $card, we use drmGetUnique().
diff --git a/Documentation/gpu/index.rst b/Documentation/gpu/index.rst
index 37e383ccf73f..7dcb15850afd 100644
--- a/Documentation/gpu/index.rst
+++ b/Documentation/gpu/index.rst
@@ -13,6 +13,7 @@ GPU Driver Developer's Guide
drm-usage-stats
driver-uapi
drm-client
+ drm-compute
drivers
backlight
vga-switcheroo
diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst
index 3f07aa3b5432..92cfb25e64d3 100644
--- a/Documentation/gpu/xe/index.rst
+++ b/Documentation/gpu/xe/index.rst
@@ -23,4 +23,5 @@ DG2, etc is provided to prototype the driver.
xe_firmware
xe_tile
xe_debugging
+ xe_devcoredump
xe-drm-usage-stats.rst
diff --git a/Documentation/gpu/xe/xe_devcoredump.rst b/Documentation/gpu/xe/xe_devcoredump.rst
new file mode 100644
index 000000000000..ae4ec0e34dc0
--- /dev/null
+++ b/Documentation/gpu/xe/xe_devcoredump.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+==================
+Xe Device Coredump
+==================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_devcoredump.c
+ :doc: Xe device coredump
+
+Internal API
+============
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_devcoredump.c
+ :internal: