aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/gpu/drm/scheduler (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-03-28Merge tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds4-89/+160
Pull drm updates from Dave Airlie: "Outside of drm there are some rust patches from Danilo who maintains that area in here, and some pieces for drm header check tests. The major things in here are a new driver supporting the touchbar displays on M1/M2, the nova-core stub driver which is just the vehicle for adding rust abstractions and start developing a real driver inside of. xe adds support for SVM with a non-driver specific SVM core abstraction that will hopefully be useful for other drivers, along with support for shrinking for TTM devices. I'm sure xe and AMD support new devices, but the pipeline depth on these things is hard to know what they end up being in the marketplace! uapi: - add mediatek tiled fourcc - add support for notifying userspace on device wedged new driver: - appletbdrm: support for Apple Touchbar displays on m1/m2 - nova-core: skeleton rust driver to develop nova inside off firmware: - add some rust firmware pieces rust: - add 'LocalModule' type alias component: - add helper to query bound status fbdev: - fbtft: remove access to page->index media: - cec: tda998x: import driver from drm dma-buf: - add fast path for single fence merging tests: - fix lockdep warnings atomic: - allow full modeset on connector changes - clarify semantics of allow_modeset and drm_atomic_helper_check - async-flip: support on arbitary planes - writeback: fix UAF - Document atomic-state history format-helper: - support ARGB8888 to ARGB4444 conversions buddy: - fix multi-root cleanup ci: - update IGT dp: - support extended wake timeout - mst: fix RAD to string conversion - increase DPCD eDP control CAP size to 5 bytes - add DPCD eDP v1.5 definition - add helpers for LTTPR transparent mode panic: - encode QR code according to Fido 2.2 scheduler: - add parameter struct for init - improve job peek/pop operations - optimise drm_sched_job struct layout ttm: - refactor pool allocation - add helpers for TTM shrinker panel-orientation: - add a bunch of new quirks panel: - convert panels to multi-style functions - edp: Add support for B140UAN04.4, BOE NV140FHM-NZ, CSW MNB601LS1-3, LG LP079QX1-SP0V, MNE007QS3-7, STA 116QHD024002, Starry 116KHD024006, Lenovo T14s Gen6 Snapdragon - himax-hx83102: Add support for CSOT PNA957QT1-1, Kingdisplay kd110n11-51ie, Starry 2082109qfh040022-50e - visionox-r66451: use multi-style MIPI-DSI functions - raydium-rm67200: Add driver for Raydium RM67200 - simple: Add support for BOE AV123Z7M-N17, BOE AV123Z7M-N17 - sony-td4353-jdi: Use MIPI-DSI multi-func interface - summit: Add driver for Apple Summit display panel - visionox-rm692e5: Add driver for Visionox RM692E5 bridge: - pass full atomic state to various callbacks - adv7511: Report correct capabilities - it6505: Fix HDCP V compare - snd65dsi86: fix device IDs - nwl-dsi: set bridge type - ti-sn65si83: add error recovery and set bridge type - synopsys: add HDMI audio support xe: - support device-wedged event - add mmap support for PCI memory barrier - perf pmu integration and expose per-engien activity - add EU stall sampling support - GPU SVM and Xe SVM implementation - use TTM shrinker - add survivability mode to allow the driver to do firmware updates in critical failure states - PXP HWDRM support for MTL and LNL - expose package/vram temps over hwmon - enable DP tunneling - drop mmio_ext abstraction - Reject BO evcition if BO is bound to current VM - Xe suballocator improvements - re-use display vmas when possible - add GuC Buffer Cache abstraction - PCI ID update for Panther Lake and Battlemage - Enable SRIOV for Panther Lake - Refactor VRAM manager location i915: - enable extends wake timeout - support device-wedged event - Enable DP 128b/132b SST DSC - FBC dirty rectangle support for display version 30+ - convert i915/xe to drm client setup - Compute HDMI PLLS for rates not in fixed tables - Allow DSB usage when PSR is enabled on LNL+ - Enable panel replay without full modeset - Enable async flips with compressed buffers on ICL+ - support luminance based brightness via DPCD for eDP - enable VRR enable/disable without full modeset - allow GuC SLPC default strategies on MTL+ for performance - lots of display refactoring in move to struct intel_display amdgpu: - add device wedged event - support async page flips on overlay planes - enable broadcast RGB drm property - add info ioctl for virt mode - OEM i2c support for RGB lights - GC 11.5.2 + 11.5.3 support - SDMA 6.1.3 support - NBIO 7.9.1 + 7.11.2 support - MMHUB 1.8.1 + 3.3.2 support - DCN 3.6.0 support - Add dynamic workload profile switching for GC 10-12 - support larger VBIOS sizes - Mark gttsize parameters as deprecated - Initial JPEG queue resset support amdkfd: - add KFD per process flags for setting precision - sync pasid values between KGD and KFD - improve GTT/VRAM handling for APUs - fix user queue validation on GC7/8 - SDMA queue reset support raedeon: - rs400 hyperz fix i2c: - td998x: drop platform_data, split driver into media and bridge ast: - transmitter chip detection refactoring - vbios display mode refactoring - astdp: fix connection status and filter unsupported modes - cursor handling refactoring imagination: - check job dependencies with sched helper ivpu: - improve command queue handling - use workqueue for IRQ handling - add support HW fault injection - locking fixes mgag200: - add support for G200eH5 msm: - dpu: add concurrent writeback support for DPU 10.x+ - use LTTPR helpers - GPU: - Fix obscure GMU suspend failure - Expose syncobj timeline support - Extend GPU devcoredump with pagetable info - a623 support - Fix a6xx gen1/gen2 indexed-register blocks in gpu snapshot / devcoredump - Display: - Add cpu-cfg interconnect paths on SM8560 and SM8650 - Introduce KMS OMMU fault handler, causing devcoredump snapshot - Fixed error pointer dereference in msm_kms_init_aspace() - DPU: - Fix mode_changing handling - Add writeback support on SM6150 (QCS615) - Fix DSC programming in 1:1:1 topology - Reworked hardware resource allocation, moving it to the CRTC code - Enabled support for Concurrent WriteBack (CWB) on SM8650 - Enabled CDM blocks on all relevant platforms - Reworked debugfs interface for BW/clocks debugging - Clear perf params before calculating bw - Support YUV formats on writeback - Fixed double inclusion - Fixed writeback in YUV formats when using cloned output, Dropped wb2_formats_rgb - Corrected dpu_crtc_check_mode_changed and struct dpu_encoder_virt kerneldocs - Fixed uninitialized variable in dpu_crtc_kickoff_clone_mode() - DSI: - DSC-related fixes - Rework clock programming - DSI PHY: - Fix 7nm (and lower) PHY programming - Add proper DT schema definitions for DSI PHY clocks - HDMI: - Rework the driver, enabling the use of the HDMI Connector framework - Bindings: - Added eDP PHY on SA8775P nouveau: - move drm_slave_encoder interface into driver - nvkm: refactor GSP RPC - use LTTPR helpers mediatek: - HDMI fixup and refinement - add MT8188 dsc compatible - MT8365 SoC support panthor: - Expose sizes of intenral BOs via fdinfo - Fix race between reset and suspend - Improve locking qaic: - Add support for AIC200 renesas: - Fix limits in DT bindings rockchip: - support rk3562-mali - rk3576: Add HDMI support - vop2: Add new display modes on RK3588 HDMI0 up to 4K - Don't change HDMI reference clock rate - Fix DT bindings - analogix_dp: add eDP support - fix shutodnw solomon: - Set SPI device table to silence warnings - Fix pixel and scanline encoding v3d: - handle clock vc4: - Use drm_exec - Use dma-resv for wait-BO ioctl - Remove seqno infrastructure virtgpu: - Support partial mappings of GEM objects - Reserve VGA resources during initialization - Fix UAF in virtgpu_dma_buf_free_obj() - Add panic support vkms: - Switch to a managed modesetting pipeline - Add support for ARGB8888 - fix UAf xlnx: - Set correct DMA segment size - use mutex guards - Fix error handling - Fix docs" * tag 'drm-next-2025-03-28' of https://gitlab.freedesktop.org/drm/kernel: (1762 commits) drm/amd/pm: Update feature list for smu_v13_0_6 drm/amdgpu: Add parameter documentation for amdgpu_sync_fence drm/amdgpu/discovery: optionally use fw based ip discovery drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics drm/amdgpu/discovery: check ip_discovery fw file available drm/amd/pm: Remove unnecessay UQ10 to UINT conversion drm/amd/pm: Remove unnecessay UQ10 to UINT conversion drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush drm/amd/amdgpu: Increase max rings to enable SDMA page ring drm/amdgpu: Decode deferred error type in gfx aca bank parser drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs drm/amdgpu/mes: clean up SDMA HQD loop drm/amdgpu/mes: enable compute pipes across all MEC drm/amdgpu/mes: drop MES 10.x leftovers drm/amdgpu/mes: optimize compute loop handling drm/amdgpu/sdma: guilty tracking is per instance drm/amdgpu/sdma: fix engine reset handling drm/amdgpu: remove invalid usage of sched.ready drm/amdgpu: add cleaner shader trace point ...
2025-03-13drm/sched: Fix fence reference count leakqianyi liu1-2/+9
The last_scheduled fence leaks when an entity is being killed and adding the cleanup callback fails. Decrement the reference count of prev when dma_fence_add_callback() fails, ensuring proper balance. Cc: stable@vger.kernel.org # v6.2+ [phasta: add git tag info for stable kernel] Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Signed-off-by: qianyi liu <liuqianyi125@gmail.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250311060251.4041101-1-liuqianyi125@gmail.com
2025-03-12Backmerge tag 'v6.14-rc6' into drm-nextDave Airlie1-2/+2
This is a backmerge from Linux 6.14-rc6, needed for the nova PR. Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-03-05drm/sched: drm_sched_job_cleanup(): correct false docPhilipp Stanner1-5/+7
drm_sched_job_cleanup()'s documentation claims that calling drm_sched_job_arm() is a "point of no return", implying that afterwards a job cannot be cancelled anymore. This is not correct, as proven by the function's code itself, which takes a previous call to drm_sched_job_arm() into account. In truth, the decisive factors are whether fences have been shared (e.g., with other processes) and if the job has been submitted to an entity already. Correct the wrong docstring. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250304141346.102683-2-phasta@kernel.org
2025-03-04drm/sched: stop passing non struct drm_device to drm_err() and friendsJani Nikula2-10/+12
The expectation is that the struct drm_device based logging helpers get passed an actual struct drm_device pointer rather than some random struct pointer where you can dereference the ->dev member. Convert drm_err(sched, ...) to dev_err(sched->dev, ...) and similar. This matches current usage, as struct drm_device is not available, but drops "[drm]" or "[drm] *ERROR*" prefix from logging. Unfortunately, there's no dev_WARN_ON(), so the conversion is not exactly the same. Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fe441dd1469d2b03e6b2ff247078bdde2011c6e3.1737644530.git.jani.nikula@intel.com
2025-03-03drm/sched: Fix preprocessor guardPhilipp Stanner1-2/+2
When writing the header guard for gpu_scheduler_trace.h, a typo, apparently, occurred. Fix the typo and document the scope of the guard. Fixes: 353da3c520b4 ("drm/amdgpu: add tracepoint for scheduler (v2)") Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250218124149.118002-2-phasta@kernel.org
2025-02-24drm/sched: Move internal prototypes to internal headerTvrtko Ursulin2-0/+32
Now that we have a header file for internal scheduler interfaces we can move some more prototypes into it. By doing that we eliminate the chance of drivers trying to use something which was not intended to be used. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-6-tvrtko.ursulin@igalia.com
2025-02-24drm/sched: Move drm_sched_entity_is_ready to internal headerTvrtko Ursulin2-12/+13
Helper is for scheduler internal use so lets hide it from DRM drivers completely. At the same time we change the method of checking whethere there is anything in the queue from peeking to looking at the node count. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-5-tvrtko.ursulin@igalia.com
2025-02-24drm/sched: Add internal job peek/pop APITvrtko Ursulin3-10/+56
Idea is to add helpers for peeking and popping jobs from entities with the goal of decoupling the hidden assumption in the code that queue_node is the first element in struct drm_sched_job. That assumption usually comes in the form of: while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) Which breaks if the queue_node is re-positioned due to_drm_sched_job being implemented with a container_of. This also allows us to remove duplicate definitions of to_drm_sched_job. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <phasta@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250221105038.79665-2-tvrtko.ursulin@igalia.com
2025-02-12drm/sched: Use struct for drm_sched_init() paramsPhilipp Stanner1-33/+17
drm_sched_init() has a great many parameters and upcoming new functionality for the scheduler might add even more. Generally, the great number of parameters reduces readability and has already caused one missnaming, addressed in: commit 6f1cacf4eba7 ("drm/nouveau: Improve variable name in nouveau_sched_init()"). Introduce a new struct for the scheduler init parameters and port all users. Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Matthew Brost <matthew.brost@intel.com> # for Xe Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # for Panfrost and Panthor Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> # for Etnaviv Reviewed-by: Frank Binns <frank.binns@imgtec.com> # for Imagination Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> # for Sched Reviewed-by: Maíra Canal <mcanal@igalia.com> # for v3d Reviewed-by: Danilo Krummrich <dakr@kernel.org> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> # for amdxdna Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250211111422.21235-2-phasta@kernel.org
2025-01-20drm/sched: Add helper to check job dependenciesTvrtko Ursulin1-0/+23
Lets isolate scheduler internals from drivers such as pvr which currently walks the dependency array to look for fences. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Matt Coster <matt.coster@imgtec.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250113103341.43914-1-tvrtko.ursulin@igalia.com
2025-01-16drm/sched: Remove weak paused submission checksTvrtko Ursulin1-6/+0
There is no need to check the boolean in the work item's prologues since the boolean can be set at any later time anyway. The helper which pauses submission sets it and synchronously cancels the work and helpers which queue the work check for the flag so all should be good. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250114105942.64832-1-tvrtko.ursulin@igalia.com
2025-01-13drm/sched: Delete unused update_job_creditsTvrtko Ursulin1-13/+0
No driver is using the update_job_credits() schduler vfunc so lets remove it. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Matt Coster <matt.coster@imgtec.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250110111301.76909-1-tvrtko.ursulin@igalia.com
2024-12-19drm/sched: Fix drm_sched_fini() docu generationBagas Sanjaya1-1/+2
Commit baf4afc5831438 ("drm/sched: Improve teardown documentation") added a list of drm_sched_fini()'s problems. The list triggers htmldocs warning (but renders correctly in htmldocs output): Documentation/gpu/drm-mm:571: ./drivers/gpu/drm/scheduler/sched_main.c:1359: ERROR: Unexpected indentation. Separate the list from the preceding paragraph by a blank line to fix the warning. While at it, also end the aforementioned paragraph by a colon. Fixes: baf4afc58314 ("drm/sched: Improve teardown documentation") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20241108175655.6d3fcfb7@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> [phasta: Adjust commit message] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241217034915.62594-1-bagasdotme@gmail.com
2024-12-19drm/sched: Fix drm_sched_fini() docu generationBagas Sanjaya1-1/+2
Commit baf4afc5831438 ("drm/sched: Improve teardown documentation") documents problems of drm_sched_fini() in form of a list. The checklist triggers htmldocs warning (but renders correctly in htmldocs output): Documentation/gpu/drm-mm:571: ./drivers/gpu/drm/scheduler/sched_main.c:1359: ERROR: Unexpected indentation. Separate the list from the preceding paragraph by a blank line to fix the warning. While at it, also end the aforementioned paragraph by a colon. Fixes: baf4afc58314 ("drm/sched: Improve teardown documentation") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20241108175655.6d3fcfb7@canb.auug.org.au/ Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> [phasta: Adjust commit message] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241217034915.62594-1-bagasdotme@gmail.com
2024-11-07drm/sched: Improve teardown documentationPhilipp Stanner1-2/+21
If jobs are still enqueued in struct drm_gpu_scheduler.pending_list when drm_sched_fini() gets called, those jobs will be leaked since that function stops both job-submission and (automatic) job-cleanup. It is, thus, up to the driver to take care of preventing leaks. The related function drm_sched_wqueue_stop() also prevents automatic job cleanup. Those pitfals are not reflected in the documentation, currently. Explicitly inform about the leak problem in the docstring of drm_sched_fini(). Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and hint at the consequences for automatic cleanup. Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105143137.71893-2-pstanner@redhat.com
2024-11-04Backmerge v6.12-rc6 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-nextDave Airlie1-1/+13
Backmerge Linus tree for some drm-fixes needed for msm and xe merges. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-10-31drm/sched: Document purpose of drm_sched_{start,stop}Philipp Stanner1-1/+7
drm_sched_start()'s and drm_sched_stop()'s names suggest that those functions might be intended for actively starting and stopping the scheduler on initialization and teardown. They are, however, only used on timeout handling (reset recovery). The docstrings should reflect that to prevent confusion. Document those functions' purpose. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029133819.78696-2-pstanner@redhat.com
2024-10-28drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIMMatthew Brost1-2/+3
drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler work queue with WQ_MEM_RECLAIM to ensure forward progress during reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward progress during reclaim. v2: - Fixes tags (Philipp) - Reword commit message (Philipp) Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Philipp Stanner <pstanner@redhat.com> Cc: stable@vger.kernel.org Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq") Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023235917.1836428-1-matthew.brost@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-10-25drm/sched: warn about drm_sched_job_init()'s partial initPhilipp Stanner1-0/+4
drm_sched_job_init()'s name suggests that after the function succeeded, parameter "job" will be fully initialized. This is not the case; some members are only later set, notably drm_sched_job.sched by drm_sched_job_arm(). Document that drm_sched_job_init() does not set all struct members. Document the lifetime of drm_sched_job.sched. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023141530.113370-2-pstanner@redhat.com
2024-10-22drm/sched: memset() 'job' in drm_sched_job_init()Philipp Stanner1-0/+8
drm_sched_job_init() has no control over how users allocate struct drm_sched_job. Unfortunately, the function can also not set some struct members such as job->sched. This could theoretically lead to UB by users dereferencing the struct's pointer members too early. It is easier to debug such issues if these pointers are initialized to NULL, so dereferencing them causes a NULL pointer exception. Accordingly, drm_sched_entity_init() does precisely that and initializes its struct with memset(). Initialize parameter "job" to 0 in drm_sched_job_init(). Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241021105028.19794-2-pstanner@redhat.com Reviewed-by: Christian König <christian.koenig@amd.com>
2024-10-17drm/sched: Further optimise drm_sched_entity_push_jobTvrtko Ursulin2-17/+23
Having removed one re-lock cycle on the entity->lock in a patch titled "drm/sched: Optimise drm_sched_entity_push_job", with only a tiny bit larger refactoring we can do the same optimisation on the rq->lock. (Currently both drm_sched_rq_add_entity() and drm_sched_rq_update_fifo_locked() take and release the same lock.) To achieve this we make drm_sched_rq_update_fifo_locked() and drm_sched_rq_add_entity() expect the rq->lock to be held. We also align drm_sched_rq_update_fifo_locked(), drm_sched_rq_add_entity() and drm_sched_rq_remove_fifo_locked() function signatures, by adding rq as a parameter to the latter. v2: * Fix after rebase of the series. * Avoid naming inconsistency between drm_sched_rq_add/remove. (Christian) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-6-tursulin@igalia.com
2024-10-17drm/sched: Re-group and rename the entity run-queue lockTvrtko Ursulin2-15/+15
When writing to a drm_sched_entity's run-queue, writers are protected through the lock drm_sched_entity.rq_lock. This naming, however, frequently collides with the separate internal lock of struct drm_sched_rq, resulting in uses like this: spin_lock(&entity->rq_lock); spin_lock(&entity->rq->lock); Rename drm_sched_entity.rq_lock to improve readability. While at it, re-order that struct's members to make it more obvious what the lock protects. v2: * Rename some rq_lock straddlers in kerneldoc, improve commit text. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> [pstanner: Fix typo in docstring] Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-5-tursulin@igalia.com
2024-10-17drm/sched: Stop setting current entity in FIFO modeTvrtko Ursulin1-1/+0
It does not seem there is a need to set the current entity in FIFO mode since ot only serves as being a "cursor" in round-robin mode. Even if scheduling mode is changed at runtime the change in behaviour is simply to restart from the first entity, instead of continuing in RR mode from where FIFO left it, and that sounds completely fine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-3-tursulin@igalia.com
2024-10-17drm/sched: Optimise drm_sched_entity_push_jobTvrtko Ursulin2-7/+12
In FIFO mode (which is the default), both drm_sched_entity_push_job() and drm_sched_rq_update_fifo(), where the latter calls the former, are currently taking and releasing the same entity->rq_lock. We can avoid that design inelegance, and also have a miniscule efficiency improvement on the submit from idle path, by introducing a new drm_sched_rq_update_fifo_locked() helper and pulling up the lock taking to its callers. v2: * Remove drm_sched_rq_update_fifo() altogether. (Christian) v3: * Improved commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Philipp Stanner <pstanner@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241016122013.7857-2-tursulin@igalia.com
2024-10-09Merge tag 'drm-misc-next-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-nextDave Airlie2-8/+6
drm-misc-next for v6.13: UAPI Changes: - panthor: Add realtime group priority and priority query. Cross-subsystem Changes: - Add Vivek Kasireddy as udmabuf maintainer. - Assorted udmabuf changes. - Device tree binding updates. - dmabuf documentation fixes. - Move drm_rect to drm core module from kms helper. Core Changes: - Update scheduler documentation and concurrency fixes. - drm/ci updates. - Add memory-agnostic fbdev client and client-agnostic setup helper. - Huge driver conversion for using the above. Driver Changes: - Assorted fixes to imx, panel/nt35510, sti, accel/ivpu, v3d, vkms, host1x. - Add panel quirks for AYA NEO panels. - Make module autoloading work for bridge/it6505 and mcde. - Add huge page support to v3d using a custom shmfs. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a9b95e6f-9f35-464e-83f6-bda75b35ee0b@linux.intel.com
2024-10-09Merge tag 'drm-misc-next-2024-09-20' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-nextDave Airlie1-3/+4
drm-misc-next for v6.12: UAPI Changes: - Add panthor/DEV_QUERY_TIMESTAMP_INFO query. Cross-subsystem Changes: - Updated dt bindings. - Add documentation explaining default errnos for fences. - Mark dma-buf heaps creation functions as __init. Core Changes: - Split DSC helpers from DP helpers. - Clang build fixes for drm/mm test. - Remove simple pipeline support for gem-vram, no longer any users left after converting bochs. - Add erno to drm_sched_start to distinguish between GPU and queue reset. - Add drm_framebuffer testcases. - Fix uninitialized spinlock acquisition with CONFIG_DRM_PANIC=n. - Use read_trylock instead of read_lock in dma_fence_begin_signalling to quiesce lockdep. Driver Changes: - Assorted small fixes and updates for tegra, host1x, imagination, nouveau, panfrost, panthor, panel/ili9341, mali, exynos, panel/samsung-s6e3fa7, ast, bridge/ti-sn65dsi86, panel/himax-hx83112a, bridge/tc358767, bridge/imx8mp-hdmi-tx, panel/khadas-ts050, panel/nt36523, panel/sony-acx565akm, kmb, accel/qaic, omap, v3d. - Add bridge/TI TDP158. - Assorted documentation updates. - Convert bochs from simple drm to gem shmem, and check modes against available memory. - Many VC4 fixes, most related to scaling and YUV support. - Convert some drivers to use SYSTEM_SLEEP_PM_OPS and RUNTIME_PM_OPS. - Rockchip 4k@60 support. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/445713a6-2427-4c53-8ec2-3a894ec62405@linux.intel.com
2024-10-02drm/sched: Use drm sched lockdep map for submit_wqMatthew Brost1-0/+11
Avoid leaking a lockdep map on each drm sched creation and destruction by using a single lockdep map for all drm sched allocated submit_wq. v2: - Use alloc_ordered_workqueue_lockdep_map (Tejun) Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241002131639.3425022-2-matthew.brost@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-10-01Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixesMaarten Lankhorst1-17/+8
Required for a panthor fix that broke when FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-10-01Merge tag 'drm-misc-fixes-2024-09-26' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixesDave Airlie2-7/+4
Short summary of fixes pull: atomic: - Use correct type when reading damage rectangles display: - Fix kernel docs dp-mst: - Fix DSC decompression detection hdmi: - Fix infoframe size panthor: - Fix locking sched: - Update maintainers - Fix race condition whne queueing up jobs sysfb: - Disable sysfb if framebuffer parent device is unknown vbox: - Fix VLA handling Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240926121045.GA561653@localhost.localdomain
2024-09-30drm/sched: revert "Always increment correct scheduler score"Christian König1-1/+1
This reverts commit 087913e0ba2b3b9d7ccbafb2acf5dab9e35ae1d5. It turned out that the original code was correct since the rq can only change when there is no armed job for an entity. This change here broke the logic since we only incremented the counter for the first job, so revert it. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240930131451.536150-1-christian.koenig@amd.com
2024-09-26drm/sched: Always increment correct scheduler scoreTvrtko Ursulin1-1/+1
Entities run queue can change during drm_sched_entity_push_job() so make sure to update the score consistently. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: d41a39dda140 ("drm/scheduler: improve job distribution with multiple queues") Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.9+ Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924101914.2713-4-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-26drm/sched: Always wake up correct scheduler in drm_sched_entity_push_jobTvrtko Ursulin1-2/+8
Since drm_sched_entity_modify_sched() can modify the entities run queue, lets make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Alternative of moving the spin_unlock to after the wake up would for now be more problematic since the same lock is taken inside drm_sched_rq_update_fifo(). v2: * Improve commit message. (Philipp) * Cache the scheduler pointer directly. (Christian) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Philipp Stanner <pstanner@redhat.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924101914.2713-3-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-26drm/sched: Add locking to drm_sched_entity_modify_schedTvrtko Ursulin1-0/+2
Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accesing potentially inconsitent entity->sched_list and entity->num_sched_list pair. v2: * Improve commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: Philipp Stanner <pstanner@redhat.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240913160559.49054-2-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-24drm/sched: Add locking to drm_sched_entity_modify_schedTvrtko Ursulin1-0/+2
Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accesing potentially inconsitent entity->sched_list and entity->num_sched_list pair. v2: * Improve commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: b37aced31eb0 ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: Philipp Stanner <pstanner@redhat.com> Cc: <stable@vger.kernel.org> # v5.7+ Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240924101914.2713-2-tursulin@igalia.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-09-24drm/scheduler: Improve documentationShuicheng Lin2-8/+6
Function drm_sched_entity_push_job() doesn't have a return value, remove the return value description for it. Correct several other typo errors. v2 (Philipp): - more correction with related comments. Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240917144732.2758572-1-shuicheng.lin@intel.com
2024-09-24drm/sched: Fix dynamic job-flow control raceRob Clark2-7/+4
Fixes a race condition reported here: https://github.com/AsahiLinux/linux/issues/309#issuecomment-2238968609 The whole premise of lockless access to a single-producer-single- consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq). This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition. Suggested-by: Asahi Lina <lina@asahilina.net> Fixes: a78422e9dff3 ("drm/sched: implement dynamic job-flow control") Closes: https://github.com/AsahiLinux/linux/issues/309 Cc: stable@vger.kernel.org Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Tested-by: Janne Grunau <j@jannau.net> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240913202301.16772-1-robdclark@gmail.com
2024-09-06drm/sched: add optional errno to drm_sched_start()Christian König1-3/+4
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
2024-07-25drm/scheduler: remove full_recover from drm_sched_startChristian König1-17/+8
This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-2/+2
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-03-25Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann2-8/+7
Backmerging to get drm-misc-fixes to the state of v6.9-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-03-15drm/sched: fix null-ptr-deref in init entityVitaly Prosyak1-3/+9
The bug can be triggered by sending an amdgpu_cs_wait_ioctl to the AMDGPU DRM driver on any ASICs with valid context. The bug was reported by Joonkyo Jung <joonkyoj@yonsei.ac.kr>. For example the following code: static void Syzkaller2(int fd) { union drm_amdgpu_ctx arg1; union drm_amdgpu_wait_cs arg2; arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX; ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &arg1); arg2.in.handle = 0x0; arg2.in.timeout = 0x2000000000000; arg2.in.ip_type = AMD_IP_VPE /* 0x9 */; arg2->in.ip_instance = 0x0; arg2.in.ring = 0x0; arg2.in.ctx_id = arg1.out.alloc.ctx_id; drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS * /, &arg2); } The ioctl AMDGPU_WAIT_CS without previously submitted job could be assumed that the error should be returned, but the following commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa modified the logic and allowed to have sched_rq equal to NULL. As a result when there is no job the ioctl AMDGPU_WAIT_CS returns success. The change fixes null-ptr-deref in init entity and the stack below demonstrates the error condition: [ +0.000007] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ +0.007086] #PF: supervisor read access in kernel mode [ +0.005234] #PF: error_code(0x0000) - not-present page [ +0.005232] PGD 0 P4D 0 [ +0.002501] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: G B W L 6.7.0+ #4 [ +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 <48> 83 7b 28 00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c [ +0.019094] RSP: 0018:ffffc90014c1fa40 EFLAGS: 00010282 [ +0.005237] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8113f3fa [ +0.007326] RDX: fffffbfff0a7889d RSI: 0000000000000008 RDI: ffffffff853c44e0 [ +0.007264] RBP: ffffc90014c1fa80 R08: 0000000000000001 R09: fffffbfff0a7889c [ +0.007266] R10: ffffffff853c44e7 R11: 0000000000000001 R12: ffff8881a719b010 [ +0.007263] R13: ffff88810d412748 R14: 0000000000000002 R15: 0000000000000000 [ +0.007264] FS: 00007ffff7045540(0000) GS:ffff8883cc900000(0000) knlGS:0000000000000000 [ +0.008236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.005851] CR2: 0000000000000028 CR3: 000000011912e000 CR4: 0000000000350ef0 [ +0.007175] Call Trace: [ +0.002561] <TASK> [ +0.002141] ? show_regs+0x6a/0x80 [ +0.003473] ? __die+0x25/0x70 [ +0.003124] ? page_fault_oops+0x214/0x720 [ +0.004179] ? preempt_count_sub+0x18/0xc0 [ +0.004093] ? __pfx_page_fault_oops+0x10/0x10 [ +0.004590] ? srso_return_thunk+0x5/0x5f [ +0.004000] ? vprintk_default+0x1d/0x30 [ +0.004063] ? srso_return_thunk+0x5/0x5f [ +0.004087] ? vprintk+0x5c/0x90 [ +0.003296] ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.005807] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _printk+0xb3/0xe0 [ +0.003293] ? __pfx__printk+0x10/0x10 [ +0.003735] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ +0.005482] ? do_user_addr_fault+0x345/0x770 [ +0.004361] ? exc_page_fault+0x64/0xf0 [ +0.003972] ? asm_exc_page_fault+0x27/0x30 [ +0.004271] ? add_taint+0x2a/0xa0 [ +0.003476] ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.005812] amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu] [ +0.009530] ? finish_task_switch.isra.0+0x129/0x470 [ +0.005068] ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu] [ +0.010063] ? __kasan_check_write+0x14/0x20 [ +0.004356] ? srso_return_thunk+0x5/0x5f [ +0.004001] ? mutex_unlock+0x81/0xd0 [ +0.003802] ? srso_return_thunk+0x5/0x5f [ +0.004096] amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu] [ +0.009355] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009981] ? srso_return_thunk+0x5/0x5f [ +0.004089] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? __srcu_read_lock+0x20/0x50 [ +0.004096] drm_ioctl_kernel+0x140/0x1f0 [drm] [ +0.005080] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009974] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.005618] ? srso_return_thunk+0x5/0x5f [ +0.004088] ? __kasan_check_write+0x14/0x20 [ +0.004357] drm_ioctl+0x3da/0x730 [drm] [ +0.004461] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009979] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.004993] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? __kasan_check_write+0x14/0x20 [ +0.004356] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _raw_spin_lock_irqsave+0x99/0x100 [ +0.004712] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.005063] ? __pfx_arch_do_signal_or_restart+0x10/0x10 [ +0.005477] ? srso_return_thunk+0x5/0x5f [ +0.004000] ? preempt_count_sub+0x18/0xc0 [ +0.004237] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _raw_spin_unlock_irqrestore+0x27/0x50 [ +0.005069] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu] [ +0.008912] __x64_sys_ioctl+0xcd/0x110 [ +0.003918] do_syscall_64+0x5f/0xe0 [ +0.003649] ? noist_exc_debug+0xe6/0x120 [ +0.004095] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ +0.005150] RIP: 0033:0x7ffff7b1a94f [ +0.003647] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ +0.019097] RSP: 002b:00007fffffffe0a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.007708] RAX: ffffffffffffffda RBX: 000055555558b360 RCX: 00007ffff7b1a94f [ +0.007176] RDX: 000055555558b360 RSI: 00000000c0206449 RDI: 0000000000000003 [ +0.007326] RBP: 00000000c0206449 R08: 000055555556ded0 R09: 000000007fffffff [ +0.007176] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5d8 [ +0.007238] R13: 0000000000000003 R14: 000055555555cba8 R15: 00007ffff7ffd040 [ +0.007250] </TASK> v2: Reworked check to guard against null ptr deref and added helpful comments (Christian) Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <ltuikov89@gmail.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Joonkyo Jung <joonkyoj@yonsei.ac.kr> Cc: Dokyung Song <dokyungs@yonsei.ac.kr> Cc: <jisoo.jang@yonsei.ac.kr> Cc: <yw9865@yonsei.ac.kr> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Link: https://patchwork.freedesktop.org/patch/msgid/20240315023926.343164-1-vitaly.prosyak@amd.com Signed-off-by: Christian König <christian.koenig@amd.com>
2024-02-28drm/scheduler: Simplify the allocation of slab caches in drm_sched_fence_slab_initKunwu Chan1-3/+1
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20240221085558.166774-1-chentao@kylinos.cn
2024-02-26Merge v6.8-rc6 into drm-nextDaniel Vetter1-6/+9
Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches, there's a few same-area-changed conflicts (xe and amdgpu mostly) that are getting a bit too annoying. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-02-06drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULLMatthew Brost1-6/+9
Rather then loop over entities until one with a ready job is found, re-queue the run job worker when drm_sched_entity_pop_job() returns NULL. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker") Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240130030413.2031009-1-matthew.brost@intel.com
2024-02-05Merge tag 'drm-misc-next-2024-01-11' of git://anongit.freedesktop.org/drm/drm-misc into drm-nextDave Airlie1-5/+6
drm-misc-next for v6.9: UAPI Changes: virtio: - add Venus capset defines Cross-subsystem Changes: Core Changes: - fix drm_fixp2int_ceil() - documentation fixes - clean ups - allow DRM_MM_DEBUG with DRM=m - build fixes for debugfs support - EDID cleanups - sched: error-handling fixes - ttm: add tests Driver Changes: bridge: - ite-6505: fix DP link-training bug - samsung-dsim: fix error checking in probe - tc358767: fix regmap usage efifb: - use copy of global screen_info state hisilicon: - fix EDID includes mgag200: - improve ioremap usage - convert to struct drm_edid nouveau: - disp: use kmemdup() - fix EDID includes - documentation fixes panel: - ltk050h3146w: error-handling fixes - panel-edp: support delay between power-on and enable; use put_sync in unprepare; support Mediatek MT8173 Chromebooks, BOE NV116WHM-N49 V8.0, BOE NV122WUM-N41, CSO MNC207QS1-1 plus DT bindings - panel-lvds: support EDT ETML0700Z9NDHA plus DT bindings - panel-novatek: FRIDA FRD400B25025-A-CTK plus DT bindings qaic: - fixes to BO handling - make use of DRM managed release - fix order of remove operations rockchip: - analogix_dp: get encoder port from DT - inno_hdmi: support HDMI for RK3128 - lvds: error-handling fixes simplefb: - fix logging ssd130x: - support SSD133x plus DT bindings tegra: - fix error handling tilcdc: - make use of DRM managed release v3d: - show memory stats in debugfs vc4: - fix error handling in plane prepare_fb - fix framebuffer test in plane helpers vesafb: - use copy of global screen_info state virtio: - cleanups vkms: - fix OOB access when programming the LUT - Kconfig improvements vmwgfx: - unmap surface before changing plane state - fix memory leak in error handling - documentation fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240111154902.GA8448@linux-uq9g
2024-01-26drm/sched: Drain all entities in DRM sched run job workerMatthew Brost1-8/+7
All entities must be drained in the DRM scheduler run job worker to avoid the following case. An entity found that is ready, no job found ready on entity, and run job worker goes idle with other entities + jobs ready. Draining all ready entities (i.e. loop over all ready entities) in the run job worker ensures all job that are ready will be scheduled. Cc: Thorsten Leemhuis <regressions@leemhuis.info> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuOWtoeeE+q26zE+Q@mail.gmail.com/ Reported-and-tested-by: Mario Limonciello <mario.limonciello@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124 Link: https://lore.kernel.org/all/20240123021155.2775-1-mario.limonciello@amd.com/ Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz> Closes: https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd159a8@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240124210811.1639040-1-matthew.brost@intel.com
2024-01-07drm/sched: Return an error code only as a constant in drm_sched_init()Markus Elfring1-3/+3
Return an error code without storing it in an intermediate variable. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patchwork.freedesktop.org/patch/msgid/85f8004e-f0c9-42d9-8c59-30f1b4e0b89e@web.de Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2024-01-07drm/sched: One function call less in drm_sched_init() after error detectionMarkus Elfring1-2/+3
The kfree() function was called in one case by the drm_sched_init() function during error handling even if the passed data structure member contained a null pointer. This issue was detected by using the Coccinelle software. Thus adjust a jump target. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patchwork.freedesktop.org/patch/msgid/85066512-983d-480c-a44d-32405ab1b80e@web.de Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
2023-11-28drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"Bert Karwatzki1-3/+2
Commit f3123c2590005c, in combination with the use of work queues by the GPU scheduler, leads to random lock-ups of the GUI. This is a partial revert of of commit f3123c2590005c since drm_sched_wakeup() still needs its entity argument to pass it to drm_sched_can_queue(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994 Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html Signed-off-by: Bert Karwatzki <spasswolf@web.de> Link: https://patchwork.freedesktop.org/patch/msgid/20231127160955.87879-1-spasswolf@web.de Link: https://lore.kernel.org/r/36bece178ff5dc705065e53d1e5e41f6db6d87e4.camel@web.de Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()") Reviewed-by: Luben Tuikov <ltuikov89@gmail.com> Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>