aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2026-05-20platform/x86: fujitsu: Check ACPI_COMPANION() against NULLRafael J. Wysocki1-2/+10
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add requisite ACPI_COMPANION() checks against NULL to the platform/x86 fujitsu-laptop driver. Fixes: 6da22b031a3c ("platform/x86: fujitsu: Convert laptop driver to a platform one") Fixes: d5c9212ccfaa ("platform/x86: fujitsu: Convert backlight driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jonathan Woithe <jwoithe@just42.net> Link: https://patch.msgid.link/3430329.44csPzL39Z@rafael.j.wysocki Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-05-20platform/x86: eeepc-laptop: Check ACPI_COMPANION() against NULLRafael J. Wysocki1-1/+5
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_COMPANION() check against NULL to the platform/x86 eeepc-laptop driver. Fixes: 079b59fd2d79 ("platform/x86: eeepc-laptop: Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/3056852.e9J7NaK4W3@rafael.j.wysocki Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-05-20platform/x86: dell/dell-rbtn: Check ACPI_COMPANION() against NULLRafael J. Wysocki1-1/+5
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_COMPANION() check against NULL to the platform/x86 dell-rbtn driver. Fixes: 19ebacfb442b ("platform/x86: dell/dell-rbtn: Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/2276487.irdbgypaU6@rafael.j.wysocki Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-05-20platform/x86: asus-laptop: Check ACPI_COMPANION() against NULLRafael J. Wysocki1-1/+5
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_COMPANION() check against NULL to the platform/x86 asus-laptop driver. Fixes: ba19eb10170b ("platform/x86: asus-laptop: Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/5083741.GXAFRqVoOG@rafael.j.wysocki Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-05-20platform/x86: acer-wireless: Check ACPI_COMPANION() against NULLRafael J. Wysocki1-2/+6
Every platform driver can be forced to match a device that doesn't match its list of device IDs because of device_match_driver_override(), so platform drivers that rely on the existence of a device's ACPI companion object need to verify its presence. Accordingly, add a requisite ACPI_COMPANION() check against NULL to the platform/x86 acer-wireless driver. Fixes: f7e648027d7e ("platform/x86: acer-wireless: Convert ACPI driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/4746824.LvFx2qVVIh@rafael.j.wysocki Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-05-20Merge tag 'ath-current-20260519' of git://git.kernel.org/pub/scm/linux/kernel/git/ath/athJohannes Berg7-40/+143
Jeff Johnson says: ================== ath.git update for v7.1-rc5 ath10k: - avoid sending any commands to firmware when it is wedged ath11k: - fix WMI buffer leaks on error conditions - fix UAF in RX MSDU coalesce path - allow peer ID 0 on RX path (legal for mobile devices) - reinitialize shared SRNG pointers on restart ath12k: - fix 20 MHz-only parsing of EHT-MCS map ================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20Merge tag 'iwlwifi-fixes-2026-05-16' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-nextJohannes Berg7-35/+50
Miri Korenblit says: ==================== wifi: iwlwifi: fixes - 2026-05-16 Contains: wifi: iwlwifi: mld: fix TSO segmentation explosion when AMSDU is disabled wifi: iwlwifi: mld: stop TX during firmware restart wifi: iwlwifi: mld: don't WARN on WoWLAN suspend w/o BSS vif wifi: iwlwifi: mvm: fix driver-set TX rates on old devices wifi: iwlwifi: mld: disconnect only after 6 beacons without Rx wifi: iwlwifi: mld: don't dereference a pointer before NULL checking it wifi: iwlwifi: use correct function to read STEP_URM register ==================== Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-20wifi: wilc1000: fix dma_buffer leak on bus acquire failureShitalkumar Gandhi1-1/+1
wilc_wlan_firmware_download() allocates dma_buffer with kmalloc() at the top of the function and uses a 'fail:' label to free it via kfree(dma_buffer) on error. All later error paths correctly use 'goto fail' to route through this cleanup. However, the early failure path after the first acquire_bus() call uses a bare 'return ret;', which leaks dma_buffer whenever the bus acquire fails. Replace the early return with goto fail so the existing cleanup path runs. Found via a custom Coccinelle semantic patch hunting for kmalloc'd locals leaked on early-return error paths in driver firmware-download code. Fixes: 1241c5650ff7 ("wifi: wilc1000: Fill in missing error handling") Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260511042732.998311-1-shitalkumar.gandhi@cambiumnetworks.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-19pds_core: fix debugfs_lookup dentry leak and error handlingNikhil P. Rao1-1/+6
debugfs_lookup() returns a dentry with an elevated reference count that must be released with dput(). The current code discards the returned dentry without calling dput(), causing a reference leak on every firmware reset recovery. Additionally, when CONFIG_DEBUG_FS is disabled, debugfs_lookup() returns ERR_PTR(-ENODEV), not NULL. The current check passes for error pointers and would call dput() on an invalid pointer, causing a crash. Fixes: bc90fbe0c318 ("pds_core: Rework teardown/setup flow to be more common") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260515212907.998028-3-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19pds_core: fix error handling in pdsc_devcmd_waitNikhil P. Rao1-2/+9
Fix two cases where pdsc_devcmd_wait() returns stale success from the completion register instead of an error: 1. FW crash: If firmware stops running, the wait loop breaks early with running=false. The condition "if ((!done || timeout) && running)" is false, so error handling is bypassed and stale status is returned. Check !running first and return -ENXIO. 2. Timeout: If a command times out, err is set to -ETIMEDOUT but then overwritten by pdsc_err_to_errno(status) which reads stale status. Return -ETIMEDOUT immediately after cleaning up. Both errors now propagate to pdsc_devcmd_locked() which queues health_work for recovery. Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260515212907.998028-1-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19net: phy: honor eee_disabled_modes in phy_advertise_eee_all()Nicolai Buchwitz1-1/+2
phy_advertise_eee_all() copies supported_eee into advertising_eee unconditionally, overwriting any filtering applied during phy_probe() based on DT eee-broken-* properties or driver-populated eee_disabled_modes. genphy_c45_ethtool_set_eee() calls this helper when user space passes an empty advertisement, undoing the filtering. Apply the same eee_disabled_modes mask in phy_advertise_eee_all() so the filtering survives the copy, matching the pattern in phy_probe() and phy_support_eee(). Fixes: b64691274f5d ("net: phy: add helper phy_advertise_eee_all") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-2-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19net: phy: honor eee_disabled_modes in phy_support_eee()Nicolai Buchwitz1-1/+2
phy_support_eee() copies supported_eee into advertising_eee unconditionally, overwriting any filtering applied during phy_probe() based on DT eee-broken-* properties or driver-populated eee_disabled_modes. MAC drivers that call phy_support_eee() after probe (e.g. bcmgenet, fec, lan743x, lan78xx, r8169) then cause the PHY to advertise EEE for modes the user marked as broken. The symptom is that ethtool --show-eee on the local interface reports "not supported" (supported & ~eee_disabled_modes is empty) while the link partner sees EEE negotiated and active. phy_probe() already filters advertising_eee via eee_disabled_modes after calling of_set_phy_eee_broken(). Apply the same mask in phy_support_eee() so the filtering survives the copy. Fixes: 49168d1980e2 ("net: phy: Add phy_support_eee() indicating MAC support EEE") Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260518-devel-phy-support-eee-fix-v2-1-05b52626fa68@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19net: phy: skip EEE advertisement write when autoneg is disabledNerijus Bendžiūnas1-0/+8
genphy_c45_an_config_eee_aneg() writes the EEE advertisement to the auto-negotiation device's MMD register space (MDIO_MMD_AN, register MDIO_AN_EEE_ADV). These registers are read by the link partner only during auto-negotiation, so writing them while autoneg is disabled cannot influence the link. On some PHYs (e.g. Broadcom BCM54213PE) the write nevertheless reaches the chip and disturbs the receive datapath. Concretely, running ethtool -s eth0 speed 100 duplex full autoneg off ethtool --set-eee eth0 eee off leaves eth0 with TX working and RX completely silent on a Raspberry Pi 4 / CM4 board (bcmgenet + BCM54213PE in rgmii-rxid). Switching back to autoneg recovers the link. Prior to commit f26a29a038ee ("net: phy: ensure that genphy_c45_an_config_eee_aneg() sees new value of phydev->eee_cfg.eee_enabled"), the disable path was effectively a no-op because the helper read the stale eee_cfg.eee_enabled, so the underlying PHY behavior never surfaced. Bisected on rpi-6.12.y between commits 83943264 (good) and effcbc88 (bad) to f26a29a038ee. Fixes: f26a29a038ee ("net: phy: ensure that genphy_c45_an_config_eee_aneg() sees new value of phydev->eee_cfg.eee_enabled") Cc: stable@vger.kernel.org Signed-off-by: Nerijus Bendžiūnas <nerijus.bendziunas@gmail.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Tested-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260516150251.879680-1-nerijus.bendziunas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19octeontx2-pf: avoid double free of pool->stack on AQ init failureDawei Feng2-0/+4
otx2_pool_aq_init() frees pool->stack when mailbox sync or retry allocation fails, but leaves the pointer unchanged. Later, otx2_sq_aura_pool_init() unwinds the partial setup through otx2_aura_pool_free(), which frees pool->stack again. The CN20K-specific cn20k_pool_aq_init() implementation has the same bug in its corresponding error path. Set pool->stack to NULL immediately after the local free so the shared cleanup path does not free the same stack again while cleaning up partially initialized pool state. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1-rc3. Runtime validation was not performed because reproducing this path requires OcteonTX2/CN20K hardware. Fixes: caa2da34fd25 ("octeontx2-pf: Initialize and config queues") Fixes: d322fbd17203 ("octeontx2-pf: Initialize cn20k specific aura and pool contexts") Cc: stable@vger.kernel.org Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515151826.1005397-1-dawei.feng@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19net: pse-pd: fix sign on -ENOENT check in of_load_pse_pis()Jonas Jelonek1-1/+1
of_count_phandle_with_args() returns the count on success and a negative errno on failure, including -ENOENT when the "pairsets" property is absent. The existing comparison in of_load_pse_pis() checks against ENOENT (positive 2) instead of -ENOENT, so the branch is taken for any error return: legitimate DTs that omit "pairsets" trigger a spurious "wrong number of pairsets" error and probe fails with -EINVAL. Compare against -ENOENT so a missing "pairsets" property is correctly treated as "this PI has no pairsets, continue". Fixes: 9be9567a7c59 ("net: pse-pd: Add support for PSE PIs") Cc: stable@vger.kernel.org Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com> Acked-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260515143103.1721888-1-jelonek.jonas@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-19RDMA/rtrs: Fix use-after-free in path file creation cleanupGuangshuo Li1-1/+1
In the error path of rtrs_srv_create_path_files(), the sysfs root folders may already have been created and srv_path->kobj may already have been initialized. If a later step fails, the cleanup currently calls kobject_put(&srv_path->kobj) before rtrs_srv_destroy_once_sysfs_root_folders(srv_path). kobject_put() may drop the last reference to srv_path->kobj and invoke the release callback, rtrs_srv_release(), which frees srv_path. The following call to rtrs_srv_destroy_once_sysfs_root_folders(srv_path) then dereferences srv_path internally to access srv_path->srv, resulting in a use-after-free. This failure path is reached before rtrs_srv_create_path_files() returns success, so the successful-path lifetime handling is not involved. Fix this by destroying the sysfs root folders before calling kobject_put(&srv_path->kobj), so srv_path is still valid while the helper accesses it. This issue was found by a static analysis tool I am developing. Fixes: ae4c81644e91 ("RDMA/rtrs-srv: Rename rtrs_srv_sess to rtrs_srv_path") Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260514113834.865530-1-lgs201920130244@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-05-19RDMA/mana_ib: Report max_msg_sz in mana_ib_query_portShiraz Saleem1-0/+1
Report max_msg_sz for mana_ib, which is 16MB. Fixes: 4bda1d5332ec ("RDMA/mana_ib: Implement port parameters") Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/20260512094209.264955-1-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-05-19RDMA/core: Do not read wild stack memory in uverbs_get_handler_fn()Jason Gunthorpe3-16/+15
Sashiko points out the legacy write path in ib_uverbs_write() does allocate a struct uverbs_attr_bundle, but it doesn't wrap it in a bundle_priv so downcasting here isn't safe. Instead lift the method_elm out of the bundle_priv and use it for the debug function. The legacy write path will leave it set as NULL since the write method_elm uses a different type. Cc: stable@vger.kernel.org Fixes: 1de9287ece44 ("RDMA: Add ib_copy_validate_udata_in()") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-05-19RDMA/core: Move the _ib_copy_validate_udata* functions to ib_core_uverbsJason Gunthorpe3-122/+124
It was incorrect to place them in uverbs_ioctl because that makes every driver depends on ib_uverbs.ko, which is undesired. ib_core_uverbs.c is for functions used by alot of drivers that are linked into ib_core instead. Fixes: 1de9287ece44 ("RDMA: Add ib_copy_validate_udata_in()") Link: https://patch.msgid.link/r/1-v1-045258567bd6+9fe-ib_uverbs_support_ko_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-19ACPI: battery: Fix system wakeup on critical battery statusRafael J. Wysocki1-1/+3
Commit 0a869409a981 ("ACPI: battery: Convert the driver to a platform one") changed the parent of the battery wakeup source to the platform device used for driver binding, but it forgot to update the acpi_pm_wakeup_event() call in acpi_battery_update() accordingly. Do it now to unbreak waking up the system on critical battery status during suspend-to-idle and during transitions to ACPI S3/S4. Fixes: 0a869409a981 ("ACPI: battery: Convert the driver to a platform one") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 7.0+ <stable@vger.kernel.org> # 7.0+ Link: https://patch.msgid.link/12898712.O9o76ZdvQC@rafael.j.wysocki
2026-05-19Merge tag 'ata-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linuxLinus Torvalds5-47/+83
Pull ata fixes from Niklas Cassel: - Make sure that the issuing of a deferred non-NCQ command via workqueue feature is only used when mixing NCQ and non-NCQ commands to the same link (i.e. return value ATA_DEFER_LINK), and nothing else. This way we will not incorrectly try to use the feature for e.g. PATA drivers - The deferred non-NCQ command was stored in a per-port struct. When using Port Multipliers with FIS-Based Switching, we would thus needlessly defer commands to all other links. Store the deferred QC in a per-link struct, such that Port Multipliers with FBS will get the same performance as before - The issuing of a deferred non-NCQ command via workqueue feature broke support for Port Multipliers using Command-Based Switching. The issuing of a deferred non-NCQ command via workqueue feature is not compatible with the use of ap->excl_link, which PMPs with CBS use for fairness (using implicit round robin) * tag 'ata-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: libata-scsi: do not needlessly defer commands when using PMP with FBS ata: libata-scsi: do not use the deferred QC feature on PMPs with CBS ata: libata-scsi: do not use the deferred QC feature for ATA_DEFER_PORT ata: libata-scsi: improve readability of ata_scsi_qc_issue()
2026-05-19drm/amdgpu: fix handling in amdgpu_userq_createChristian König1-66/+52
Well mostly the same issues the other code had as well: 1. Memory allocation while holding the userq_mutex lock is forbidden! 2. Things were created/started/published in the wrong order. 3. The reset lock was taken in the wrong order and seems to be unecessary in the first place. 4. Error messages on invalid input parameters can spam the logs. 5. Error messages on memory allocation failures are usually superflous as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89e50de5654dbe7a137e03d78629542e17ba7202)
2026-05-19drm/radeon/evergreen_cs: Add missing NULL prefix check in surface checkVitaliy Triang3l Kuzmin1-2/+4
'evergreen_surface_check' is called with a NULL warning prefix when handling potentially recoverable issues or just to compute the alignment requirements, and 'evergreen_surface_check' is called again in case of failure (with the correct prefix, as opposed to NULL), therefore, the initial check must not print a warning, because the surface may be accepted successfully after having been corrected, however if it isn't, the final check will print the warning anyway. The surface check functions specific to array modes already implement this behavior, but the 'evergreen_surface_check' function itself doesn't. This is also supposed to fix the "'%s' directive argument is null [-Werror=format-overflow=]" compiler warning. Fixes: 285484e2d55e ("drm/radeon: add support for evergreen/ni tiling informations v11") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vitaliy Triang3l Kuzmin <ml@triang3l.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e20ea411c99f6968af35fd03e9ee21f70d799144)
2026-05-19drm/amdgpu: userq_va_mapped should remain true once doneSunil Khatri3-14/+7
Multiple queues needs these bo_va objects belonging to the same uq_mgr. So once they are mapped lets not unmap them as at any point of time any of the queues might be using it. Also userq_va_mapped should be a boolean than atomic. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5c02889ea22575c3bcfdf212e65fac316cbc6c6a)
2026-05-19drm/amdgpu: avoid integer overflow in VA range checkCe Sun1-2/+2
The original addition operation in 64-bit unsigned type may encounter overflow situations. To prevent such issues and safely reject invalid inputs, the check_add_overflow() function is used. Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cc768f4dd0bb9083c813683eeec44fc23921f771)
2026-05-19drm/amd/ras: Fix UMC error address allocation leakXiang Liu1-0/+2
amdgpu_umc_handle_bad_pages() allocates err_data->err_addr before querying UMC error information. In the direct and firmware query paths, the pointer is reassigned to a fresh allocation before the original buffer is released, so the initial allocation is leaked on each handled event. Free the existing buffer before replacing it in those query paths so the function exit cleanup only owns the active allocation. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 911b1bdd22c3712a22b60fcc58f7b9f2d07b0803)
2026-05-19drm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 resetYifan Zhang5-0/+55
During Mode 1 reset, the ASIC undergoes a reset cycle and becomes temporarily inaccessible via PCIe. Any attempt to access framebuffer or MMIO registers during this window can result in uncompleted PCIe transactions, leading to NMI panics or system hangs. To prevent this, Unmap all of the applications mappings of the framebuffer and doorbell BARs before mode1 reset. Also prevent new mappings from coming in during the reset process. v2: remove inode in kfd_dev (Christian) v3: correct unmap offset (Felix), remove prevent new mappings part to avoid deadlock (Christian) Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 70cadefcc6160c575b04f763ada34c20e868d577)
2026-05-19drm/amd/display: Validate payload length and link_index in dc_process_dmub_aux_transfer_asyncHarry Wentland1-1/+5
[Why&How] dc_process_dmub_aux_transfer_async() copies payload->length bytes into a 16-byte stack buffer (dpaux.data[16]) guarded only by an ASSERT(), which is a no-op in release builds. If a caller ever passes length > 16 this results in a stack buffer overflow via memcpy. Additionally, link_index is used to dereference dc->links[] without bounds checking against dc->link_count, risking an out-of-bounds access. Replace the ASSERT with a hard runtime check that returns false when payload->length exceeds the destination buffer size, and add a bounds check for link_index before it is used. Assisted-by: GitHub Copilot:Claude claude-4-opus Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ba4caa9fecdf7a38f98c878ad05a8a64148b6881) Cc: stable@vger.kernel.org
2026-05-19drm/amd/display: Validate GPIO pin LUT table size before iteratingHarry Wentland1-0/+9
[Why&How] The GPIO pin table parsers in get_gpio_i2c_info() and bios_parser_get_gpio_pin_info() derive an element count from the VBIOS table_header.structuresize field, then iterate over gpio_pin[] entries. However, GET_IMAGE() only validates that the table header itself fits within the BIOS image. If the VBIOS reports a structuresize larger than the actual mapped data, the loop reads past the end of the BIOS image, causing an out-of-bounds read. Fix this by calling bios_get_image() to validate that the full claimed structuresize is accessible within the BIOS image before entering the loop in both functions. Assisted-by: GitHub Copilot:claude-opus-4-6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ba5e95b43b773ae1bf1f66ee6b31eb774e65afe3) Cc: stable@vger.kernel.org
2026-05-19drm/amd/display: Fix integer overflow in bios_get_image()Harry Wentland1-3/+6
[Why&How] The bounds check in bios_get_image() computes 'offset + size' using unsigned 32-bit arithmetic before comparing against bios_size. If a VBIOS image contains a near-UINT32_MAX offset the addition wraps to a small value, the comparison passes, and the function returns a wild pointer past the VBIOS mapping. Additionally, the comparison uses '<' (strict), which incorrectly rejects the valid exact-fit case where offset + size == bios_size. Fix both issues by restructuring the check to avoid the addition entirely: first reject if offset alone exceeds bios_size, then check size against the remaining space (bios_size - offset). This eliminates the overflow and correctly permits exact-fit accesses. Assisted-by: GitHub Copilot:claude-opus-4.6 Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d40fb392af659c4a02b560319f226842f6ec1a95) Cc: stable@vger.kernel.org
2026-05-19drm/amdkfd: Check bounds for allocate_sdma_queue restore_sdma_idDavid Francis1-0/+6
allocate_sdma_queue has an option where the sdma queue id can be specified (used by CRIU). We weren't bounds-checking that value. Confirm it's less than the maximum number of queues. Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bfe9a7545b2a7be1c543f1741e16f2d5ec4116ae)
2026-05-19drm/amdgpu: use atomic operation to achieve lockless serializationSunil Khatri1-5/+8
In amdgpu_seq64_alloc there is a possibility that two difference cores from two separate NODES can try to and could get the same free slot. So this fixes that race here using atomic test_and_set clear operations. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4d50a14d346141e03a7c3905e496d91e048bc30c)
2026-05-19drm/amdkfd: Check bounds on allocate_doorbellDavid Francis1-0/+3
allocated_doorbell has an option to set the doorbell id to a specific value (used by CRIU). This value was not bounds checked. Check to confirm it's less than KFD_MAX_NUM_OF_QUEUES_PER_PROCESS. Signed-off-by: David Francis <David.Francis@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1f087bb8cf9e8797633da35c85435e557ef74d06)
2026-05-19drm/amdgpu/vce3: Fix VCE 3 firmware size and offsetsTimur Kristóf1-1/+1
The VCPU BO contains the actual FW at an offset, but it was not calculated into the VCPU BO size. Subtract this from the FW size to make sure there is no out of bounds access. This may fix VM faults when using VCE 3. Cc: John Olender <john.olender@gmail.com> Fixes: e98226221467 ("drm/amdgpu: recalculate VCE firmware BO size") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 15c369257bd85f47a514744f960c5a51c867716f)
2026-05-19drm/amdgpu/vce2: Fix VCE 2 firmware size and offsetsTimur Kristóf1-2/+7
The VCPU BO contains the actual FW at an offset, but it was not calculated into the VCPU BO size. Subtract this from the FW size to make sure there is no out of bounds access. Additionally, increase the VCE_V2_0_DATA_SIZE to have extra space after the VCE handles. Also increase the data size used for each VCE handle. The FW needs 23744 bytes, use 24K to be safe. This fixes VM faults when using VCE 2. Cc: John Olender <john.olender@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4802 Fixes: e98226221467 ("drm/amdgpu: recalculate VCE firmware BO size") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a20d21df625548c1738c0745f753c5d6eb823bc3)
2026-05-19drm/amdgpu/vce1: Stop using amdgpu_vce_resumeTimur Kristóf1-10/+6
The VCE1 firmware works slightly differently and is already loaded by vce_v1_0_load_fw(). It doesn't actually need to call amdgpu_vce_resume(). Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 33d8951405e2dd81ac61edebc680e2dfb6b4fc9f)
2026-05-19drm/amdgpu/vce1: Fix VCE 1 firmware size and offsetsTimur Kristóf1-4/+15
The VCPU BO contains the actual FW at an offset, but it was not calculated into the VCPU BO size. Subtract this from the FW size to make sure there is no out of bounds access. Make sure the stack and data offsets are aligned to the 32K TLB size. Check that the FW microcode actually fits in the space that is reserved for it. Fixes: d4a640d4b9f3 ("drm/amdgpu/vce1: Implement VCE1 IP block (v2)") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c16fe59f622a080fc457a57b3e8f14c780699449)
2026-05-19drm/amdgpu/vce1: Don't repeat GTT MGR node allocationTimur Kristóf1-5/+7
Only allocate entries from the GTT manager when the VCE GTT node is not allocated yet. This prevents the possibility of allocating them multiple times, which causes issues during GPU reset and suspend/resume. Fixes: 71aec08f80e7 ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8d2a20c1721cb17e22821e1b4ecbb02d475d91c5)
2026-05-19drm/amdgpu/vce1: Check if VRAM address is lower than GART.Timur Kristóf1-0/+3
Previously, I had assumed this was not possible so it was OK to not handle it, but now we got a report from a user who has a board that is configured this way. When the VCPU BO is already located in a low 32-bit address in VRAM (eg. when VRAM is mapped to the low address space), don't do the workaround. Fixes: 71aec08f80e7 ("amdgpu/vce: use amdgpu_gtt_mgr_alloc_entries") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f370ec9b164698a9ca1a7b59bfbea07f70df769d)
2026-05-19drm/amdgpu/vce1: Remove superfluous address checkTimur Kristóf1-2/+0
The same thing is already checked a few lines above. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c1dc555e760dbfc4a4710f7270f525a03d433af8)
2026-05-19drm/amdgpu/vce1: Check that the GPU address is < 128 MiBTimur Kristóf1-4/+8
When ensuring the low 32-bit address, make sure it is less than 128 MiB, otherwise the VCE seems to fail to initialize. This seems to be an undocumented limitation of the firmware validation mechanism. Note that in case of VCE1 the BAR address is zero and we can't change it also due to the firmware validator. When programming the mmVCE_VCPU_CACHE_OFFSETn registers, don't AND them with a mask. This is incorrect because the register mask is actually 0x0fffffff and useless because we already ensure the addresses are below the limit. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e729ae5f3ac73c861c062080ac8c3d666c972404)
2026-05-19drm/amdgpu: Align amdgpu_gtt_mgr entries to TLB size on Tahiti (v2)Timur Kristóf1-1/+8
The TLB is organized in groups of 8 entries, each one is 4K. On Tahiti, the HW requires these GART entries to be 32K-aligned. This fixes a VCE 1 firmware validation failure that can happen after suspend/resume since we use amdgpu_gtt_mgr for VCE 1. v2: - Change variable declaration order - Add comment about "V bit HW bug" Fixes: 698fa62f56aa ("drm/amdgpu: Add helper to alloc GART entries") Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 530411b465ef0b2c0cc18c2e3d7e38422b1117d1)
2026-05-19drm/amdkfd: Fix OOB memory exposure in get_wave_state()Sunday Clement1-3/+8
The get_wave_state() function for v9 trusts cp_hqd_cntl_stack_size and cp_hqd_cntl_stack_offset values read directly from the MQD, which are written by GPU microcode and fully attacker-controlled on the CRIU-restore path (via AMDKFD_IOC_RESTORE_PROCESS with H3). this leads to an unbounded copy_to_user() that can leak adjacent GTT/kernel memory. If offset > size, integer underflow produces a ~4 GiB read length, if size is set to 1 MiB against a 4 KiB allocation, we leak 1 MiB of adjacent kernel memory (other queues' MQDs, ring buffers, KASLR pointers). Fix by clamping both cp_hqd_cntl_stack_size to the actual allocated buffer size (q->ctl_stack_size) and cp_hqd_cntl_stack_offset to the clamped size before performing arithmetic and copy_to_user(). This ensures we never read beyond the allocated kernel BO regardless of attacker-supplied MQD field values. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7ef144458f48d5589e36f1b3d83e83db2e5c5ba5)
2026-05-19drm/amd/pm: fix memleak of dpm_policies on smu v15Yang Wang1-0/+2
In smu_v15_0_fini_smc_tables, dpm_policies was not freed or NULLed, causing a memory leak. Add kfree() and NULL assignment to properly release memory and avoid dangling pointers. Fixes: 2beedc3a92b7 ("drm/amd/pm: Add initial support for smu v15_0_8"); Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 014f329074f688b9b49383e8b70e79e9ef99359e)
2026-05-19drm/amdgpu: Fix discovery offset check under VFLijo Lazar1-1/+1
Discovery table may be kept at offset 0 by host driver. Remove the validation check. Fixes: 01bdc7e219c4 ("drm/amdgpu: New interface to get IP discovery binary v3") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d3f5bbd007133c64a20e81ef290a93e46c75df40)
2026-05-19drm/amdgpu: remove va cursors for all mappingsSunil Khatri1-10/+7
va_cursor struct needs to be cleaned even if the mapping has been removed already. Also simplify it by make it a void function as return value check isn't needed as its called during tear down. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4d35a45c9b4c1ac5b6e3219f83c3db706b675fa2)
2026-05-19drm/amdgpu: reject non-user addresses early in GEM_USERPTR ioctlAmir Shetaia1-0/+4
amdgpu_gem_userptr_ioctl() currently accepts any value of args->addr and only discovers an out-of-range pointer much later, inside amdgpu_gem_object_create() and the HMM mirror registration path. Userspace can drive that path with kernel-side virtual addresses; the get_user_pages() layer rejects them, but only after the driver has already allocated a GEM object and started wiring up notifier state that then has to be torn down on failure. Add an access_ok() guard at the top of the ioctl, right after the existing page-alignment check and before flag validation, so any address that does not lie within the calling task's user address range is rejected with -EFAULT before any allocation occurs. No legitimate ROCm/HSA userspace passes kernel-mode pointers through this interface, so this is defense-in-depth rather than a behaviour change for valid callers; -EFAULT matches the convention already used by other uaccess-style rejections in the kernel. Also add an explicit #include <linux/uaccess.h>; access_ok() is otherwise only available transitively through other headers in this translation unit. Signed-off-by: Amir Shetaia <Amir.Shetaia@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7a076df36397d780d7e4fb595287b4980451a7f5)
2026-05-19drm/amdgpu/vpe: Force collaborate sync after TRAPAlan Liu1-1/+6
VPE1 could possibly hang and fail to power off at the end of commands in collaboration mode. This workaround adds a COLLAB_SYNC after TRAP to force instances synchronized to avoid VPE1 fail to power off. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alan liu <haoping.liu@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5171 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a8b749c5c5afb7e5daa2bfb95d958fb3c6b8f055) Cc: stable@vger.kernel.org
2026-05-19drm/amdgpu/userq: update the vm task info during signal ioctlSunil Khatri1-0/+3
Pagefaults does not have process information correctly populated as vm->task is not set during vm_init but should be updated while real submission. So setting that up during signal_ioctl to get the correct submission process details. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a9b14d88b4d83e21ab965f23d1fb7b07b87e0517)
2026-05-19drm/amdgpu/userq: cancel reset work while tear down in progressSunil Khatri1-0/+8
While tear down of a userq_mgr is happening when all the queues are free we should cancel any reset work if pending before exiting. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 160164609f71f774c4f661227a9b7a370a86b112)