wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2025-03-14	usb: cdns3-ti: run HW init at resume() if HW was reset	Théo Lebrun	1	-0/+25
	At runtime_resume(), read the W1 (Wrapper Register 1) register to detect if an hardware reset occurred. If it did, run the hardware init sequence. This callback will be called at system-wide resume. Previously, if a reset occurred during suspend, we would crash. The wrapper config had not been written, leading to invalid register accesses inside cdns3. Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-6-13658a271c3c@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: cdns3-ti: move reg writes to separate function	Théo Lebrun	1	-38/+44
	The device probe function mixes management code and hardware initialisation code. Extract the latter into an explicitly named cdns_ti_reset_and_init_hw() function to clarify intent. It also will allow easier transition to using runtime PM for triggering HW init. Reviewed-by: Roger Quadros <rogerq@kernel.org> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-5-13658a271c3c@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: cdns3: call cdns_power_is_lost() only once in cdns_resume()	Théo Lebrun	1	-2/+3
	cdns_power_is_lost() does a register read. Call it only once rather than twice. Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-4-13658a271c3c@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: cdns3: rename hibernated argument of role->resume() to lost_power	Théo Lebrun	3	-4/+4
	The cdns_role_driver->resume() callback takes a second boolean argument named `hibernated` in its implementations. This is mistaken; the only potential caller is: int cdns_resume(struct cdns cdns) { / ... */ if (cdns->roles[cdns->role]->resume) cdns->roles[cdns->role]->resume(cdns, cdns_power_is_lost(cdns)); return 0; } The argument can be true in cases outside of return from hibernation. Reflect the true meaning by renaming both arguments to `lost_power`. Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Acked-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-3-13658a271c3c@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: xhci: tegra: rename `runtime` boolean to `is_auto_runtime`	Théo Lebrun	1	-5/+5
	Unify naming convention: use `is_auto_runtime` in xhci-tegra, to be in phase with (future) drivers/usb/host/xhci.c. Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-2-13658a271c3c@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: host: xhci-plat: mvebu: use ->quirks instead of ->init_quirk() func	Théo Lebrun	3	-17/+1
	Compatible "marvell,armada3700-xhci" match data uses the struct xhci_plat_priv::init_quirk() function pointer to add XHCI_RESET_ON_RESUME as quirk on XHCI. Instead, use the struct xhci_plat_priv::quirks field. Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Link: https://lore.kernel.org/r/20250205-s2r-cdns-v7-1-13658a271c3c@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: dwc3: Don't use %pK through printk	Thomas Weißschuh	2	-4/+4
	This is a revert of commit 04fb365c453e ("usb: dwc3: replace %p with %pK") When the formatting was changed from %p to %pK that was a security improvement, as %p would leak raw pointer values to the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. On the other hand, restricted pointers ("%pK") were never meant to be used through printk(). They can unintentionally still leak raw pointers or acquire sleeping looks in atomic contexts. Switch back to regular %p again. Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://lore.kernel.org/r/20250310-restricted-pointers-usb-v2-2-a7598e2d47d1@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: core: Don't use %pK through printk	Thomas Weißschuh	2	-3/+3
	This is a revert of commit 2f964780c03b ("USB: core: replace %p with %pK"). When the formatting was changed from %p to %pK that was a security improvement, as %p would leak raw pointer values to the kernel log. Since commit ad67b74d2469 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. On the other hand, restricted pointers ("%pK") were never meant to be used through printk(). They can unintentionally still leak raw pointers or acquire sleeping looks in atomic contexts. Switch back to regular %p again. Link: https://lore.kernel.org/lkml/20250113171731-dc10e3c1-da64-4af0-b767-7c7070468023@linutronix.de/ Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://lore.kernel.org/r/20250310-restricted-pointers-usb-v2-1-a7598e2d47d1@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: gadget: aspeed: Add NULL pointer check in ast_vhub_init_dev()	Chenyuan Yang	1	-0/+3
	The variable d->name, returned by devm_kasprintf(), could be NULL. A pointer check is added to prevent potential NULL pointer dereference. This is similar to the fix in commit 3027e7b15b02 ("ice: Fix some null pointer dereference issues in ice_ptp.c"). This issue is found by our static analysis tool Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com> Link: https://lore.kernel.org/r/20250311012705.1233829-1-chenyuan0y@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	dt-bindings: usb: qcom,dwc3: Synchronize minItems for interrupts and -names	Konrad Dybcio	1	-0/+2
	It makes sense that ARRAY_SIZE(prop) should == ARRAY_SIZE(prop-names), so allow that to happen with interrupts. Fixes bogus warnings such as: usb@c2f8800: interrupt-names: ['pwr_event', 'qusb2_phy', 'hs_phy_irq'] is too short Fixes: 53c6d854be4e ("dt-bindings: usb: dwc3: Clean up hs_phy_irq in binding") Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250308-topic-dt_bindings_fixes_usb-v2-1-3169a3394d5b@oss.qualcomm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-14	usb: common: usb-conn-gpio: switch psy_cfg from of_node to fwnode	Sebastian Reichel	1	-1/+1
	In order to remove .of_node from the power_supply_config struct, use .fwnode instead. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Link: https://lore.kernel.org/r/20250225-psy-core-convert-to-fwnode-v1-3-d5e4369936bb@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-11	usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running	Michal Pecio	1	-4/+7
	Nothing prevents a broken HC from claiming that an endpoint is Running and repeatedly rejecting Stop Endpoint with Context State Error. Avoid infinite retries and give back cancelled TDs. No such cases known so far, but HCs have bugs. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250311154551.4035726-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-11	usb: xhci: Don't change the status of stalled TDs on failed Stop EP	Michal Pecio	1	-1/+11
	When the device stalls an endpoint, current TD is assigned -EPIPE status and Reset Endpoint is queued. If a Stop Endpoint is pending at the time, it will run before Reset Endpoint and fail due to the stall. Its handler will change TD's status to -EPROTO before Reset Endpoint handler runs and initiates giveback. Check if the stall has already been handled and don't try to do it again. Since xhci_handle_halted_endpoint() performs this check too, not overwriting td->status is the only difference. I haven't seen this case yet, but I have seen a related one where the xHC has already executed Reset Endpoint, EP Context state is now Stopped and EP_HALTED is set. If the xHC took a bit longer to execute Reset Endpoint, said case would become this one. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250311154551.4035726-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-11	xhci: Avoid queuing redundant Stop Endpoint command for stalled endpoint	Mathias Nyman	1	-3/+5
	If EP_STALLED flag is set in xhci_urb_dequeue(), without EP_HALTED or SET_DEQ_PENDING flags, then the endpoint is in stopped state and the cancelled URB can be given back immediately withouth queueing a 'stop endpoint' command. Without this change the cancelled URB would eventually be given back in the 'context state error' completion path of the 'stop endpoint' command. This is not optimal. For this improvement to work the EP_STALLED flag must be cleared with xhci lock held. Suggested-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250311154551.4035726-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-09	Linux 6.14-rc6	Linus Torvalds	1	-1/+1

2025-03-08	x86/microcode/AMD: Add some forgotten models to the SHA check	Borislav Petkov (AMD)	1	-0/+6
	Add some more forgotten models to the SHA check. Fixes: 50cef76d5cb0 ("x86/microcode/AMD: Load only SHA256-checksummed patches") Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Link: https://lore.kernel.org/r/20250307220256.11816-1-bp@kernel.org
2025-03-08	LoongArch: KVM: Fix GPA size issue about VM	Bibo Mao	2	-1/+11
	Physical address space is 48 bit on Loongson-3A5000 physical machine, however it is 47 bit for VM on Loongson-3A5000 system. Size of physical address space of VM is the same with the size of virtual user space (a half) of physical machine. Variable cpu_vabits represents user address space, kernel address space is not included (user space and kernel space are both a half of total). Here cpu_vabits, rather than cpu_vabits - 1, is to represent the size of guest physical address space. Also there is strict checking about page fault GPA address, inject error if it is larger than maximum GPA address of VM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: KVM: Reload guest CSR registers after sleep	Bibo Mao	1	-0/+7
	On host, the HW guest CSR registers are lost after suspend and resume operation. Since last_vcpu of boot CPU still records latest vCPU pointer so that the guest CSR register skips to reload when boot CPU resumes and vCPU is scheduled. Here last_vcpu is cleared so that guest CSR registers will reload from scheduled vCPU context after suspend and resume. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: KVM: Add interrupt checking for AVEC	Bibo Mao	1	-1/+1
	There is a newly added macro INT_AVEC with CSR ESTAT register, which is bit 14 used for LoongArch AVEC support. AVEC interrupt status bit 14 is supported with macro CSR_ESTAT_IS, so here replace the hard-coded value 0x1fff with macro CSR_ESTAT_IS so that the AVEC interrupt status is also supported by KVM. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: Set hugetlb mmap base address aligned with pmd size	Bibo Mao	1	-1/+5
	With ltp test case "testcases/bin/hugefork02", there is a dmesg error report message such as: kernel BUG at mm/hugetlb.c:5550! Oops - BUG[#1]: CPU: 0 UID: 0 PID: 1517 Comm: hugefork02 Not tainted 6.14.0-rc2+ #241 Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022 pc 90000000004eaf1c ra 9000000000485538 tp 900000010edbc000 sp 900000010edbf940 a0 900000010edbfb00 a1 9000000108d20280 a2 00007fffe9474000 a3 00007ffff3474000 a4 0000000000000000 a5 0000000000000003 a6 00000000003cadd3 a7 0000000000000000 t0 0000000001ffffff t1 0000000001474000 t2 900000010ecd7900 t3 00007fffe9474000 t4 00007fffe9474000 t5 0000000000000040 t6 900000010edbfb00 t7 0000000000000001 t8 0000000000000005 u0 90000000004849d0 s9 900000010edbfa00 s0 9000000108d20280 s1 00007fffe9474000 s2 0000000002000000 s3 9000000108d20280 s4 9000000002b38b10 s5 900000010edbfb00 s6 00007ffff3474000 s7 0000000000000406 s8 900000010edbfa08 ra: 9000000000485538 unmap_vmas+0x130/0x218 ERA: 90000000004eaf1c __unmap_hugepage_range+0x6f4/0x7d0 PRMD: 00000004 (PPLV0 +PIE -PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) Process hugefork02 (pid: 1517, threadinfo=00000000a670eaf4, task=000000007a95fc64) Call Trace: [<90000000004eaf1c>] __unmap_hugepage_range+0x6f4/0x7d0 [<9000000000485534>] unmap_vmas+0x12c/0x218 [<9000000000494068>] exit_mmap+0xe0/0x308 [<900000000025fdc4>] mmput+0x74/0x180 [<900000000026a284>] do_exit+0x294/0x898 [<900000000026aa30>] do_group_exit+0x30/0x98 [<900000000027bed4>] get_signal+0x83c/0x868 [<90000000002457b4>] arch_do_signal_or_restart+0x54/0xfa0 [<90000000015795e8>] irqentry_exit_to_user_mode+0xb8/0x138 [<90000000002572d0>] tlb_do_page_fault_1+0x114/0x1b4 The problem is that base address allocated from hugetlbfs is not aligned with pmd size. Here add a checking for hugetlbfs and align base address with pmd size. After this patch the test case "testcases/bin/hugefork02" passes to run. This is similar to the commit 7f24cbc9c4d42db8a3c8484d1 ("mm/mmap: teach generic_get_unmapped_area{_topdown} to handle hugetlb mappings"). Cc: stable@vger.kernel.org # 6.13+ Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: Set max_pfn with the PFN of the last page	Bibo Mao	1	-0/+3
	The current max_pfn equals to zero. In this case, it causes user cannot get some page information through /proc filesystem such as kpagecount. The following message is displayed by stress-ng test suite with command "stress-ng --verbose --physpage 1 -t 1". # stress-ng --verbose --physpage 1 -t 1 stress-ng: error: [1691] physpage: cannot read page count for address 0x134ac000 in /proc/kpagecount, errno=22 (Invalid argument) stress-ng: error: [1691] physpage: cannot read page count for address 0x7ffff207c3a8 in /proc/kpagecount, errno=22 (Invalid argument) stress-ng: error: [1691] physpage: cannot read page count for address 0x134b0000 in /proc/kpagecount, errno=22 (Invalid argument) ... After applying this patch, the kernel can pass the test. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [1701] physpage: [1701] started (instance 0 on CPU 3) stress-ng: debug: [1701] physpage: [1701] exited (instance 0 on CPU 3) stress-ng: debug: [1700] physpage: [1701] terminated (success) Cc: stable@vger.kernel.org # 6.8+ Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: Use polling play_dead() when resuming from hibernation	Huacai Chen	1	-1/+46
	When CONFIG_RANDOM_KMALLOC_CACHES or other randomization infrastructrue enabled, the idle_task's stack may different between the booting kernel and target kernel. So when resuming from hibernation, an ACTION_BOOT_CPU IPI wakeup the idle instruction in arch_cpu_idle_dead() and jump to the interrupt handler. But since the stack pointer is changed, the interrupt handler cannot restore correct context. So rename the current arch_cpu_idle_dead() to idle_play_dead(), make it as the default version of play_dead(), and the new arch_cpu_idle_dead() call play_dead() directly. For hibernation, implement an arch-specific hibernate_resume_nonboot_cpu_disable() to use the polling version (idle instruction is replace by nop, and irq is disabled) of play_dead(), i.e. poll_play_dead(), to avoid IPI handler corrupting the idle_task's stack when resuming from hibernation. This solution is a little similar to commit 406f992e4a372dafbe3c ("x86 / hibernate: Use hlt_play_dead() when resuming from hibernation"). Cc: stable@vger.kernel.org Tested-by: Erpeng Xu <xuerpeng@uniontech.com> Tested-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: Eliminate superfluous get_numa_distances_cnt()	Yuli Wang	1	-12/+0
	In LoongArch, get_numa_distances_cnt() isn't in use, resulting in a compiler warning. Fix follow errors with clang-18 when W=1e: arch/loongarch/kernel/acpi.c:259:28: error: unused function 'get_numa_distances_cnt' [-Werror,-Wunused-function] 259 \| static inline unsigned int get_numa_distances_cnt(struct acpi_table_slit *slit) \| ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Link: https://lore.kernel.org/all/Z7bHPVUH4lAezk0E@kernel.org/ Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	LoongArch: Convert unreachable() to BUG()	Tiezhu Yang	1	-2/+2
	When compiling on LoongArch, there exists the following objtool warning in arch/loongarch/kernel/machine_kexec.o: kexec_reboot() falls through to next function crash_shutdown_secondary() Avoid using unreachable() as it can (and will in the absence of UBSAN) generate fall-through code. Use BUG() so we get a "break BRK_BUG" trap (with unreachable annotation). Cc: stable@vger.kernel.org # 6.12+ Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-03-08	x86/mm: Define PTRS_PER_PMD for assembly code too	Ingo Molnar	1	-4/+4
	Andy reported the following build warning from head_32.S: In file included from arch/x86/kernel/head_32.S:29: arch/x86/include/asm/pgtable_32.h:59:5: error: "PTRS_PER_PMD" is not defined, evaluates to 0 [-Werror=undef] 59 \| #if PTRS_PER_PMD > 1 The reason is that on 2-level i386 paging the folded in PMD's PTRS_PER_PMD constant is not defined in assembly headers, only in generic MM C headers. Instead of trying to fish out the definition from the generic headers, just define it - it even has a comment for it already... Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/Z8oa8AUVyi2HWfo9@gmail.com
2025-03-07	virt: sev-guest: Move SNP Guest Request data pages handling under snp_cmd_mutex	Alexey Kardashevskiy	3	-24/+39
	Compared to the SNP Guest Request, the "Extended" version adds data pages for receiving certificates. If not enough pages provided, the HV can report to the VM how much is needed so the VM can reallocate and repeat. Commit ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex") moved handling of the allocated/desired pages number out of scope of said mutex and create a possibility for a race (multiple instances trying to trigger Extended request in a VM) as there is just one instance of snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex. Fix the issue by moving the data blob/size and the GHCB input struct (snp_req_data) into snp_guest_req which is allocated on stack now and accessed by the GHCB caller under that mutex. Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four callers needs it. Free the received blob in get_ext_report() right after it is copied to the userspace. Possible future users of snp_send_guest_request() are likely to have different ideas about the buffer size anyways. Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex") Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
2025-03-07	virt: sev-guest: Allocate request data dynamically	Nikunj A Dadhania	1	-9/+15
	Commit ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex") narrowed the command mutex scope to snp_send_guest_request(). However, GET_REPORT, GET_DERIVED_KEY, and GET_EXT_REPORT share the req structure in snp_guest_dev. Without the mutex protection, concurrent requests can overwrite each other's data. Fix it by dynamically allocating the request structure. Fixes: ae596615d93d ("virt: sev-guest: Reduce the scope of SNP command mutex") Closes: https://github.com/AMDESE/AMDSEV/issues/265 Reported-by: andreas.stuehrk@yaxi.tech Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250307013700.437505-2-aik@amd.com
2025-03-07	x86/amd_nb: Use rdmsr_safe() in amd_get_mmconfig_range()	Andrew Cooper	1	-6/+3
	Xen doesn't offer MSR_FAM10H_MMIO_CONF_BASE to all guests. This results in the following warning: unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0) Call Trace: xen_read_msr+0x1e/0x30 amd_get_mmconfig_range+0x2b/0x80 quirk_amd_mmconfig_area+0x28/0x100 pnp_fixup_device+0x39/0x50 __pnp_add_device+0xf/0x150 pnp_add_device+0x3d/0x100 pnpacpi_add_device_handler+0x1f9/0x280 acpi_ns_get_device_callback+0x104/0x1c0 acpi_ns_walk_namespace+0x1d0/0x260 acpi_get_devices+0x8a/0xb0 pnpacpi_init+0x50/0x80 do_one_initcall+0x46/0x2e0 kernel_init_freeable+0x1da/0x2f0 kernel_init+0x16/0x1b0 ret_from_fork+0x30/0x50 ret_from_fork_asm+0x1b/0x30 based on quirks for a "PNP0c01" device. Treating MMCFG as disabled is the right course of action, so no change is needed there. This was most likely exposed by fixing the Xen MSR accessors to not be silently-safe. Fixes: 3fac3734c43a ("xen/pv: support selecting safe/unsafe msr accesses") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250307002846.3026685-1-andrew.cooper3@citrix.com
2025-03-06	fs/pipe: add simpler helpers for common cases	Linus Torvalds	7	-23/+49
	The fix to atomically read the pipe head and tail state when not holding the pipe mutex has caused a number of headaches due to the size change of the involved types. It turns out that we don't have _that_ many places that access these fields directly and were affected, but we have more than we strictly should have, because our low-level helper functions have been designed to have intimate knowledge of how the pipes work. And as a result, that random noise of direct 'pipe->head' and 'pipe->tail' accesses makes it harder to pinpoint any actual potential problem spots remaining. For example, we didn't have a "is the pipe full" helper function, but instead had a "given these pipe buffer indexes and this pipe size, is the pipe full". That's because some low-level pipe code does actually want that much more complicated interface. But most other places literally just want a "is the pipe full" helper, and not having it meant that those places ended up being unnecessarily much too aware of this all. It would have been much better if only the very core pipe code that cared had been the one aware of this all. So let's fix it - better late than never. This just introduces the trivial wrappers for "is this pipe full or empty" and to get how many pipe buffers are used, so that instead of writing if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) the places that literally just want to know if a pipe is full can just say if (pipe_is_full(pipe)) instead. The existing trivial cases were converted with a 'sed' script. This cuts down on the places that access pipe->head and pipe->tail directly outside of the pipe code (and core splice code) quite a lot. The splice code in particular still revels in doing the direct low-level accesses, and the fuse fuse_dev_splice_write() code also seems a bit unnecessarily eager to go very low-level, but it's at least a bit better than it used to be. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06	block: Name the RQF flags enum	Breno Leitao	1	-1/+1
	Commit 5f89154e8e9e3445f9b59 ("block: Use enum to define RQF_x bit indexes") converted the RQF flags to an anonymous enum, which was a beneficial change. This patch goes one step further by naming the enum as "rqf_flags". This naming enables exporting these flags to BPF clients, eliminating the need to duplicate these flags in BPF code. Instead, BPF clients can now access the same kernel-side values through CO:RE (Compile Once, Run Everywhere), as shown in this example: rqf_stats = bpf_core_enum_value(enum rqf_flags, __RQF_STATS) Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20250306-rqf_flags-v1-1-bbd64918b406@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-06	bcachefs: copygc now skips non-rw devices	Kent Overstreet	1	-13/+12
	There's no point in doing copygc on non-rw devices: the fragmentation doesn't matter if we're not writing to them, and we may not have anywhere to put the data on our other devices. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-06	bcachefs: Fix bch2_dev_journal_alloc() spuriously failing	Kent Overstreet	1	-27/+32
	Previously, we fixed journal resize spuriousl failing with -BCH_ERR_open_buckets_empty, but initial journal allocation was missed because it didn't invoke the "block on allocator" loop at all. Factor out the "loop on allocator" code to fix that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-06	x86/boot: Sanitize boot params before parsing command line	Ard Biesheuvel	1	-0/+2
	The 5-level paging code parses the command line to look for the 'no5lvl' string, and does so very early, before sanitize_boot_params() has been called and has been given the opportunity to wipe bogus data from the fields in boot_params that are not covered by struct setup_header, and are therefore supposed to be initialized to zero by the bootloader. This triggers an early boot crash when using syslinux-efi to boot a recent kernel built with CONFIG_X86_5LEVEL=y and CONFIG_EFI_STUB=n, as the 0xff padding that now fills the unused PE/COFF header is copied into boot_params by the bootloader, and interpreted as the top half of the command line pointer. Fix this by sanitizing the boot_params before use. Note that there is no harm in calling this more than once; subsequent invocations are able to spot that the boot_params have already been cleaned up. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> # v6.1+ Link: https://lore.kernel.org/r/20250306155915.342465-2-ardb+git@google.com Closes: https://lore.kernel.org/all/202503041549.35913.ulrich.gemkow@ikr.uni-stuttgart.de
2025-03-06	fs/pipe: fix pipe buffer index use in FUSE	Linus Torvalds	1	-7/+6
	This was another case that Rasmus pointed out where the direct access to the pipe head and tail pointers broke on 32-bit configurations due to the type changes. As with the pipe FIONREAD case, fix it by using the appropriate helper functions that deal with the right pipe index sizing. Reported-by: Rasmus Villemoes <ravi@prevas.dk> Link: https://lore.kernel.org/all/878qpi5wz4.fsf@prevas.dk/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg > Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06	fs/pipe: do not open-code pipe head/tail logic in FIONREAD	Linus Torvalds	1	-4/+3
	Rasmus points out that we do indeed have other cases of breakage from the type changes that were introduced on 32-bit targets in order to read the pipe head and tail values atomically (commit 3d252160b818: "fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex"). Fix it up by using the proper helper functions that now deal with the pipe buffer index types properly. This makes the code simpler and more obvious. The compiler does the CSE and loop hoisting of the pipe ring size masking that we used to do manually, so open-coding this was never a good idea. Reported-by: Rasmus Villemoes <ravi@prevas.dk> Link: https://lore.kernel.org/all/87cyeu5zgk.fsf@prevas.dk/ Fixes: 3d252160b818 ("fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex")Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Swapnil Sapkal <swapnil.sapkal@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06	fs/pipe: express 'pipe_empty()' in terms of 'pipe_occupancy()'	Linus Torvalds	1	-6/+6
	That's what 'pipe_full()' does, so it's more consistent. But more importantly it gets the type limits right when the pipe head and tail are no longer necessarily 'unsigned int'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-06	usb: typec: ucsi: Fix NULL pointer access	Andrei Kuchynski	1	-6/+7
	Resources should be released only after all threads that utilize them have been destroyed. This commit ensures that resources are not released prematurely by waiting for the associated workqueue to complete before deallocating them. Cc: stable <stable@kernel.org> Fixes: b9aa02ca39a4 ("usb: typec: ucsi: Add polling mechanism for partner tasks like alt mode checking") Signed-off-by: Andrei Kuchynski <akuchynski@chromium.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20250305111739.1489003-2-akuchynski@chromium.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: quirks: Add DELAY_INIT and NO_LPM for Prolific Mass Storage Card Reader	Miao Li	1	-0/+4
	When used on Huawei hisi platforms, Prolific Mass Storage Card Reader which the VID:PID is in 067b:2731 might fail to enumerate at boot time and doesn't work well with LPM enabled, combination quirks: USB_QUIRK_DELAY_INIT + USB_QUIRK_NO_LPM fixed the problems. Signed-off-by: Miao Li <limiao@kylinos.cn> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20250304070757.139473-1-limiao870622@163.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	xhci: Handle spurious events on Etron host isoc enpoints	Mathias Nyman	2	-13/+27
	Unplugging a USB3.0 webcam from Etron hosts while streaming results in errors like this: [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650 [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13 [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670 Etron xHC generates two transfer events for the TRB if an error is detected while processing the last TRB of an isoc TD. The first event can be any sort of error (like USB Transaction or Babble Detected, etc), and the final event is Success. The xHCI driver will handle the TD after the first event and remove it from its internal list, and then print an "Transfer event TRB DMA ptr not part of current TD" error message after the final event. Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a transaction error mid TD.") is designed to address isoc transaction errors, but unfortunately it doesn't account for this scenario. This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success event follows a 'short transfer' event, but the TD the event points to is already given back. Expand the spurious success 'short transfer' event handling to cover the spurious success after error on Etron hosts. Kuangyi Chiang reported this issue and submitted a different solution based on using error_mid_td. This commit message is mostly taken from that patch. Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com> Closes: https://lore.kernel.org/linux-usb/20241028025337.6372-6-ki.chiang65@gmail.com/ Tested-by: Kuangyi Chiang <ki.chiang65@gmail.com> Tested-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-16-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: Unify duplicate inc_enq() code	Michal Pecio	1	-75/+55
	Extract a block of code copied from inc_enq() into a separate function and call it from inc_enq() and the other function which used this code. Remove the pointless 'next' variable which only aliases ring->enqueue. Note: I don't know if any 0.95 xHC ever reached series production, but "AMD 0.96 host" appears to be the "Llano" family APU. Example dmesg at https://linux-hardware.org/?probe=79d5cfd4fd&log=dmesg pci 0000:00:10.0: [1022:7812] type 00 class 0x0c0330 hcc params 0x014042c3 hci version 0x96 quirks 0x0000000000000608 Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-15-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: Apply the link chain quirk on NEC isoc endpoints	Michal Pecio	1	-2/+11
	Two clearly different specimens of NEC uPD720200 (one with start/stop bug, one without) were seen to cause IOMMU faults after some Missed Service Errors. Faulting address is immediately after a transfer ring segment and patched dynamic debug messages revealed that the MSE was received when waiting for a TD near the end of that segment: [ 1.041954] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0 [ 1.042120] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000] [ 1.042146] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000] It gets even funnier if the next page is a ring segment accessible to the HC. Below, it reports MSE in segment at ff1e8000, plows through a zero-filled page at ff1e9000 and starts reporting events for TRBs in page at ff1ea000 every microframe, instead of jumping to seg ff1e6000. [ 7.041671] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0 [ 7.041999] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0 [ 7.042011] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint [ 7.042028] xhci_hcd: All TDs skipped for slot 1 ep 2. Clear skip flag. [ 7.042134] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint [ 7.042138] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31 [ 7.042144] xhci_hcd: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 [ 7.042259] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint [ 7.042262] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31 [ 7.042266] xhci_hcd: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 At some point completion events change from Isoch Buffer Overrun to Short Packet and the HC finally finds cycle bit mismatch in ff1ec000. [ 7.098130] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 [ 7.098132] xhci_hcd: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 [ 7.098254] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 [ 7.098256] xhci_hcd: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 [ 7.098379] xhci_hcd: Overrun event on slot 1 ep 2 It's possible that data from the isochronous device were written to random buffers of pending TDs on other endpoints (either IN or OUT), other devices or even other HCs in the same IOMMU domain. Lastly, an error from a different USB device on another HC. Was it caused by the above? I don't know, but it may have been. The disk was working without any other issues and generated PCIe traffic to starve the NEC of upstream BW and trigger those MSEs. The two HCs shared one x1 slot by means of a commercial "PCIe splitter" board. [ 7.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd [ 7.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s [ 7.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00 [ 7.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0 Fortunately, it appears that this ridiculous bug is avoided by setting the chain bit of Link TRBs on isochronous rings. Other ancient HCs are known which also expect the bit to be set and they ignore Link TRBs if it's not. Reportedly, 0.95 spec guaranteed that the bit is set. The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports tens of MSEs per second and runs into the bug within seconds. Chaining Link TRBs allows the same workload to run for many minutes, many times. No negative side effects seen in UVC recording and UAC playback with a few devices at full speed, high speed and SuperSpeed. The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202 and on old Etron EJ168 and VIA VL805 (but the VL805 has other bug). [shorten line length of log snippets in commit messge -Mathias] Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-14-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	xhci: Prevent early endpoint restart when handling STALL errors.	Mathias Nyman	3	-3/+13
	Ensure that an endpoint halted due to device STALL is not restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to the device. The host side of the endpoint may otherwise be started early by the 'Set TR Deq' command completion handler which is called if dequeue is moved past a cancelled or halted TD. Prevent this with a new flag set for bulk and interrupt endpoints when a Stall Error is received. Clear it in hcd->endpoint_reset() which is called after Clear_Feature(ENDPOINT_HALT) is sent. Also add a debug message if a class driver queues a new URB after the STALL. Note that class driver might not be aware of the STALL yet when it submits the URB as URBs are given back in BH. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-13-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: move debug capabilities from trb_in_td() to handle_tx_event()	Niklas Neronin	1	-21/+17
	Function trb_in_td() currently includes debug capabilities that are triggered when its debug argument is set to true. The only consumer of these debug capabilities is handle_tx_event(), which calls trb_in_td() twice, once for its primary functionality and a second time solely for debugging purposes if the first call returns 'NULL'. This approach is inefficient and can lead to confusion, as trb_in_td() executes the same code with identical arguments twice, differing only in the debug output during the second execution. To enhance clarity and efficiency, move the debug capabilities out of trb_in_td() and integrates them directly into handle_tx_event(). This change reduces the argument count of trb_in_td() and ensures that debug steps are executed only when necessary, streamlining the function's operation. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-12-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: refactor trb_in_td() to be static	Niklas Neronin	2	-63/+61
	Relocate trb_in_td() and marks it as static, as it's exclusively utilized in xhci-ring.c. This adjustment lays the groundwork for future rework of the function. The function's logic remains unchanged; only its access specifier is altered to static and a redundant "else" is removed on line 325 (due to checkpatch.pl complaining). Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-11-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: set page size to the xHCI-supported size	Niklas Neronin	2	-20/+22
	The current xHCI driver does not validate whether a page size of 4096 bytes is supported. Address the issue by setting the page size to the value supported by the xHCI controller, as read from the Page Size register. In the event of an unexpected value; default to a 4K page size. Additionally, this commit removes unnecessary debug messages and instead prints the supported and used page size once. The xHCI controller supports page sizes of (2^{(n+12)}) bytes, where 'n' is the Page Size Bit. Only one page size is supported, with a maximum page size of 128 KB. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-10-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: correct debug message page size calculation	Niklas Neronin	1	-3/+3
	The ffs() function returns the index of the first set bit, starting from 1. If no bits are set, it returns zero. This behavior causes an off-by-one page size in the debug message, as the page size calculation [1] is zero-based, while ffs() is one-based. Fix this by subtracting one from the result of ffs(). Note that since variable 'val' is unsigned, subtracting one from zero will result in the maximum unsigned integer value. Consequently, the condition 'if (val < 16)' will still function correctly. [1], Page size: (2^(n+12)), where 'n' is the set page size bit. Fixes: 81720ec5320c ("usb: host: xhci: use ffs() in xhci_mem_init()") Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-9-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: Skip only one TD on Ring Underrun/Overrun	Michal Pecio	1	-1/+14
	If skipping is deferred to events other than Missed Service Error itsef, it means we are running on an xHCI 1.0 host and don't know how many TDs were missed until we reach some ordinary transfer completion event. And in case of ring xrun, we can't know where the xrun happened either. If we skip all pending TDs, we may prematurely give back TDs added after the xrun had occurred, risking data loss or buffer UAF by the xHC. If we skip none, a driver may become confused and stop working when all its URBs are missed and appear to be "in flight" forever. Skip exactly one TD on each xrun event - the first one that was missed, as we can now be sure that the HC has finished processing it. Provided that one more TD is queued before any subsequent doorbell ring, it will become safe to skip another TD by the time we get an xrun again. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-8-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: Expedite skipping missed isoch TDs on modern HCs	Michal Pecio	1	-4/+10
	xHCI spec rev. 1.0 allowed the TRB pointer of Missed Service events to be NULL. Having no idea which of the queued TDs were missed and which are waiting, we can only set a flag to skip missed TDs later. But HCs are also allowed to give us pointer to the last missed TRB, and this became mandatory in spec rev. 1.1 and later. Use this pointer, if available, to immediately skip all missed TDs. This reduces latency and risk of skipping-related bugs, because we can now leave the skip flag cleared for future events. Handle Missed Service Error events as 'error mid TD', if applicable, because rev. 1.0 spec excplicitly says so in notes to 4.10.3.2 and later revs in 4.10.3.2 and 4.11.2.5.2. Notes to 4.9.1 seem to apply. Tested on ASM1142 and ASM3142 v1.1 xHCs which provide TRB pointers. Tested on AMD, Etron, Renesas v1.0 xHCs which provide TRB pointers. Tested on NEC v0.96 and VIA v1.0 xHCs which send a NULL pointer. Change inspired by a discussion about realtime USB audio. Link: https://lore.kernel.org/linux-usb/76e1a191-020d-4a76-97f6-237f9bd0ede0@gmx.net/T/ Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-7-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: Fix isochronous Ring Underrun/Overrun event handling	Michal Pecio	1	-6/+14
	The TRB pointer of these events points at enqueue at the time of error occurrence on xHCI 1.1+ HCs or it's NULL on older ones. By the time we are handling the event, a new TD may be queued at this ring position. I can trigger this race by rising interrupt moderation to increase IRQ handling delay. Similar delay may occur naturally due to system load. If this ever happens after a Missed Service Error, missed TDs will be skipped and the new TD processed as if it matched the event. It could be given back prematurely, risking data loss or buffer UAF by the xHC. Don't complete TDs on xrun events and don't warn if queued TDs don't match the event's TRB pointer, which can be NULL or a link/no-op TRB. Don't warn if there are no queued TDs at all. Now that it's safe, also handle xrun events if the skip flag is clear. This ensures completion of any TD stuck in 'error mid TD' state right before the xrun event, which could happen if a driver submits a finite number of URBs to a buggy HC and then an error occurs on the last TD. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-6-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-06	usb: xhci: Complete 'error mid TD' transfers when handling Missed Service	Michal Pecio	1	-1/+5
	Missed Service Error after an error mid TD means that the failed TD has already been passed by the xHC without acknowledgment of the final TRB, a known hardware bug. So don't wait any more and give back the TD. Reproduced on NEC uPD720200 under conditions of ludicrously bad USB link quality, confirmed to behave as expected using dynamic debug. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20250306144954.3507700-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>