aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/arch/arm64 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-06-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-268/+202
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some missing synchronisation - A small fix for the irqbypass hook fixes, tightening the check and ensuring that we only deal with MSI for both the old and the new route entry - Rework the way the shadow LRs are addressed in a nesting configuration, plugging an embarrassing bug as well as simplifying the whole process - Add yet another fix for the dreaded arch_timer_edge_cases selftest RISC-V: - Fix the size parameter check in SBI SFENCE calls - Don't treat SBI HFENCE calls as NOPs x86 TDX: - Complete API for handling complex TDVMCALLs in userspace. This was delayed because the spec lacked a way for userspace to deny supporting these calls; the new exit code is now approved" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: TDX: Exit to userspace for GetTdVmCallInfo KVM: TDX: Handle TDG.VP.VMCALL<GetQuote> KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs KVM: arm64: VHE: Centralize ISBs when returning to host KVM: arm64: Remove cpacr_clear_set() KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd() KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync() KVM: arm64: Reorganise CPTR trap manipulation KVM: arm64: VHE: Synchronize CPTR trap deactivation KVM: arm64: VHE: Synchronize restore of host debug registers KVM: arm64: selftests: Close the GIC FD in arch_timer_edge_cases KVM: arm64: Explicitly treat routing entry type changes as changes KVM: arm64: nv: Fix tracking of shadow list registers RISC-V: KVM: Don't treat SBI HFENCE calls as NOPs RISC-V: KVM: Fix the size parameter check in SBI SFENCE calls
2025-06-20Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds3-3/+6
Pull arm64 fixes from Will Deacon: "There's nothing major (even the vmalloc one is just suppressing a potential warning) but all worth having, nonetheless. - Suppress KASAN false positive in stack unwinding code - Drop redundant reset of the GCS state on exec() - Don't try to descend into a !present PMD when creating a huge vmap() entry at the PUD level - Fix a small typo in the arm64 booting Documentation" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() arm64/gcs: Don't call gcs_free() during flush_gcs() arm64: Restrict pagetable teardown to avoid false warning docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-06-19KVM: arm64: VHE: Centralize ISBs when returning to hostMark Rutland3-20/+14
The VHE hyp code has recently gained a few ISBs. Simplify this to one unconditional ISB in __kvm_vcpu_run_vhe(), and remove the unnecessary ISB from the kvm_call_hyp_ret() macro. While kvm_call_hyp_ret() is also used to invoke __vgic_v3_get_gic_config(), but no ISB is necessary in that case either. For the moment, an ISB is left in kvm_call_hyp(), as there are many more users, and removing the ISB would require a more thorough audit. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-8-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: Remove cpacr_clear_set()Mark Rutland1-62/+0
We no longer use cpacr_clear_set(). Remove cpacr_clear_set() and its helper functions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-7-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd()Mark Rutland1-9/+8
The hyp code FPSIMD/SVE/SME trap handling logic has some rather messy open-coded manipulation of CPTR/CPACR. This is benign for non-nested guests, but broken for nested guests, as the guest hypervisor's CPTR configuration is not taken into account. Consider the case where L0 provides FPSIMD+SVE to an L1 guest hypervisor, and the L1 guest hypervisor only provides FPSIMD to an L2 guest (with L1 configuring CPTR/CPACR to trap SVE usage from L2). If the L2 guest triggers an FPSIMD trap to the L0 hypervisor, kvm_hyp_handle_fpsimd() will see that the vCPU supports FPSIMD+SVE, and will configure CPTR/CPACR to NOT trap FPSIMD+SVE before returning to the L2 guest. Consequently the L2 guest would be able to manipulate SVE state even though the L1 hypervisor had configured CPTR/CPACR to forbid this. Clean this up, and fix the nested virt issue by always using __deactivate_cptr_traps() and __activate_cptr_traps() to manage the CPTR traps. This removes the need for the ad-hoc fixup in kvm_hyp_save_fpsimd_host(), and ensures that any guest hypervisor configuration of CPTR/CPACR is taken into account. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-6-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync()Mark Rutland1-1/+4
There's no need for fpsimd_sve_sync() to write to CPTR/CPACR. All relevant traps are always disabled earlier within __kvm_vcpu_run(), when __deactivate_cptr_traps() configures CPTR/CPACR. With irrelevant details elided, the flow is: handle___kvm_vcpu_run(...) { flush_hyp_vcpu(...) { fpsimd_sve_flush(...); } __kvm_vcpu_run(...) { __activate_traps(...) { __activate_cptr_traps(...); } do { __guest_enter(...); } while (...); __deactivate_traps(....) { __deactivate_cptr_traps(...); } } sync_hyp_vcpu(...) { fpsimd_sve_sync(...); } } Remove the unnecessary write to CPTR/CPACR. An ISB is still necessary, so a comment is added to describe this requirement. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-5-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: Reorganise CPTR trap manipulationMark Rutland3-140/+130
The NVHE/HVHE and VHE modes have separate implementations of __activate_cptr_traps() and __deactivate_cptr_traps() in their respective switch.c files. There's some duplication of logic, and it's not currently possible to reuse this logic elsewhere. Move the logic into the common switch.h header so that it can be reused, and de-duplicate the common logic. This rework changes the way SVE traps are deactivated in VHE mode, aligning it with NVHE/HVHE modes: * Before this patch, VHE's __deactivate_cptr_traps() would unconditionally enable SVE for host EL2 (but not EL0), regardless of whether the ARM64_SVE cpucap was set. * After this patch, VHE's __deactivate_cptr_traps() will take the ARM64_SVE cpucap into account. When ARM64_SVE is not set, SVE will be trapped from EL2 and below. The old and new behaviour are both benign: * When ARM64_SVE is not set, the host will not touch SVE state, and will not reconfigure SVE traps. Host EL0 access to SVE will be trapped as expected. * When ARM64_SVE is set, the host will configure EL0 SVE traps before returning to EL0 as part of reloading the EL0 FPSIMD/SVE/SME state. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-4-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: VHE: Synchronize CPTR trap deactivationMark Rutland1-0/+3
Currently there is no ISB between __deactivate_cptr_traps() disabling traps that affect EL2 and fpsimd_lazy_switch_to_host() manipulating registers potentially affected by CPTR traps. When NV is not in use, this is safe because the relevant registers are only accessed when guest_owns_fp_regs() && vcpu_has_sve(vcpu), and this also implies that SVE traps affecting EL2 have been deactivated prior to __guest_entry(). When NV is in use, a guest hypervisor may have configured SVE traps for a nested context, and so it is necessary to have an ISB between __deactivate_cptr_traps() and fpsimd_lazy_switch_to_host(). Due to the current lack of an ISB, when a guest hypervisor enables SVE traps in CPTR, the host can take an unexpected SVE trap from within fpsimd_lazy_switch_to_host(), e.g. | Unhandled 64-bit el1h sync exception on CPU1, ESR 0x0000000066000000 -- SVE | CPU: 1 UID: 0 PID: 164 Comm: kvm-vcpu-0 Not tainted 6.15.0-rc4-00138-ga05e0f012c05 #3 PREEMPT | Hardware name: FVP Base RevC (DT) | pstate: 604023c9 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __kvm_vcpu_run+0x6f4/0x844 | lr : __kvm_vcpu_run+0x150/0x844 | sp : ffff800083903a60 | x29: ffff800083903a90 x28: ffff000801f4a300 x27: 0000000000000000 | x26: 0000000000000000 x25: ffff000801f90000 x24: ffff000801f900f0 | x23: ffff800081ff7720 x22: 0002433c807d623f x21: ffff000801f90000 | x20: ffff00087f730730 x19: 0000000000000000 x18: 0000000000000000 | x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 | x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 | x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 | x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff000801f90d70 | x5 : 0000000000001000 x4 : ffff8007fd739000 x3 : ffff000801f90000 | x2 : 0000000000000000 x1 : 00000000000003cc x0 : ffff800082f9d000 | Kernel panic - not syncing: Unhandled exception | CPU: 1 UID: 0 PID: 164 Comm: kvm-vcpu-0 Not tainted 6.15.0-rc4-00138-ga05e0f012c05 #3 PREEMPT | Hardware name: FVP Base RevC (DT) | Call trace: | show_stack+0x18/0x24 (C) | dump_stack_lvl+0x60/0x80 | dump_stack+0x18/0x24 | panic+0x168/0x360 | __panic_unhandled+0x68/0x74 | el1h_64_irq_handler+0x0/0x24 | el1h_64_sync+0x6c/0x70 | __kvm_vcpu_run+0x6f4/0x844 (P) | kvm_arm_vcpu_enter_exit+0x64/0xa0 | kvm_arch_vcpu_ioctl_run+0x21c/0x870 | kvm_vcpu_ioctl+0x1a8/0x9d0 | __arm64_sys_ioctl+0xb4/0xf4 | invoke_syscall+0x48/0x104 | el0_svc_common.constprop.0+0x40/0xe0 | do_el0_svc+0x1c/0x28 | el0_svc+0x30/0xcc | el0t_64_sync_handler+0x10c/0x138 | el0t_64_sync+0x198/0x19c | SMP: stopping secondary CPUs | Kernel Offset: disabled | CPU features: 0x0000,000002c0,02df4fb9,97ee773f | Memory Limit: none | ---[ end Kernel panic - not syncing: Unhandled exception ]--- Fix this by adding an ISB between __deactivate_traps() and fpsimd_lazy_switch_to_host(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250617133718.4014181-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: VHE: Synchronize restore of host debug registersMark Rutland1-0/+3
When KVM runs in non-protected VHE mode, there's no context synchronization event between __debug_switch_to_host() restoring the host debug registers and __kvm_vcpu_run() unmasking debug exceptions. Due to this, it's theoretically possible for the host to take an unexpected debug exception due to the stale guest configuration. This cannot happen in NVHE/HVHE mode as debug exceptions are masked in the hyp code, and the exception return to the host will provide the necessary context synchronization before debug exceptions can be taken. For now, avoid the problem by adding an ISB after VHE hyp code restores the host debug registers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250617133718.4014181-2-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: Explicitly treat routing entry type changes as changesSean Christopherson1-1/+2
Explicitly treat type differences as GSI routing changes, as comparing MSI data between two entries could get a false negative, e.g. if userspace changed the type but left the type-specific data as- Note, the same bug was fixed in x86 by commit bcda70c56f3e ("KVM: x86: Explicitly treat routing entry type changes as changes"). Fixes: 4bf3693d36af ("KVM: arm64: Unmap vLPIs affected by changes to GSI routing information") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250611224604.313496-3-seanjc@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-19KVM: arm64: nv: Fix tracking of shadow list registersMarc Zyngier1-39/+42
Wei-Lin reports that the tracking of shadow list registers is majorly broken when resync'ing the L2 state after a run, as we confuse the guest's LR index with the host's, potentially losing the interrupt state. While this could be fixed by adding yet another side index to track it (Wei-Lin's fix), it may be better to refactor this code to avoid having a side index altogether, limiting the risk to introduce this class of bugs. A key observation is that the shadow index is always the number of bits in the lr_map bitmap. With that, the parallel indexing scheme can be completely dropped. While doing this, introduce a couple of helpers that abstract the index conversion and some of the LR repainting, making the whole exercise much simpler. Reported-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Reviewed-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250614145721.2504524-1-r09922117@csie.ntu.edu.tw Link: https://lore.kernel.org/r/86qzzkc5xa.wl-maz@kernel.org
2025-06-18Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linuxLinus Torvalds1-2/+2
Pull crypto library fixes from Eric Biggers: - Fix a regression in the arm64 Poly1305 code - Fix a couple compiler warnings * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto/poly1305: Fix arm64's poly1305_blocks_arch() lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older lib/crypto: Annotate crypto strings with nonstring
2025-06-16lib/crypto/poly1305: Fix arm64's poly1305_blocks_arch()Eric Biggers1-2/+2
For some reason arm64's Poly1305 code got changed to ignore the padbit argument. As a result, the output is incorrect when the message length is not a multiple of 16 (which is not reached with the standard ChaCha20Poly1305, but bcachefs could reach this). Fix this. Fixes: a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface") Reported-by: Kent Overstreet <kent.overstreet@linux.dev> Tested-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20250616010654.367302-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-06-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds15-102/+126
Pull kvm fixes from Paolo Bonzini: "ARM: - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken. x86: - Make KVM_PRE_FAULT_MEMORY stricter for TDX, allowing userspace to pass only the "untouched" addresses and flipping the shared/private bit in the implementation. - Disable SEV-SNP support on initialization failure * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Reject direct bits in gpa passed to KVM_PRE_FAULT_MEMORY KVM: x86/mmu: Embed direct bits into gpa for KVM_PRE_FAULT_MEMORY KVM: SEV: Disable SEV-SNP support on initialization failure KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases KVM: arm64: selftests: Fix xVAL init in arch_timer_edge_cases KVM: arm64: selftests: Fix thread migration in arch_timer_edge_cases KVM: arm64: selftests: Fix help text for arch_timer_edge_cases KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operand KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysreg KVM: arm64: Add RMW specific sysreg accessor KVM: arm64: Add assignment-specific sysreg accessor
2025-06-12arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()Tengda Wu1-1/+1
KASAN reports a stack-out-of-bounds read in regs_get_kernel_stack_nth(). Call Trace: [ 97.283505] BUG: KASAN: stack-out-of-bounds in regs_get_kernel_stack_nth+0xa8/0xc8 [ 97.284677] Read of size 8 at addr ffff800089277c10 by task 1.sh/2550 [ 97.285732] [ 97.286067] CPU: 7 PID: 2550 Comm: 1.sh Not tainted 6.6.0+ #11 [ 97.287032] Hardware name: linux,dummy-virt (DT) [ 97.287815] Call trace: [ 97.288279] dump_backtrace+0xa0/0x128 [ 97.288946] show_stack+0x20/0x38 [ 97.289551] dump_stack_lvl+0x78/0xc8 [ 97.290203] print_address_description.constprop.0+0x84/0x3c8 [ 97.291159] print_report+0xb0/0x280 [ 97.291792] kasan_report+0x84/0xd0 [ 97.292421] __asan_load8+0x9c/0xc0 [ 97.293042] regs_get_kernel_stack_nth+0xa8/0xc8 [ 97.293835] process_fetch_insn+0x770/0xa30 [ 97.294562] kprobe_trace_func+0x254/0x3b0 [ 97.295271] kprobe_dispatcher+0x98/0xe0 [ 97.295955] kprobe_breakpoint_handler+0x1b0/0x210 [ 97.296774] call_break_hook+0xc4/0x100 [ 97.297451] brk_handler+0x24/0x78 [ 97.298073] do_debug_exception+0xac/0x178 [ 97.298785] el1_dbg+0x70/0x90 [ 97.299344] el1h_64_sync_handler+0xcc/0xe8 [ 97.300066] el1h_64_sync+0x78/0x80 [ 97.300699] kernel_clone+0x0/0x500 [ 97.301331] __arm64_sys_clone+0x70/0x90 [ 97.302084] invoke_syscall+0x68/0x198 [ 97.302746] el0_svc_common.constprop.0+0x11c/0x150 [ 97.303569] do_el0_svc+0x38/0x50 [ 97.304164] el0_svc+0x44/0x1d8 [ 97.304749] el0t_64_sync_handler+0x100/0x130 [ 97.305500] el0t_64_sync+0x188/0x190 [ 97.306151] [ 97.306475] The buggy address belongs to stack of task 1.sh/2550 [ 97.307461] and is located at offset 0 in frame: [ 97.308257] __se_sys_clone+0x0/0x138 [ 97.308910] [ 97.309241] This frame has 1 object: [ 97.309873] [48, 184) 'args' [ 97.309876] [ 97.310749] The buggy address belongs to the virtual mapping at [ 97.310749] [ffff800089270000, ffff800089279000) created by: [ 97.310749] dup_task_struct+0xc0/0x2e8 [ 97.313347] [ 97.313674] The buggy address belongs to the physical page: [ 97.314604] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x14f69a [ 97.315885] flags: 0x15ffffe00000000(node=1|zone=2|lastcpupid=0xfffff) [ 97.316957] raw: 015ffffe00000000 0000000000000000 dead000000000122 0000000000000000 [ 97.318207] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 97.319445] page dumped because: kasan: bad access detected [ 97.320371] [ 97.320694] Memory state around the buggy address: [ 97.321511] ffff800089277b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 97.322681] ffff800089277b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 97.323846] >ffff800089277c00: 00 00 f1 f1 f1 f1 f1 f1 00 00 00 00 00 00 00 00 [ 97.325023] ^ [ 97.325683] ffff800089277c80: 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3 [ 97.326856] ffff800089277d00: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 This issue seems to be related to the behavior of some gcc compilers and was also fixed on the s390 architecture before: commit d93a855c31b7 ("s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()") As described in that commit, regs_get_kernel_stack_nth() has confirmed that `addr` is on the stack, so reading the value at `*addr` should be allowed. Use READ_ONCE_NOCHECK() helper to silence the KASAN check for this case. Fixes: 0a8ea52c3eb1 ("arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Link: https://lore.kernel.org/r/20250604005533.1278992-1-wutengda@huaweicloud.com [will: Use '*addr' as the argument to READ_ONCE_NOCHECK()] Signed-off-by: Will Deacon <will@kernel.org>
2025-06-12arm64/gcs: Don't call gcs_free() during flush_gcs()Mark Brown1-1/+3
Currently we call gcs_free() during flush_gcs() to reset the thread state for GCS. This includes unmapping any kernel allocated GCS, but this is redundant when doing a flush_thread() since we are reinitialising the thread memory too. Inline the reinitialisation of the thread struct. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250611-arm64-gcs-flush-thread-v1-1-cc26feeddabd@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-06-12arm64: Restrict pagetable teardown to avoid false warningDev Jain1-1/+2
Commit 9c006972c3fe ("arm64: mmu: drop pXd_present() checks from pXd_free_pYd_table()") removes the pxd_present() checks because the caller checks pxd_present(). But, in case of vmap_try_huge_pud(), the caller only checks pud_present(); pud_free_pmd_page() recurses on each pmd through pmd_free_pte_page(), wherein the pmd may be none. Thus it is possible to hit a warning in the latter, since pmd_none => !pmd_table(). Thus, add a pmd_present() check in pud_free_pmd_page(). This problem was found by code inspection. Fixes: 9c006972c3fe ("arm64: mmu: drop pXd_present() checks from pXd_free_pYd_table()") Cc: stable@vger.kernel.org Reported-by: Ryan Roberts <ryan.roberts@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250527082633.61073-1-dev.jain@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-06-11mm: pgtable: fix pte_swp_exclusiveMagnus Lindholm1-1/+1
Make pte_swp_exclusive return bool instead of int. This will better reflect how pte_swp_exclusive is actually used in the code. This fixes swap/swapoff problems on Alpha due pte_swp_exclusive not returning correct values when _PAGE_SWP_EXCLUSIVE bit resides in upper 32-bits of PTE (like on alpha). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Cc: Sam James <sam@gentoo.org> Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/ Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/ [ Applied as the 'sed' script Al suggested - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-06-11Merge tag 'kvmarm-fixes-6.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini15-102/+126
KVM/arm64 fixes for 6.16, take #2 - Rework of system register accessors for system registers that are directly writen to memory, so that sanitisation of the in-memory value happens at the correct time (after the read, or before the write). For convenience, RMW-style accessors are also provided. - Multiple fixes for the so-called "arch-timer-edge-cases' selftest, which was always broken.
2025-06-07Merge tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-1/+1
Pull Kbuild updates from Masahiro Yamada: - Add support for the EXPORT_SYMBOL_GPL_FOR_MODULES() macro, which exports a symbol only to specified modules - Improve ABI handling in gendwarfksyms - Forcibly link lib-y objects to vmlinux even if CONFIG_MODULES=n - Add checkers for redundant or missing <linux/export.h> inclusion - Deprecate the extra-y syntax - Fix a genksyms bug when including enum constants from *.symref files * tag 'kbuild-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits) genksyms: Fix enum consts from a reference affecting new values arch: use always-$(KBUILD_BUILTIN) for vmlinux.lds kbuild: set y instead of 1 to KBUILD_{BUILTIN,MODULES} efi/libstub: use 'targets' instead of extra-y in Makefile module: make __mod_device_table__* symbols static scripts/misc-check: check unnecessary #include <linux/export.h> when W=1 scripts/misc-check: check missing #include <linux/export.h> when W=1 scripts/misc-check: add double-quotes to satisfy shellcheck kbuild: move W=1 check for scripts/misc-check to top-level Makefile scripts/tags.sh: allow to use alternative ctags implementation kconfig: introduce menu type enum docs: symbol-namespaces: fix reST warning with literal block kbuild: link lib-y objects to vmlinux forcibly even when CONFIG_MODULES=n tinyconfig: enable CONFIG_LD_DEAD_CODE_DATA_ELIMINATION docs/core-api/symbol-namespaces: drop table of contents and section numbering modpost: check forbidden MODULE_IMPORT_NS("module:") at compile time kbuild: move kbuild syntax processing to scripts/Makefile.build Makefile: remove dependency on archscripts for header installation Documentation/kbuild: Add new gendwarfksyms kABI rules Documentation/kbuild: Drop section numbers ...
2025-06-07arch: use always-$(KBUILD_BUILTIN) for vmlinux.ldsMasahiro Yamada1-1/+1
The extra-y syntax is deprecated. Instead, use always-$(KBUILD_BUILTIN), which behaves equivalently. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Nicolas Schier <n.schier@avm.de>
2025-06-05Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds7-23/+46
Pull arm64 fixes from Will Deacon: "We've got a couple of build fixes when using LLD, a missing TLB invalidation and a workaround for broken firmware on SoCs with CPUs that implement MPAM: - Disable problematic linker assertions for broken versions of LLD - Work around sporadic link failure with LLD and various randconfig builds - Fix missing invalidation in the TLB batching code when reclaim races with mprotect() and friends - Add a command-line override for MPAM to allow booting on systems with broken firmware" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Add override for MPAM arm64/mm: Close theoretical race where stale TLB entry remains valid arm64: Work around convergence issue with LLD linker arm64: Disable LLD linker ASSERT()s for the time being
2025-06-05Merge tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengineLinus Torvalds1-0/+165
Pull dmaengine updates from Vinod Koul: "A fairly small update for the dmaengine subsystem. This has a new ARM dmaengine driver and couple of new device support and few driver changes: New support: - Renesas RZ/V2H(P) dma support for r9a09g057 - Arm DMA-350 driver - Tegra Tegra264 ADMA support Updates: - AMD ptdma driver code removal and optimizations - Freescale edma error interrupt handler support" * tag 'dmaengine-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (27 commits) dmaengine: idxd: Remove unused pointer and macro arm64: dts: renesas: r9a09g057: Add DMAC nodes dmaengine: sh: rz-dmac: Add RZ/V2H(P) support dmaengine: sh: rz-dmac: Allow for multiple DMACs irqchip/renesas-rzv2h: Add rzv2h_icu_register_dma_req() dt-bindings: dma: rz-dmac: Document RZ/V2H(P) family of SoCs dt-bindings: dma: rz-dmac: Restrict properties for RZ/A1H dmaengine: idxd: Narrow the restriction on BATCH to ver. 1 only dmaengine: ti: Add NULL check in udma_probe() fsldma: Set correct dma_mask based on hw capability dmaengine: idxd: Check availability of workqueue allocated by idxd wq driver before using dmaengine: xilinx_dma: Set dma_device directions dmaengine: tegra210-adma: Add Tegra264 support dt-bindings: Document Tegra264 ADMA support dmaengine: dw-edma: Add HDMA NATIVE map check dmaegnine: fsl-edma: add edma error interrupt handler dt-bindings: dma: fsl-edma: increase maxItems of interrupts and interrupt-names dmaengine: ARM_DMA350 should depend on ARM/ARM64 dt-bindings: dma: qcom,bam: Document dma-coherent property dmaengine: Add Arm DMA-350 driver ...
2025-06-05KVM: arm64: Make __vcpu_sys_reg() a pure rvalue operandMarc Zyngier1-5/+5
Now that we don't have any use of __vcpu_sys_reg() as a lvalue, remove the in-place update, and directly return the sanitised value. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05KVM: arm64: Don't use __vcpu_sys_reg() to get the address of a sysregMarc Zyngier2-3/+3
We are about to prevent the use of __vcpu_sys_reg() as a lvalue, and getting the address of a rvalue is not a thing. Update the couple of places where we do this to use the __ctxt_sys_reg() accessor, which return the address of a register. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05KVM: arm64: Add RMW specific sysreg accessorMarc Zyngier6-21/+32
In a number of cases, we perform a Read-Modify-Write operation on a system register, meaning that we would apply the RESx masks twice. Instead, provide a new accessor that performs this RMW operation, allowing the masks to be applied exactly once per operation. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-05KVM: arm64: Add assignment-specific sysreg accessorMarc Zyngier12-73/+86
Assigning a value to a system register doesn't do what it is supposed to be doing if that register is one that has RESx bits. The main problem is that we use __vcpu_sys_reg(), which can be used both as a lvalue and rvalue. When used as a lvalue, the bit masking occurs *before* the new value is assigned, meaning that we (1) do pointless work on the old cvalue, and (2) potentially assign an invalid value as we fail to apply the masks to it. Fix this by providing a new __vcpu_assign_sys_reg() that does what it says on the tin, and sanitises the *new* value instead of the old one. This comes with a significant amount of churn. Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250603070824.1192795-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-04Merge tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pciLinus Torvalds1-1/+1
Pull pci updates from Bjorn Helgaas: "Enumeration: - Print the actual delay time in pci_bridge_wait_for_secondary_bus() instead of assuming it was 1000ms (Wilfred Mallawa) - Revert 'iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices', which broke resume from system sleep on AMD platforms and has been fixed by other commits (Lukas Wunner) Resource management: - Remove mtip32xx use of pcim_iounmap_regions(), which is deprecated and unnecessary (Philipp Stanner) - Remove pcim_iounmap_regions() and pcim_request_region_exclusive() and related flags since all uses have been removed (Philipp Stanner) - Rework devres 'request' functions so they are no longer 'hybrid', i.e., their behavior no longer depends on whether pcim_enable_device or pci_enable_device() was used, and remove related code (Philipp Stanner) - Warn (not BUG()) about failure to assign optional resources (Ilpo Järvinen) Error handling: - Log the DPC Error Source ID only when it's actually valid (when ERR_FATAL or ERR_NONFATAL was received from a downstream device) and decode into bus/device/function (Bjorn Helgaas) - Determine AER log level once and save it so all related messages use the same level (Karolina Stolarek) - Use KERN_WARNING, not KERN_ERR, when logging PCIe Correctable Errors (Karolina Stolarek) - Ratelimit PCIe Correctable and Non-Fatal error logging, with sysfs controls on interval and burst count, to avoid flooding logs and RCU stall warnings (Jon Pan-Doh) Power management: - Increment PM usage counter when probing reset methods so we don't try to read config space of a powered-off device (Alex Williamson) - Set all devices to D0 during enumeration to ensure ACPI opregion is connected via _REG (Mario Limonciello) Power control: - Rename pwrctrl Kconfig symbols from 'PWRCTL' to 'PWRCTRL' to match the filename paths. Retain old deprecated symbols for compatibility, except for the pwrctrl slot driver (PCI_PWRCTRL_SLOT) (Johan Hovold) - When unregistering pwrctrl, cancel outstanding rescan work before cleaning up data structures to avoid use-after-free issues (Brian Norris) Bandwidth control: - Simplify link bandwidth controller by replacing the count of Link Bandwidth Management Status (LBMS) events with a PCI_LINK_LBMS_SEEN flag (Ilpo Järvinen) - Update the Link Speed after retraining, since the Link Speed may have changed (Ilpo Järvinen) PCIe native device hotplug: - Ignore Presence Detect Changed caused by DPC. pciehp already ignores Link Down/Up events caused by DPC, but on slots using in-band presence detect, DPC causes a spurious Presence Detect Changed event (Lukas Wunner) - Ignore Link Down/Up caused by Secondary Bus Reset. On hotplug ports using in-band presence detect, the reset causes a Presence Detect Changed event, which mistakenly caused teardown and re-enumeration of the device. Drivers may need to annotate code that resets their device (Lukas Wunner) Virtualization: - Add an ACS quirk for Loongson Root Ports that don't advertise ACS but don't allow peer-to-peer transactions between Root Ports; the quirk allows each Root Port to be in a separate IOMMU group (Huacai Chen) Endpoint framework: - For fixed-size BARs, retain both the actual size and the possibly larger size allocated to accommodate iATU alignment requirements (Jerome Brunet) - Simplify ctrl/SPAD space allocation and avoid allocating more space than needed (Jerome Brunet) - Correct MSI-X PBA offset calculations for DesignWare and Cadence endpoint controllers (Niklas Cassel) - Align the return value (number of interrupts) encoding for pci_epc_get_msi()/pci_epc_ops::get_msi() and pci_epc_get_msix()/pci_epc_ops::get_msix() (Niklas Cassel) - Align the nr_irqs parameter encoding for pci_epc_set_msi()/pci_epc_ops::set_msi() and pci_epc_set_msix()/pci_epc_ops::set_msix() (Niklas Cassel) Common host controller library: - Convert pci-host-common to a library so platforms that don't need native host controller drivers don't need to include these helper functions (Manivannan Sadhasivam) Apple PCIe controller driver: - Extract ECAM bridge creation helper from pci_host_common_probe() to separate driver-specific things like MSI from PCI things (Marc Zyngier) - Dynamically allocate RID-to_SID bitmap to prepare for SoCs with varying capabilities (Marc Zyngier) - Skip ports disabled in DT when setting up ports (Janne Grunau) - Add t6020 compatible string (Alyssa Rosenzweig) - Add T602x PCIe support (Hector Martin) - Directly set/clear INTx mask bits because T602x dropped the accessors that could do this without locking (Marc Zyngier) - Move port PHY registers to their own reg items to accommodate T602x, which moves them around; retain default offsets for existing DTs that lack phy%d entries with the reg offsets (Hector Martin) - Stop polling for core refclk, which doesn't work on T602x and the bootloader has already done anyway (Hector Martin) - Use gpiod_set_value_cansleep() when asserting PERST# in probe because we're allowed to sleep there (Hector Martin) Cadence PCIe controller driver: - Drop a runtime PM 'put' to resolve a runtime atomic count underflow (Hans Zhang) - Make the cadence core buildable as a module (Kishon Vijay Abraham I) - Add cdns_pcie_host_disable() and cdns_pcie_ep_disable() for use by loadable drivers when they are removed (Siddharth Vadapalli) Freescale i.MX6 PCIe controller driver: - Apply link training workaround only on IMX6Q, IMX6SX, IMX6SP (Richard Zhu) - Remove redundant dw_pcie_wait_for_link() from imx_pcie_start_link(); since the DWC core does this, imx6 only needs it when retraining for a faster link speed (Richard Zhu) - Toggle i.MX95 core reset to align with PHY powerup (Richard Zhu) - Set SYS_AUX_PWR_DET to work around i.MX95 ERR051624 erratum: in some cases, the controller can't exit 'L23 Ready' through Beacon or PERST# deassertion (Richard Zhu) - Clear GEN3_ZRXDC_NONCOMPL to work around i.MX95 ERR051586 erratum: controller can't meet 2.5 GT/s ZRX-DC timing when operating at 8 GT/s, causing timeouts in L1 (Richard Zhu) - Wait for i.MX95 PLL lock before enabling controller (Richard Zhu) - Save/restore i.MX95 LUT for suspend/resume (Richard Zhu) Mobiveil PCIe controller driver: - Return bool (not int) for link-up check in mobiveil_pab_ops.link_up() and layerscape-gen4, mobiveil (Hans Zhang) NVIDIA Tegra194 PCIe controller driver: - Create debugfs directory for 'aspm_state_cnt' only when CONFIG_PCIEASPM is enabled, since there are no other entries (Hans Zhang) Qualcomm PCIe controller driver: - Add OF support for parsing DT 'eq-presets-<N>gts' property for lane equalization presets (Krishna Chaitanya Chundru) - Read Maximum Link Width from the Link Capabilities register if DT lacks 'num-lanes' property (Krishna Chaitanya Chundru) - Add Physical Layer 64 GT/s Capability ID and register offsets for 8, 32, and 64 GT/s lane equalization registers (Krishna Chaitanya Chundru) - Add generic dwc support for configuring lane equalization presets (Krishna Chaitanya Chundru) - Add DT and driver support for PCIe on IPQ5018 SoC (Nitheesh Sekar) Renesas R-Car PCIe controller driver: - Describe endpoint BAR 4 as being fixed size (Jerome Brunet) - Document how to obtain R-Car V4H (r8a779g0) controller firmware (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Reorder rockchip_pci_core_rsts because reset_control_bulk_deassert() deasserts in reverse order, to fix a link training regression (Jensen Huang) - Mark RK3399 as being capable of raising INTx interrupts (Niklas Cassel) Rockchip DesignWare PCIe controller driver: - Check only PCIE_LINKUP, not LTSSM status, to determine whether the link is up (Shawn Lin) - Increase N_FTS (used in L0s->L0 transitions) and enable ASPM L0s for Root Complex and Endpoint modes (Shawn Lin) - Hide the broken ATS Capability in rockchip_pcie_ep_init() instead of rockchip_pcie_ep_pre_init() so it stays hidden after PERST# resets non-sticky registers (Shawn Lin) - Call phy_power_off() before phy_exit() in rockchip_pcie_phy_deinit() (Diederik de Haas) Synopsys DesignWare PCIe controller driver: - Set PORT_LOGIC_LINK_WIDTH to one lane to make initial link training more robust; this will not affect the intended link width if all lanes are functional (Wenbin Yao) - Return bool (not int) for link-up check in dw_pcie_ops.link_up() and armada8k, dra7xx, dw-rockchip, exynos, histb, keembay, keystone, kirin, meson, qcom, qcom-ep, rcar_gen4, spear13xx, tegra194, uniphier, visconti (Hans Zhang) - Add debugfs support for exposing DWC device-specific PTM context (Manivannan Sadhasivam) TI J721E PCIe driver: - Make j721e buildable as a loadable and removable module (Siddharth Vadapalli) - Fix j721e host/endpoint dependencies that result in link failures in some configs (Arnd Bergmann) Device tree bindings: - Add qcom DT binding for 'global' interrupt (PCIe controller and link-specific events) for ipq8074, ipq8074-gen3, ipq6018, sa8775p, sc7280, sc8180x sdm845, sm8150, sm8250, sm8350 (Manivannan Sadhasivam) - Add qcom DT binding for 8 MSI SPI interrupts for msm8998, ipq8074, ipq8074-gen3, ipq6018 (Manivannan Sadhasivam) - Add dw rockchip DT binding for rk3576 and rk3562 (Kever Yang) - Correct indentation and style of examples in brcm,stb-pcie, cdns,cdns-pcie-ep, intel,keembay-pcie-ep, intel,keembay-pcie, microchip,pcie-host, rcar-pci-ep, rcar-pci-host, xilinx-versal-cpm (Krzysztof Kozlowski) - Convert Marvell EBU (dove, kirkwood, armada-370, armada-xp) and armada8k from text to schema DT bindings (Rob Herring) - Remove obsolete .txt DT bindings for content that has been moved to schemas (Rob Herring) - Add qcom DT binding for MHI registers in IPQ5332, IPQ6018, IPQ8074 and IPQ9574 (Varadarajan Narayanan) - Convert v3,v360epc-pci from text to DT schema binding (Rob Herring) - Change microchip,pcie-host DT binding to be 'dma-noncoherent' since PolarFire may be configured that way (Conor Dooley) Miscellaneous: - Drop 'pci' suffix from intel_mid_pci.c filename to match similar files (Andy Shevchenko) - All platforms with PCI have an MMU, so add PCI Kconfig dependency on MMU to simplify build testing and avoid inadvertent build regressions (Arnd Bergmann) - Update Krzysztof Wilczyński's email address in MAINTAINERS (Krzysztof Wilczyński) - Update Manivannan Sadhasivam's email address in MAINTAINERS (Manivannan Sadhasivam)" * tag 'pci-v6.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (147 commits) MAINTAINERS: Update Manivannan Sadhasivam email address PCI: j721e: Fix host/endpoint dependencies PCI: j721e: Add support to build as a loadable module PCI: cadence-ep: Introduce cdns_pcie_ep_disable() helper for cleanup PCI: cadence-host: Introduce cdns_pcie_host_disable() helper for cleanup PCI: cadence: Add support to build pcie-cadence library as a kernel module MAINTAINERS: Update Krzysztof Wilczyński email address PCI: Remove unnecessary linesplit in __pci_setup_bridge() PCI: WARN (not BUG()) when we fail to assign optional resources PCI: Remove unused pci_printk() PCI: qcom: Replace PERST# sleep time with proper macro PCI: dw-rockchip: Replace PERST# sleep time with proper macro PCI: host-common: Convert to library for host controller drivers PCI/ERR: Remove misleading TODO regarding kernel panic PCI: cadence: Remove duplicate message code definitions PCI: endpoint: Align pci_epc_set_msix(), pci_epc_ops::set_msix() nr_irqs encoding PCI: endpoint: Align pci_epc_set_msi(), pci_epc_ops::set_msi() nr_irqs encoding PCI: endpoint: Align pci_epc_get_msix(), pci_epc_ops::get_msix() return value encoding PCI: endpoint: Align pci_epc_get_msi(), pci_epc_ops::get_msi() return value encoding PCI: cadence-ep: Correct PBA offset in .set_msix() callback ...
2025-06-03Merge tag 'mfd-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds1-1/+1
Pull MFD updates from Lee Jones: "Samsung Exynos ACPM: - Populate child platform devices from device tree data - Introduce a new API, 'devm_acpm_get_by_node()', for child devices to get the ACPM handle ROHM PMICs: - Add support for the ROHM BD96802 scalable companion PMIC to the BD96801 core driver - Add support for controlling the BD96802 using the BD96801 regulator driver - Add support to the BD96805, which is almost identical to the BD96801 - Add support to the BD96806, which is similar to the BD96802 Maxim MAX77759: - Add a core driver for the MAX77759 companion PMIC - Add a GPIO driver for the expander functions on the MAX77759 - Add an NVMEM driver to expose the non-volatile memory on the MAX77759 STMicroelectronics STM32MP25: - Add support for the STM32MP25 SoC to the stm32-lptimer - Add support for the STM32MP25 to the clocksource driver, handling new register access requirements - Add support for the STM32MP25 to the PWM driver, enabling up to two PWM outputs Broadcom BCM590xx: - Add support for the BCM59054 PMU - Parse the PMU ID and revision to support behavioral differences between chip revisions - Add regulator support for the BCM59054 Samsung S2MPG10: - Add support for the S2MPG10 PMIC, which communicates via the Samsung ACPM firmware instead of I2C Exynos ACPM: - Improve timeout detection reliability by using ktime APIs instead of a loop counter assumption - Allow PMIC access during late system shutdown by switching to 'udelay()' instead of a sleeping function - Fix an issue where reading command results longer than 8 bytes would fail - Silence non-error '-EPROBE_DEFER' messages during boot to clean up logs Exynos LPASS: - Fix an error handling path by switching to 'devm_regmap_init_mmio()' to prevent resource leaks - Fix a bug where 'exynos_lpass_disable()' was called twice in the remove function - Fix another resource leak in the probe's error path by using 'devm_add_action_or_reset()' Samsung SEC: - Handle the s2dos05, which does not have IRQ support, explicitly to prevent warnings - Fix the core driver to correctly handle errors from 'sec_irq_init()' instead of ignoring them STMPE-SPI: - Correct an undeclared identifier in the 'MODULE_DEVICE_TABLE' macro MAINTAINERS: - Adjust a file path for the Siemens IPC LED drivers entry to fix a broken reference Maxim Drivers: - Correct the spelling of "Electronics" in Samsung copyright headers across multiple files General: - Fix wakeup source memory leaks on device unbind for 88pm886, as3722, max14577, max77541, max77705, max8925, rt5033, and sprd-sc27xx drivers Samsung SEC Drivers: - Split the driver into a transport-agnostic core ('sec-core') and transport-specific ('sec-i2c', 'sec-acpm') modules to support non-I2C devices - Merge the 'sec-core' and 'sec-irq' modules to reduce memory consumption - Move internal APIs to a private header to clean up the public API - Improve code style by sorting includes, cleaning up headers, sorting device tables, and using helper macros like 'dev_err_probe()', 'MFD_CELL', and 'REGMAP_IRQ_REG' - Make regmap configuration for s2dos05/s2mpu05 explicit to improve clarity - Rework platform data and regmap instantiation to use OF match data instead of a large switch statement ROHM BD96801/2: - Prepare the driver for new models by separating chip-specific data into its own structure - Drop IC name prefix from IRQ resource names in both the MFD and regulator drivers for simplification Broadcom BCM590xx: - Refactor the regulator driver to store descriptions in a table to ease support for new chips - Rename BCM59056-specific data to prepare for the addition of other regulators - Use 'dev_err_probe()' for cleaner error handling Exynos ACPM: - Correct kerneldoc warnings and use the conventional 'np' argument name General MFD: - Convert 'aat2870' and 'tps65010' to use the per-client debugfs directory provided by the I2C core - Convert 'sm501', 'tps65010' and 'ucb1x00' to use the new GPIO line value setter callbacks - Constify 'regmap_irq_chip' and other structures in '88pm886' to move data to read-only sections BCM590xx: - Drop the unused "id" member from the 'bcm590xx' struct in preparation for a replacement Samsung SEC Core: - Remove forward declarations for functions that no longer exist SM501: - Remove the unused 'sm501_find_clock()' function New Compatibles: - Google: Add a PMIC child node to the 'google,gs101-acpm-ipc' binding - ROHM: Add new bindings for 'rohm,bd96802-regulator' and 'rohm,bd96802-pmic', and add compatibles for BD96805 and BD96806 - Maxim: Add new bindings for 'maxim,max77759-gpio', 'maxim,max77759-nvmem', and the top-level 'maxim,max77759' - STM: Add 'stm32mp25' compatible to the 'stm32-lptimer' binding - Broadcom: Add 'bcm59054' compatible - Atmel/Microchip: Add 'microchip,sama7d65-gpbr' and 'microchip,sama7d65-secumod' compatibles - Samsung: Add 's2mpg10' compatible to the 'samsung,s2mps11' MFD binding - MediaTek: Add compatibles for 'mt6893' (scpsys), 'mt7988-topmisc', and 'mt8365-infracfg-nao' - Qualcomm: Add 'qcom,apq8064-mmss-sfpb' and 'qcom,apq8064-sps-sic' syscon compatibles Refactoring & Cleanup: - Convert Broadcom BCM59056 devicetree bindings to YAML and split them into MFD and regulator parts - Convert the Microchip AT91 secumod binding to YAML - Drop unrelated consumer nodes from binding examples to reduce bloat - Correct indentation and style in various DTS examples" * tag 'mfd-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (81 commits) mfd: maxim: Correct Samsung "Electronics" spelling in copyright headers mfd: maxim: Correct Samsung "Electronics" spelling in headers mfd: sm501: Remove unused sm501_find_clock mfd: 88pm886: Constify struct regmap_irq_chip and some other structures dt-bindings: mfd: syscon: Add mediatek,mt8365-infracfg-nao mfd: sprd-sc27xx: Fix wakeup source leaks on device unbind mfd: rt5033: Fix wakeup source leaks on device unbind mfd: max8925: Fix wakeup source leaks on device unbind mfd: max77705: Fix wakeup source leaks on device unbind mfd: max77541: Fix wakeup source leaks on device unbind mfd: max14577: Fix wakeup source leaks on device unbind mfd: as3722: Fix wakeup source leaks on device unbind mfd: 88pm886: Fix wakeup source leaks on device unbind dt-bindings: mfd: Correct indentation and style in DTS example dt-bindings: mfd: Drop unrelated nodes from DTS example dt-bindings: mfd: syscon: Add qcom,apq8064-sps-sic dt-bindings: mfd: syscon: Add qcom,apq8064-mmss-sfpb mfd: stmpe-spi: Correct the name used in MODULE_DEVICE_TABLE dt-bindings: mfd: syscon: Add mt7988-topmisc mfd: exynos-lpass: Fix another error handling path in exynos_lpass_probe() ...
2025-06-03Merge tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linuxLinus Torvalds2-9/+54
Pull hyperv updates from Wei Liu: - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel) - Fixes for Hyper-V UIO driver (Long Li) - Fixes for Hyper-V PCI driver (Michael Kelley) - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley) - Documentation updates for Hyper-V VMBus (Michael Kelley) - Enhance logging for hv_kvp_daemon (Shradha Gupta) * tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits) Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir Documentation: hyperv: Update VMBus doc with new features and info PCI: hv: Remove unnecessary flex array in struct pci_packet Drivers: hv: Remove hv_alloc/free_* helpers Drivers: hv: Use kzalloc for panic page allocation uio_hv_generic: Align ring size to system page uio_hv_generic: Use correct size for interrupt and monitor pages Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary arch/x86: Provide the CPU number in the wakeup AP callback x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap() PCI: hv: Get vPCI MSI IRQ domain from DeviceTree ACPI: irq: Introduce acpi_get_gsi_dispatcher() Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device() Drivers: hv: vmbus: Get the IRQ number from DeviceTree dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties arm64, x86: hyperv: Report the VTL the system boots in arm64: hyperv: Initialize the Virtual Trust Level field Drivers: hv: Provide arch-neutral implementation of get_vtl() Drivers: hv: Enable VTL mode for arm64 ...
2025-06-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds10-132/+147
Pull more kvm updates from Paolo Bonzini: Generic: - Clean up locking of all vCPUs for a VM by using the *_nest_lock() family of functions, and move duplicated code to virt/kvm/. kernel/ patches acked by Peter Zijlstra - Add MGLRU support to the access tracking perf test ARM fixes: - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet s390: - Fix interaction between some filesystems and Secure Execution - Some cleanups and refactorings, preparing for an upcoming big series x86: - Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM - Refine and harden handling of spurious faults - Add support for ALLOWED_SEV_FEATURES - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits - Don't account temporary allocations in sev_send_update_data() - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX - Advertise support to userspace for WRMSRNS and PREFETCHI - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing - Add a module param to control and enumerate support for device posted interrupts - Fix a potential overflow with nested virt on Intel systems running 32-bit kernels - Flush shadow VMCSes on emergency reboot - Add support for SNP to the various SEV selftests - Add a selftest to verify fastops instructions via forced emulation - Refine and optimize KVM's software processing of the posted interrupt bitmap, and share the harvesting code between KVM and the kernel's Posted MSI handler" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) rtmutex_api: provide correct extern functions KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race KVM: arm64: Unmap vLPIs affected by changes to GSI routing information KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding() KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock KVM: arm64: Use lock guard in vgic_v4_set_forwarding() KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue KVM: s390: Simplify and move pv code KVM: s390: Refactor and split some gmap helpers KVM: s390: Remove unneeded srcu lock s390: Remove unneeded includes s390/uv: Improve splitting of large folios that cannot be split while dirty s390/uv: Always return 0 from s390_wiggle_split_folio() if successful s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful rust: add helper for mutex_trylock RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation ...
2025-06-02arm64: Add override for MPAMXi Ruoyao4-18/+23
As the message of the commit 09e6b306f3ba ("arm64: cpufeature: discover CPU support for MPAM") already states, if a buggy firmware fails to either enable MPAM or emulate the trap as if it were disabled, the kernel will just fail to boot. While upgrading the firmware should be the best solution, we have some hardware of which the vendor have made no response 2 months after we requested a firmware update. Allow overriding it so our devices don't become some e-waste. Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: Mingcong Bai <jeffbai@aosc.io> Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com> Cc: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250602043723.216338-1-xry111@xry111.site Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02arm64/mm: Close theoretical race where stale TLB entry remains validRyan Roberts1-4/+5
Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries") describes a race that, prior to the commit, could occur between reclaim and operations such as mprotect() when using reclaim's tlbbatch mechanism. See that commit for details but the summary is: """ Nadav Amit identified a theoritical race between page reclaim and mprotect due to TLB flushes being batched outside of the PTL being held. He described the race as follows: CPU0 CPU1 ---- ---- user accesses memory using RW PTE [PTE now cached in TLB] try_to_unmap_one() ==> ptep_get_and_clear() ==> set_tlb_ubc_flush_pending() mprotect(addr, PROT_READ) ==> change_pte_range() ==> [ PTE non-present - no flush ] user writes using cached RW PTE ... try_to_unmap_flush() """ The solution was to insert flush_tlb_batched_pending() in mprotect() and friends to explcitly drain any pending reclaim TLB flushes. In the modern version of this solution, arch_flush_tlb_batched_pending() is called to do that synchronisation. arm64's tlbbatch implementation simply issues TLBIs at queue-time (arch_tlbbatch_add_pending()), eliding the trailing dsb(ish). The trailing dsb(ish) is finally issued in arch_tlbbatch_flush() at the end of the batch to wait for all the issued TLBIs to complete. Now, the Arm ARM states: """ The completion of the TLB maintenance instruction is guaranteed only by the execution of a DSB by the observer that performed the TLB maintenance instruction. The execution of a DSB by a different observer does not have this effect, even if the DSB is known to be executed after the TLB maintenance instruction is observed by that different observer. """ arch_tlbbatch_add_pending() and arch_tlbbatch_flush() conform to this requirement because they are called from the same task (either kswapd or caller of madvise(MADV_PAGEOUT)), so either they are on the same CPU or if the task was migrated, __switch_to() contains an extra dsb(ish). HOWEVER, arm64's arch_flush_tlb_batched_pending() is also implemented as a dsb(ish). But this may be running on a CPU remote from the one that issued the outstanding TLBIs. So there is no architectural gurantee of synchonization. Therefore we are still vulnerable to the theoretical race described in Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries"). Fix this by flushing the entire mm in arch_flush_tlb_batched_pending(). This aligns with what the other arches that implement the tlbbatch feature do. Cc: <stable@vger.kernel.org> Fixes: 43b3dfdd0455 ("arm64: support batched/deferred tlb shootdown during page reclamation/migration") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250530152445.2430295-1-ryan.roberts@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02arm64: Work around convergence issue with LLD linkerArd Biesheuvel2-1/+12
LLD will occasionally error out with a '__init_end does not converge' error if INIT_IDMAP_DIR_SIZE is defined in terms of _end, as this results in a circular dependency. Counter this by dimensioning the initial IDMAP page tables based on a new boundary marker 'kimage_limit', and define it such that its value should not change as a result of the initdata segment being pushed over a 64k segment boundary due to changes in INIT_IDMAP_DIR_SIZE, provided that its value doesn't change by more than 2M between linker passes. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250531123005.3866382-2-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02arm64: Disable LLD linker ASSERT()s for the time beingArd Biesheuvel1-0/+6
It turns out [1] that the way LLD handles ASSERT()s in the linker script can result in spurious failures, so disable them for the newly introduced BSS symbol export checks. Since we're not aware of any issues with the existing assertions in vmlinux.lds.S, leave those alone for now so that they can continue to provide useful coverage. A linker fix [2] is due to land in version 21 of LLD. Link: https://lore.kernel.org/r/202505261019.OUlitN6m-lkp@intel.com [1] Link: https://github.com/llvm/llvm-project/commit/5859863bab7f [2] Link: https://github.com/ClangBuiltLinux/linux/issues/2094 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505261019.OUlitN6m-lkp@intel.com/ Link: https://lore.kernel.org/r/20250529073507.2984959-2-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>
2025-06-02Merge tag 'kvmarm-fixes-6.16-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini7-72/+133
KVM/arm64 fixes for 6.16, take #1 - Make the irqbypass hooks resilient to changes in the GSI<->MSI routing, avoiding behind stale vLPI mappings being left behind. The fix is to resolve the VGIC IRQ using the host IRQ (which is stable) and nuking the vLPI mapping upon a routing change. - Close another VGIC race where vCPU creation races with VGIC creation, leading to in-flight vCPUs entering the kernel w/o private IRQs allocated. - Fix a build issue triggered by the recently added workaround for Ampere's AC04_CPU_23 erratum. - Correctly sign-extend the VA when emulating a TLBI instruction potentially targeting a VNCR mapping. - Avoid dereferencing a NULL pointer in the VGIC debug code, which can happen if the device doesn't have any mapping yet.
2025-05-31Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds2-2/+2
Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...
2025-05-31Merge tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds12-67/+173
Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
2025-05-31Merge tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-genericLinus Torvalds5-60/+5
Pull compiler version requirement update from Arnd Bergmann: "Require gcc-8 and binutils-2.30 x86 already uses gcc-8 as the minimum version, this changes all other architectures to the same version. gcc-8 is used is Debian 10 and Red Hat Enterprise Linux 8, both of which are still supported, and binutils 2.30 is the oldest corresponding version on those. Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as the system compiler but additionally include toolchains that remain supported. With the new minimum toolchain versions, a number of workarounds for older versions can be dropped, in particular on x86_64 and arm64. Importantly, the updated compiler version allows removing two of the five remaining gcc plugins, as support for sancov and structeak features is already included in modern compiler versions. I tried collecting the known changes that are possible based on the new toolchain version, but expect that more cleanups will be possible. Since this touches multiple architectures, I merged the patches through the asm-generic tree." * tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV Documentation: update binutils-2.30 version reference gcc-plugins: remove SANCOV gcc plugin Kbuild: remove structleak gcc plugin arm64: drop binutils version checks raid6: skip avx512 checks kbuild: require gcc-8 and binutils-2.30
2025-05-31Merge tag 'soc-dt-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds557-7730/+61669
Pull SoC devicetree updates from Arnd Bergmann: "There are 11 newly supported SoCs, but these are all either new variants of existing designs, or straight reuses of the existing chip in a new package: - RK3562 is a new chip based on the old Cortex-A53 core, apparently a low-cost version of the Cortex-A55 based RK3568/RK3566. - NXP i.MX94 is a minor variation of i.MX93/i.MX95 with a different set of on-chip peripherals. - Renesas RZ/V2N (R9A09G056) is a new member of the larger RZ/V2 family - Amlogic S6/S7/S7D - Samsung Exynos7870 is an older chip similar to Exynos7885 - WonderMedia wm8950 is a minor variation on the wm8850 chip - Amlogic s805y is almost idential to s805x - Allwinner A523 is similar to A527 and T527 - Qualcomm MSM8926 is a variant of MSM8226 - Qualcomm Snapdragon X1P42100 is related to R1E80100 There are also 65 boards, including reference designs for the chips above, this includes - 12 new boards based on TI K3 series chips, most of them from Toradex - 10 devices using Rockchips RK35xx and PX30 chips - 2 phones and 2 laptops based on Qualcomm Snapdragon designs - 10 NXP i.MX8/i.MX9 boards, mostly for embedded/industrial uses - 3 Samsung Galaxy phones based on Exynos7870 - 5 Allwinner based boards using a variety of ARMv8 chips - 9 32-bit machines, each based on a different SoC family Aside from the new hardware, there is the usual set of cleanups and newly added hardware support on existing machines, for a total of 965 devicetree changesets" * tag 'soc-dt-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (956 commits) MAINTAINERS, mailmap: update Sven Peter's email address arm64: dts: renesas: rzg3e-smarc-som: Reduce I2C2 clock frequency arm64: dts: nuvoton: Add pinctrl ARM: dts: samsung: sp5v210-aries: Align wifi node name with bindings arm64: dts: blaize-blzp1600: Enable GPIO support dt-bindings: clock: socfpga: convert to yaml arm64: dts: rockchip: move rk3562 pinctrl node outside the soc node arm64: dts: rockchip: fix rk3562 pcie unit addresses arm64: dts: rockchip: move rk3528 pinctrl node outside the soc node arm64: dts: rockchip: remove a double-empty line from rk3576 core dtsi arm64: dts: rockchip: move rk3576 pinctrl node outside the soc node arm64: dts: rockchip: fix rk3576 pcie unit addresses arm64: dts: rockchip: Drop assigned-clock* from cpu nodes on rk3588 arm64: dts: rockchip: Add missing SFC power-domains to rk3576 Revert "arm64: dts: mediatek: mt8390-genio-common: Add firmware-name for scp0" arm64: dts: mediatek: mt8188: Address binding warnings for MDP3 nodes arm64: dts: mt6359: Rename RTC node to match binding expectations arm64: dts: mt8365-evk: Add goodix touchscreen support arm64: dts: mediatek: mt8188: Add missing #reset-cells property arm64: dts: airoha: en7581: Add PCIe nodes to EN7581 SoC evaluation board ...
2025-05-31Merge tag 'soc-defconfig-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds1-0/+26
Pull SoC defconfig updates from Arnd Bergmann: "The usual defconfig updates enable configuration options for drivers that got added. A few SoC specific options are enabled in Kconfig files instead, in place of the defconfig files" * tag 'soc-defconfig-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: defconfig: enable ACPM protocol and Exynos mailbox arm64: defconfig: Enable configs for MediaTek Genio EVK boards arm64: defconfig: mediatek: enable PHY drivers arm64: defconfig: Enable Rockchip SAI and ES8328 arm64: defconfig: Add Toradex Embedded Controller config arm64: defconfig: Enable TPIC2810 GPIO expander riscv: defconfig: spacemit: enable clock controller driver for SpacemiT K1 arm64: defconfig: Enable TMP102 as module arm64: defconfig: Enable hwspinlock and eQEP for K3 arm64: defconfig: Add CDNS_DSI and CDNS_PHY config riscv: defconfig: spacemit: enable gpio support for K1 SoC arm64: defconfig: Enable IPQ5424 RDP466 base configs riscv: Enable PM_GENERIC_DOMAINS for T-Head SoCs
2025-05-30Merge tag 'efi-next-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efiLinus Torvalds1-3/+3
Pull EFI updates from Ard Biesheuvel: "Not a lot going on in the EFI tree this cycle. The only thing that stands out is the new support for SBAT metadata, which was a bit contentious when it was first proposed, because in the initial incarnation, it would have required us to maintain a revocation index, and bump it each time a vulnerability affecting UEFI secure boot got fixed. This was shot down for obvious reasons. This time, only the changes needed to emit the SBAT section into the PE/COFF image are being carried upstream, and it is up to the distros to decide what to put in there when creating and signing the build. This only has the EFI zboot bits (which the distros will be using for arm64); the x86 bzImage changes should be arriving next cycle, presumably via the -tip tree. Summary: - Add support for emitting a .sbat section into the EFI zboot image, so that downstreams can easily include revocation metadata in the signed EFI images - Align PE symbolic constant names with other projects - Bug fix for the efi_test module - Log the physical address and size of the EFI memory map when failing to map it - A kerneldoc fix for the EFI stub code" * tag 'efi-next-for-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: include: pe.h: Fix PE definitions efi/efi_test: Fix missing pending status update in getwakeuptime efi: zboot specific mechanism for embedding SBAT section efi/libstub: Describe missing 'out' parameter in efi_load_initrd efi: Improve logging around memmap init
2025-05-30KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointerMarc Zyngier1-1/+4
Dan reports that iterating over a device ITEs can legitimately lead to a NULL pointer, and that the NULL check is placed *after* the pointer has already been dereferenced. Hoist the pointer check as early as possible and be done with it. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 30deb51a677b ("KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables") Link: https://lore.kernel.org/r/aDBylI1YnjPatAbr@stanley.mountain Cc: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20250530091647.1152489-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation raceOliver Upton1-1/+26
syzkaller has found another ugly race in the VGIC, this time dealing with VGIC creation. Since kvm_vgic_create() doesn't sufficiently protect against in-flight vCPU creations, it is possible to get a vCPU into the kernel w/ an in-kernel VGIC but no allocation of private IRQs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000d20 Mem abort info: ESR = 0x0000000096000046 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 CM = 0, WnR = 1, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000103e4f000 [0000000000000d20] pgd=0800000102e1c403, p4d=0800000102e1c403, pud=0800000101146403, pmd=0000000000000000 Internal error: Oops: 0000000096000046 [#1] PREEMPT SMP CPU: 9 UID: 0 PID: 246 Comm: test Not tainted 6.14.0-rc6-00097-g0c90821f5db8 #16 Hardware name: linux,dummy-virt (DT) pstate: 814020c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : _raw_spin_lock_irqsave+0x34/0x8c lr : kvm_vgic_set_owner+0x54/0xa4 sp : ffff80008086ba20 x29: ffff80008086ba20 x28: ffff0000c19b5640 x27: 0000000000000000 x26: 0000000000000000 x25: ffff0000c4879bd0 x24: 000000000000001e x23: 0000000000000000 x22: 0000000000000000 x21: ffff0000c487af80 x20: ffff0000c487af18 x19: 0000000000000000 x18: 0000001afadd5a8b x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001 x14: ffff0000c19b56c0 x13: 0030c9adf9d9889e x12: ffffc263710e1908 x11: 0000001afb0d74f2 x10: e0966b840b373664 x9 : ec806bf7d6a57cd5 x8 : ffff80008086b980 x7 : 0000000000000001 x6 : 0000000000000001 x5 : 0000000080800054 x4 : 4ec4ec4ec4ec4ec5 x3 : 0000000000000000 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000d20 Call trace: _raw_spin_lock_irqsave+0x34/0x8c (P) kvm_vgic_set_owner+0x54/0xa4 kvm_timer_enable+0xf4/0x274 kvm_arch_vcpu_run_pid_change+0xe0/0x380 kvm_vcpu_ioctl+0x93c/0x9e0 __arm64_sys_ioctl+0xb4/0xec invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x30/0xd0 el0t_64_sync_handler+0x10c/0x138 el0t_64_sync+0x198/0x19c Code: b9000841 d503201f 52800001 52800022 (88e17c02) ---[ end trace 0000000000000000 ]--- Plug the race by explicitly checking for an in-progress vCPU creation and failing kvm_vgic_create() when that's the case. Add some comments to document all the things kvm_vgic_create() is trying to guard against too. Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250523194722.4066715-6-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30KVM: arm64: Unmap vLPIs affected by changes to GSI routing informationOliver Upton1-0/+23
KVM's interrupt infrastructure is dodgy at best, allowing for some ugly 'off label' usage of the various UAPIs. In one example, userspace can change the routing entry of a particular "GSI" after configuring irqbypass with KVM_IRQFD. KVM/arm64 is oblivious to this, and winds up preserving the stale translation in cases where vLPIs are configured. Honor userspace's intentions and tear down the vLPI mapping if affected by a "GSI" routing change. Make no attempt to reconstruct vLPIs if the new target is an MSI and just fall back to software injection. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-5-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()Oliver Upton2-22/+26
The virtual mapping and "GSI" routing of a particular vLPI is subject to change in response to the guest / userspace. This can be pretty annoying to deal with when KVM needs to track the physical state that's managed for vLPI direct injection. Make vgic_v4_unset_forwarding() resilient by using the host IRQ to resolve the vgic IRQ. Since this uses the LPI xarray directly, finding the ITS by doorbell address + grabbing it's its_lock is no longer necessary. Note that matching the right ITS / ITE is already handled in vgic_v4_set_forwarding(), and unless there's a bug in KVM's VGIC ITS emulation the virtual mapping that should remain stable for the lifetime of the vLPI mapping. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-4-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30KVM: arm64: Protect vLPI translation with vgic_irq::irq_lockOliver Upton2-42/+47
Though undocumented, KVM generally protects the translation of a vLPI with the its_lock. While this makes perfectly good sense, as the ITS itself contains the guest translation, an upcoming change will require twiddling the vLPI mapping in an atomic context. Switch to using the vIRQ's irq_lock to protect the translation. Use of the its_lock in vgic_v4_unset_forwarding() is preserved for now as it still needs to walk the ITS. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30KVM: arm64: Use lock guard in vgic_v4_set_forwarding()Oliver Upton1-6/+4
The locking dance is about to get more interesting, switch the its_lock over to a lock guard to make it a bit easier to handle. Tested-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250523194722.4066715-2-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidationMarc Zyngier1-2/+4
When handling a TLBI VA* instruction that potentially targets a VNCR page mapping, we fail to mask out the top bits that contain the ASID and TTL fields, hence potentially failing the VA check in the TLB code. An additional wrinkle is that we fail to sign extend the VA, again leading to failed VA checks. Fix both in one go by sign-extending the VA from bit 48, making it comparable to the way we interpret VNCR_EL2.BADDR. Fixes: 4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2") Link: https://lore.kernel.org/r/20250525175759.780891-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-30arm64: sysreg: Drag linux/kconfig.h to work around vdso build issueMarc Zyngier1-0/+1
Broonie reports that fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") breaks one of the vdso selftests (vdso_test_chacha) as it indirectly drags asm/sysreg.h. It is rather unfortunate (and worrying) that userspace gets built with non-UAPI headers. In any case, paper over the issue by dragging linux/kconfig.h in asm/sysreg.h. It is the right thing to do, at least from the kernel perspective. Reported-by: Mark Brown <broonie@kernel.org> Fixes: fed55f49fad18 ("arm64: errata: Work around AmpereOne's erratum AC04_CPU_23") Link: https://lore.kernel.org/r/aDCDGZ-G-nCP3hJI@finisterre.sirena.org.uk Cc: D Scott Phillips <scott@os.amperecomputing.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250523170208.530818-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>