path: root/arch/arm64/include
Age | Commit message | Author | Files | Lines
2025-12-02Merge tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linuxLinus Torvalds5-18/+41
Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers: "This is a core arm64 change. However, I was asked to take this because most uses of kernel-mode FPSIMD are in crypto or CRC code. In v6.8, the size of task_struct on arm64 increased by 528 bytes due to the new 'kernel_fpsimd_state' field. This field was added to allow kernel-mode FPSIMD code to be preempted. Unfortunately, 528 bytes is kind of a lot for task_struct. This regression in the task_struct size was noticed and reported. Recover that space by making this state be allocated on the stack at the beginning of each kernel-mode FPSIMD section. To make it easier for all the users of kernel-mode FPSIMD to do that correctly, introduce and use a 'scoped_ksimd' abstraction" * tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits) lib/crypto: arm64: Move remaining algorithms to scoped ksimd API lib/crypto: arm/blake2b: Move to scoped ksimd API arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack arm64/fpu: Enforce task-context only for generic kernel mode FPU net/mlx5: Switch to more abstract scoped ksimd guard API on arm64 arm64/xorblocks: Switch to 'ksimd' scoped guard API crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API crypto/arm64: polyval - Switch to 'ksimd' scoped guard API crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API raid6: Move to more abstract 'ksimd' guard API crypto: aegis128-neon - Move to more abstract 'ksimd' guard API crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API ...
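A minimal sketch of what a conversion to the scoped API looks like, assuming scoped_ksimd() behaves like the kernel's other scoped_*() guards; the NEON routine name is illustrative, not a function from this series:

    /* Before: explicit, unchecked begin/end pair */
    kernel_neon_begin();
    xor_blocks_neon(bytes, p1, p2);     /* illustrative NEON routine */
    kernel_neon_end();

    /* After: the guard brackets the kernel-mode FP/SIMD section and, with
     * the later patches in this series, transparently provides the
     * on-stack FP/SIMD state buffer. */
    scoped_ksimd()
            xor_blocks_neon(bytes, p1, p2);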
2025-12-02Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds70-282/+298
Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/resctrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory management: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...
2025-12-02Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+2
Pull scoped user access updates from Thomas Gleixner: "Scoped user mode access and related changes:

 - Implement the missing u64 user access function on ARM when CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in generic code with [unsafe_]get_user(). All other architectures and ARM variants provide the relevant accessors already.

 - Ensure that ASM GOTO jump label usage in the user mode access helpers always goes through a local C scope label indirection inside the helpers. This is required because compilers do not support an ASM GOTO target leaving an auto cleanup scope. GCC silently fails to emit the cleanup invocation and CLANG fails the build.

   [ Editor's note: gcc-16 will have fixed the code generation issue in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm goto jumps [PR122835]"). But we obviously have to deal with clang and older versions of gcc, so.. - Linus ]

   This provides generic wrapper macros and the conversion of affected architecture code to use them.

 - Scoped user mode access with auto cleanup

   Access to user mode memory can be required in hot code paths, but if it has to be done with user controlled pointers, the access is shielded with a speculation barrier, so that the CPU cannot speculate around the address range check. Those speculation barriers impact performance quite significantly.

   This cost can be avoided by "masking" the provided pointer so it is guaranteed to be in the valid user memory access range and otherwise to point to a guaranteed unpopulated address space. This has to be done without branches so it creates an address dependency for the access, which the CPU cannot speculate ahead. This results in repeating and error-prone programming patterns:

       if (can_do_masked_user_access())
               from = masked_user_read_access_begin((from));
       else if (!user_read_access_begin(from, sizeof(*from)))
               return -EFAULT;
       unsafe_get_user(val, from, Efault);
       user_read_access_end();
       return 0;
   Efault:
       user_read_access_end();
       return -EFAULT;

   which can be replaced with scopes and automatic cleanup:

       scoped_user_read_access(from, Efault)
               unsafe_get_user(val, from, Efault);
       return 0;
   Efault:
       return -EFAULT;

 - Convert code which implements the above pattern over to scoped_user_*_access(). This also corrects a couple of imbalanced masked_*_begin() instances which are harmless on most architectures, but prevent PowerPC from implementing the masking optimization.

 - Add a missing speculation barrier in copy_from_user_iter()"

* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required scm: Convert put_cmsg() to scoped user access iov_iter: Add missing speculation barrier to copy_from_user_iter() iov_iter: Convert copy_from_user_iter() to masked user access select: Convert to scoped user access x86/futex: Convert to scoped user access futex: Convert to get/put_user_inline() uaccess: Provide put/get_user_inline() uaccess: Provide scoped user access regions arm64: uaccess: Use unsafe wrappers for ASM GOTO s390/uaccess: Use unsafe wrappers for ASM GOTO riscv/uaccess: Use unsafe wrappers for ASM GOTO powerpc/uaccess: Use unsafe wrappers for ASM GOTO x86/uaccess: Use unsafe wrappers for ASM GOTO uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user() ARM: uaccess: Implement missing __get_user_asm_dword()
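By symmetry with the read-side example above, the write side presumably looks like this; a sketch that assumes the series pairs scoped_user_write_access() with unsafe_put_user(), with an illustrative helper name:

    static int put_val(u32 __user *to, u32 val)
    {
            /* Masking or access_begin/end handled by the scope. */
            scoped_user_write_access(to, Efault)
                    unsafe_put_user(val, to, Efault);
            return 0;
    Efault:
            return -EFAULT;
    }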
2025-12-01Merge tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull bug handling infrastructure updates from Ingo Molnar: "Core updates: - Improve WARN(), which has vararg printf like arguments, to work with the x86 #UD based WARN-optimizing infrastructure by hiding the format in the bug_table and replacing this first argument with the address of the bug-table entry, while making the actual function that's called a UD1 instruction (Peter Zijlstra) - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo Molnar, s390 support by Heiko Carstens) Fixes and cleanups: - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens) - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter Zijlstra)" * tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS x86/bug: Fix BUG_FORMAT vs KASLR x86_64/bug: Inline the UD1 x86/bug: Implement WARN_ONCE() x86_64/bug: Implement __WARN_printf() x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED x86/bug: Add BUG_FORMAT basics bug: Allow architectures to provide __WARN_printf() bug: Implement WARN_ON() using __WARN_FLAGS() bug: Add report_bug_entry() bug: Add BUG_FORMAT_ARGS infrastructure bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS bug: Add BUG_FORMAT infrastructure x86: Rework __bug_table helpers bugs/s390: Remove private WARN_ON() implementation bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS() ...
2025-11-28Merge branch 'for-next/sysreg' into for-next/coreCatalin Marinas6-105/+54
* for-next/sysreg: : arm64 sysreg updates/cleanups arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64/sysreg: Add ICH_VMCR_EL2 arm64/sysreg: Move generation of RES0/RES1/UNKN to function arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor arm64/sysreg: Fix checks for incomplete sysreg definitions arm64/sysreg: Replace TCR_EL1 field macros
2025-11-28Merge branches 'for-next/misc', 'for-next/kselftest', 'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch ↵Catalin Marinas69-177/+244
'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY perf/arm-ni: Fix and optimise register offset calculation perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister() perf/arm-ni: Add NoC S3 support perf/arm_cspmu: nvidia: Add pmevfiltr2 support perf/arm_cspmu: nvidia: Add revision id matching perf/arm_cspmu: Add pmpidr support perf/arm_cspmu: Add callback to reset filter config perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores * for-next/misc: : Miscellaneous patches arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index arm64: mm: make linear mapping permission update more robust for patial range arm64/mm: Elide TLB flush in certain pte protection transitions arm64/mm: Rename try_pgd_pgtable_alloc_init_mm arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors arm64: add unlikely hint to MTE async fault check in el0_svc_common arm64: acpi: add newline to deferred APEI warning arm64: entry: Clean out some indirection arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52 arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz() arm64: remove unused ARCH_PFN_OFFSET arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack arm64: Remove assertion on CONFIG_VMAP_STACK * for-next/kselftest: : arm64 kselftest patches kselftest/arm64: Align zt-test register dumps * for-next/efi-preempt: : arm64: Make EFI calls preemptible arm64/efi: Call EFI runtime services without disabling preemption arm64/efi: Move uaccess en/disable out of efi_set_pgd() arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper arm64/fpsimd: Permit kernel mode NEON with IRQs off arm64/fpsimd: Don't warn when EFI execution context is preemptible efi/runtime-wrappers: Keep track of the efi_runtime_lock owner efi: Add missing static initializer for efi_mm::cpus_allowed_lock * for-next/assembler-macro: : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers * for-next/typos: : Random typo/spelling fixes arm64: Fix double word in comments arm64: Fix typos and spelling errors in comments * for-next/sme-ptrace-disable: : Support disabling streaming mode via ptrace on SME only systems kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems arm64/sme: Support disabling streaming mode via ptrace on SME only systems * for-next/local-tlbi-page-reused: : arm64, mm: avoid TLBI broadcast if page reused in write fault arm64, tlbflush: don't TLBI broadcast if page reused in write fault mm: add spurious fault fixing support for huge pmd * for-next/mpam: (34 commits) : Basic Arm MPAM driver (more to follow) MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved 
mbwu state arm_mpam: Use long MBWU counters if supported arm_mpam: Probe for long/lwd mbwu counters arm_mpam: Consider overflow in bandwidth counter state arm_mpam: Track bandwidth counter state for power management arm_mpam: Add mpam_msmon_read() to read monitor value arm_mpam: Add helpers to allocate monitors arm_mpam: Probe and reset the rest of the features arm_mpam: Allow configuration to be applied and restored during cpu online arm_mpam: Use a static key to indicate when mpam is enabled arm_mpam: Register and enable IRQs arm_mpam: Extend reset logic to allow devices to be reset any time arm_mpam: Add a helper to touch an MSC from any CPU arm_mpam: Reset MSC controls from cpuhp callbacks arm_mpam: Merge supported features during mpam_enable() into mpam_class arm_mpam: Probe the hardware features resctrl supports arm_mpam: Add helpers for managing the locking around the mon_sel registers ... * for-next/acpi: : arm64 acpi updates ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() * for-next/documentation: : arm64 Documentation updates Documentation/arm64: Fix the typo of register names
2025-11-27arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITSBen Horgan1-2/+0
The define ARM64_FEATURE_FIELD_BITS is now unused and feature id fields don't always have 4 bits. Remove it. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-27arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macrosSeongsu Park1-10/+10
The ATOMIC_FETCH_OP_AND and ATOMIC64_FETCH_OP_AND macros accept 'mb' and 'cl' parameters but never use them in their implementation. These macros simply delegate to the corresponding andnot functions, which handle the actual atomic operations and memory barriers. Signed-off-by: Seongsu Park <sgsu.park@samsung.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
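The delegation described above looks roughly like this (a sketch; the real header generates all size and ordering variants via macros):

    static inline int arch_atomic_fetch_and(int i, atomic_t *v)
    {
            /* and(v, i) == andnot(v, ~i); the andnot helper already carries
             * the ordering/barrier handling, so the _AND macros never needed
             * their own 'mb'/'cl' arguments. */
            return arch_atomic_fetch_andnot(~i, v);
    }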
2025-11-21Merge branch 'objtool/core'Peter Zijlstra41-257/+418
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2025-11-19arm64, tlbflush: don't TLBI broadcast if page reused in write faultHuang Ying2-5/+65
A multi-thread customer workload with large memory footprint uses fork()/exec() to run some external programs every tens of seconds. When running the workload on an arm64 server machine, it's observed that quite some CPU cycles are spent in the TLB flushing functions. While running the workload on the x86_64 server machine, it's not. This causes the performance on arm64 to be much worse than that on x86_64. During the workload running, after fork()/exec() write-protects all pages in the parent process, memory writing in the parent process will cause a write protection fault. Then the page fault handler will make the PTE/PDE writable if the page can be reused, which is almost always true in the workload. On arm64, to avoid the write protection fault on other CPUs, the page fault handler flushes the TLB globally with TLBI broadcast after changing the PTE/PDE. However, this isn't always necessary. Firstly, it's safe to leave some stale read-only TLB entries as long as they will be flushed finally. Secondly, it's quite possible that the original read-only PTE/PDEs aren't cached in remote TLB at all if the memory footprint is large. In fact, on x86_64, the page fault handler doesn't flush the remote TLB in this situation, which benefits the performance a lot. To improve the performance on arm64, make the write protection fault handler flush the TLB locally instead of globally via TLBI broadcast after making the PTE/PDE writable. If there are stale read-only TLB entries in the remote CPUs, the page fault handler on these CPUs will regard the page fault as spurious and flush the stale TLB entries. To test the patchset, make the usemem.c from vm-scalability (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git) support calling fork()/exec() periodically. To mimic the behavior of the customer workload, run usemem with 4 threads, access 100GB memory, and call fork()/exec() every 40 seconds. Test results show that with the patchset the score of usemem improves ~40.6%. The cycles% of TLB flush functions reduces from ~50.5% to ~0.3% in perf profile. Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Barry Song <baohua@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
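Conceptually, the change moves the write-protect fault path from a broadcast invalidation to a CPU-local one; a sketch only, with illustrative helper names rather than the patch's actual API:

    /* Write-protect fault on a reusable page, after the handler has made
     * the PTE writable: */
    ptep_set_access_flags(vma, addr, ptep, entry, 1);   /* dirty/writable */
    local_flush_tlb_page(vma, addr);    /* was a broadcast flush_tlb_page() */

    /* A remote CPU still holding the stale read-only entry takes a
     * permission fault, sees that the PTE is already writable, treats the
     * fault as spurious, and flushes only its own TLB. */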
2025-11-18mm/huge_memory: Fix initialization of huge zero folioLinus Torvalds1-2/+2
The recent fix to properly initialize the tags of the huge zero folio had an unfortunate not-so-subtle side effect: it caused the actual *contents* of the huge zero folio to not be initialized at all when the hardware didn't support the memory tagging. The reason was the unfortunate semantics of tag_clear_highpage(): on hardware that didn't do the tagging, it would silently just not do anything at all. And since this is done only on arm64 with MTE support, that basically meant most hardware. It wasn't necessarily immediately obvious since the huge zero page isn't necessarily very heavily used - or because it might already be zero because all-zeroes is the most common pattern. But it ends up causing random odd user space failures when you do hit it. The unfortunate semantics have been around for a while, but became a real bug only when we started actively using __GFP_ZEROTAGS in the generic get_huge_zero_folio() function - before that, it had only ever been used in code that checked that the hardware supported it. Fix this by simply changing the semantics of tag_clear_highpage() to return whether it actually successfully did something or not. While at it, also make it initialize multiple pages in one go, since that's actually what the only caller wants it to do and it simplifies the whole logic. Fixes: adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio") Link: https://lore.kernel.org/all/20251117082023.90176-1-00107082@163.com/ Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Reported-and-tested-by: David Wang <00107082@163.com> Reported-and-tested-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
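The new contract, roughly; a sketch that assumes the multi-page helper keeps a tag_clear_highpage(page, nr)-style signature, which may differ in name from the committed interface:

    /* Returns true only if it actually zeroed (and tagged) the pages. */
    if (!tag_clear_highpage(page, nr_pages)) {
            unsigned int i;

            /* No MTE on this hardware: fall back to plain zeroing. */
            for (i = 0; i < nr_pages; i++)
                    clear_highpage(page + i);
    }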
2025-11-13arm64/mm: Elide TLB flush in certain pte protection transitionsDev Jain1-0/+27
Currently arm64 does an unconditional TLB flush in mprotect(). This is not required for some cases, for example, when changing from PROT_NONE to PROT_READ | PROT_WRITE (a real usecase - glibc malloc does this to emulate growing into the non-main heaps), and unsetting uffd-wp in a range. Therefore, implement pte_needs_flush() for arm64, which is already implemented by some other arches as well. Running a userspace program changing permissions back and forth between PROT_NONE and PROT_READ | PROT_WRITE, and measuring the average time taken for the none->rw transition, I get a reduction from 3.2 microseconds to 2.85 microseconds, giving a 12.3% improvement. Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
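The effect in the generic mprotect path is roughly the following (simplified sketch; the arm64-specific decision lives behind pte_needs_flush()):

    oldpte = ptep_modify_prot_start(vma, addr, pte);
    ptent  = pte_modify(oldpte, newprot);
    ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);

    /* PROT_NONE -> PROT_READ|PROT_WRITE only adds permissions: a stale
     * non-present entry can at worst cause a fault that is fixed up
     * locally, so the TLB invalidation can be skipped. */
    if (pte_needs_flush(oldpte, ptent))
            tlb_flush_pte_range(tlb, addr, PAGE_SIZE);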
2025-11-13arm64/sysreg: Replace TCR_EL1 field macrosAnshuman Khandual5-103/+54
This just replaces all used TCR_EL1 field macros with tools sysreg variant based fields and subsequently drops them from the header (pgtable-hwdef.h), while retaining the ones used for KVM (represented via the sysreg tools format). Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12Merge tag 'arm64-fpsimd-on-stack-for-v6.19' into libcrypto-fpsimd-on-stackEric Biggers5-20/+36
Pull fpsimd-on-stack changes from Ard Biesheuvel: "Shared tag/branch for arm64 FP/SIMD changes going through libcrypto" Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12Merge tag 'scoped-ksimd-for-arm-arm64' into libcrypto-fpsimd-on-stackEric Biggers1-0/+7
Pull scoped ksimd API for ARM and arm64 from Ard Biesheuvel: "Introduce a more strict replacement API for kernel_neon_begin()/kernel_neon_end() on both ARM and arm64, and replace occurrences of the latter pair appearing in lib/crypto" Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12arm64: Fix typos and spelling errors in commentsmrigendrachaubey5-8/+8
This patch corrects several minor typographical and spelling errors in comments across multiple arm64 source files. No functional changes. Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stackArd Biesheuvel4-7/+15
Commit aefbab8e77eb16b5 ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") added a 'kernel_fpsimd_state' field to struct thread_struct, which is the arch-specific portion of struct task_struct, and is allocated for each task in the system. The size of this field is 528 bytes, resulting in non-negligible bloat of task_struct, and the resulting memory overhead may impact performance on systems with many processes. This allocation is only used if the task is scheduled out or interrupted by a softirq while using the FP/SIMD unit in kernel mode, and so it is possible to transparently allocate this buffer on the caller's stack instead. So tweak the 'ksimd' scoped guard implementation so that a stack buffer is allocated and passed to both kernel_neon_begin() and kernel_neon_end(), and either record it in the task struct, or use it directly to preserve the task mode kernel FP/SIMD when running in softirq context. Passing the address to both functions, and checking the addresses for consistency ensures that callers of the updated bare begin/end API use it in a manner that is consistent with the new context switch semantics. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
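A sketch of the resulting calling convention, under the assumption that the buffer is a struct user_fpsimd_state sized object (528 bytes) and that both ends of the section must be given the same pointer; the scoped_ksimd() guard does all of this on behalf of its users:

    struct user_fpsimd_state kfpsimd;   /* on the caller's stack */

    kernel_neon_begin(&kfpsimd);        /* records the buffer, or uses it
                                           directly in softirq context */
    /* ... kernel-mode FP/SIMD code ... */
    kernel_neon_end(&kfpsimd);          /* must see the same buffer */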
2025-11-12arm64/fpu: Enforce task-context only for generic kernel mode FPUArd Biesheuvel1-2/+14
The generic kernel mode FPU API, which is used by the AMDGPU driver to perform floating point calculations, is modeled after the most restrictive architecture that supports it. This means it doesn't support preemption, and can only be used from task context. The arm64 implementation is a bit more flexible, but supporting that in the generic API complicates matters slightly, and for no good reason, given that the only user does not need it. So enforce that kernel_fpu_begin() can only be called from task context, and [redundantly] disable preemption. This removes the need for users of this API to provide a kernel mode FP/SIMD state after a future patch that makes that compulsory for preemptible task context. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
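Roughly what the arm64 wrapper enforces after this change, per the description above (a sketch; the exact checks in the real header may differ):

    static inline void kernel_fpu_begin(void)
    {
            /* The generic FPU API is task-context only ... */
            WARN_ON_ONCE(!in_task());
            /* ... and never preemptible, so callers need no on-stack state. */
            preempt_disable();
            kernel_neon_begin();
    }

    static inline void kernel_fpu_end(void)
    {
            kernel_neon_end();
            preempt_enable();
    }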
2025-11-12arm64/xorblocks: Switch to 'ksimd' scoped guard APIArd Biesheuvel1-13/+9
Switch to the more abstract 'scoped_ksimd()' API, which will be modified in a future patch to transparently allocate a kernel mode FP/SIMD state buffer on the stack, so that kernel mode FP/SIMD code remains preemptible in principle, but without the memory overhead that adds 528 bytes to the size of struct task_struct. Reviewed-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-12arm64/simd: Add scoped guard API for kernel mode SIMDArd Biesheuvel1-0/+7
Encapsulate kernel_neon_begin() and kernel_neon_end() using a 'ksimd' cleanup guard. This hides the prototype of those functions, allowing them to be changed for arm64 but not ARM, without breaking code that is shared between those architectures (RAID6, AEGIS-128) It probably makes sense to expose this API more widely across architectures, as it affords more flexibility to the arch code to plumb it in, while imposing more rigid rules regarding the start/end bookends appearing in matched pairs. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
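One plausible shape for such a guard, built on the kernel's cleanup.h infrastructure (a sketch, not necessarily the exact in-tree definition; the RAID6 call is illustrative):

    /* Begin/end can now only appear as a properly nested, matched pair. */
    DEFINE_LOCK_GUARD_0(ksimd, kernel_neon_begin(), kernel_neon_end())

    #define scoped_ksimd()  scoped_guard(ksimd)

    /* Usage in code shared between ARM and arm64 (e.g. RAID6, AEGIS-128): */
    scoped_ksimd()
            raid6_gen_syndrome_neon(disks, bytes, ptrs);    /* illustrative */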
2025-11-11arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headersThomas Huth61-123/+123
While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. So let's standardize now on the __ASSEMBLER__ macro that is provided by the compilers. This is a mostly mechanical patch (done with a simple "sed -i" statement), except for the following files where comments with mis-spelled macros were tweaked manually: arch/arm64/include/asm/stacktrace/frame.h arch/arm64/include/asm/kvm_ptrauth.h arch/arm64/include/asm/debug-monitors.h arch/arm64/include/asm/esr.h arch/arm64/include/asm/scs.h arch/arm64/include/asm/memory.h Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headersThomas Huth3-5/+5
__ASSEMBLY__ is only defined by the Makefile of the kernel, so this is not really useful for uapi headers (unless the userspace Makefile defines it, too). Let's switch to __ASSEMBLER__ which gets set automatically by the compiler when compiling assembly code. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/efi: Move uaccess en/disable out of efi_set_pgd()Ard Biesheuvel1-10/+3
efi_set_pgd() will no longer be called when invoking EFI runtime services via the efi_rts_wq work queue, but the uaccess en/disable are still needed when using PAN emulation using TTBR0 switching. So move these into the callers. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/fpsimd: Permit kernel mode NEON with IRQs offArd Biesheuvel1-1/+1
Currently, may_use_simd() will return false when called from a context where IRQs are disabled. One notable case where this happens is when calling the ResetSystem() EFI runtime service from the reboot/poweroff code path. For this case alone, there is a substantial amount of FP/SIMD support code to handle the corner case where an EFI runtime service is invoked with IRQs disabled. The only reason kernel mode SIMD is not allowed when IRQs are disabled is that re-enabling softirqs in this case produces a noisy diagnostic when lockdep is enabled. The warning is valid, in the sense that delivering pending softirqs over the back of the call to local_bh_enable() is problematic when IRQs are disabled. While the API lacks a facility to simply mask and unmask softirqs without triggering their delivery, disabling softirqs is not needed to begin with when IRQs are disabled, given that softirqs are only ever taken asynchronously over the back of a hard IRQ. So dis/enable softirq processing conditionally, based on whether IRQs are enabled, and relax the check in may_use_simd(). Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
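The conditional bracketing described above, sketched for the begin/end sites:

    /* Softirqs only preempt us via a hard IRQ, so with IRQs masked there is
     * nothing to suppress and nothing for local_bh_enable() to deliver. */
    if (!irqs_disabled())
            local_bh_disable();

    /* ... kernel-mode SIMD section ... */

    if (!irqs_disabled())
            local_bh_enable();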
2025-11-11Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds5-9/+19
Pull arm64 fixes from Will Deacon: "There's more here than I would ideally like at this stage, but there's been a steady trickle of fixes and some of them took a few rounds of review. The bulk of the changes are fixing some fallout from the recent BBM level two support which allows the linear map to be split from block to page mappings at runtime, but inadvertently led to sleeping in atomic context on some paths where the linear map was already mapped with page granularity. The fix is simply to avoid splitting in those cases but the implementation of that is a little involved. The other interesting fix is addressing a catastrophic performance issue with our per-cpu atomics discovered by Paul in the SRCU locking code but which took some interactions with the hardware folks to resolve. Summary: - Avoid sleeping in atomic context when changing linear map permissions for DEBUG_PAGEALLOC or KFENCE - Rework printing of Spectre mitigation status to avoid hardlockup when enabling per-task mitigations on the context-switch path - Reject kernel modules when instruction patching fails either due to the DWARF-based SCS patching or because of an alternatives callback residing outside of the core kernel text - Propagate error when updating kernel memory permissions in kprobes - Drop pointless, incorrect message when enabling the ACPI SPCR console - Use value-returning LSE instructions for per-cpu atomics to reduce latency in SRCU locking routines" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Reject modules with internal alternative callbacks arm64: Fail module loading if dynamic SCS patching fails arm64: proton-pack: Fix hard lockup due to print in scheduler context arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY arm64: mm: Tidy up force_pte_mapping() arm64: mm: Optimize range_split_to_ptes() arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context arm64: kprobes: check the return value of set_memory_rox() arm64: acpi: Drop message logging SPCR default console Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent" arm64: Use load LSE atomics for the non-return per-CPU atomic operations
2025-11-07arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz()Anshuman Khandual1-10/+2
These TCR_EL1 helpers don't have any other callers. Drop these redundant indirections completely, thus making this code more compact and readable. No functional change. Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07arm64: remove unused ARCH_PFN_OFFSETOmar Sandoval1-1/+0
This is only relevant to the FLATMEM memory model, which isn't an option since commit 782276b4d0ad ("arm64: Force SPARSEMEM_VMEMMAP as the only memory management model"). Signed-off-by: Omar Sandoval <osandov@fb.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07arm64: Remove assertion on CONFIG_VMAP_STACKDawei Li1-4/+0
CONFIG_VMAP_STACK is selected by the arm64 arch unconditionally since commit ef6861b8e6dd ("arm64: Mandate VMAP_STACK"). Remove the redundant assertion and headers. Signed-off-by: Dawei Li <dawei.li@linux.dev> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07arm64: Reject modules with internal alternative callbacksAdrian Barnaś1-2/+5
During module loading, check if a callback function used by the alternatives specified in the '.altinstruction' ELF section (if present) is located in core kernel .text. If not, fail module loading before the callback is called. Reported-by: Fanqin Cui <cuifq1@chinatelecom.cn> Closes: https://lore.kernel.org/all/20250807072700.348514-1-fanqincui@163.com/ Signed-off-by: Adrian Barnaś <abarnas@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> [will: Folded in 'noinstr' tweak from Mark] Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: Fail module loading if dynamic SCS patching failsAdrian Barnaś1-1/+1
Disallow a module to load if SCS dynamic patching fails for its code. For module loading, instead of running a dry-run to check for patching errors, try to run patching in the first run and propagate any errors so module loading will fail. Signed-off-by: Adrian Barnaś <abarnas@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: proton-pack: Fix hard lockup due to print in scheduler contextshechenglong1-0/+1
Relocate the printk() calls from spectre_v4_mitigations_off() and spectre_v2_mitigations_off() into setup_system_capabilities() function, preventing hard lockups caused by printk calls in scheduler context: | _raw_spin_lock_nested+168 | ttwu_queue+180 (rq_lock(rq, &rf); 2nd acquiring the rq->__lock) | try_to_wake_up+548 | wake_up_process+32 | __up+88 | up+100 | __up_console_sem+96 | console_unlock+696 | vprintk_emit+428 | vprintk_default+64 | vprintk_func+220 | printk+104 | spectre_v4_enable_task_mitigation+344 | __switch_to+100 | __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock) | schedule_idle+48 | do_idle+388 | cpu_startup_entry+44 | secondary_start_kernel+352 Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: shechenglong <shechenglong@xfusion.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic contextRyan Roberts1-2/+1
It has been reported that split_kernel_leaf_mapping() is trying to sleep in non-sleepable context. It does this when acquiring the pgtable_split_lock mutex, when either CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE are enabled, which change linear map permissions within softirq context during memory allocation and/or freeing. All other paths into this function are called from sleepable context and so are safe. But it turns out that the memory for which these 2 features may attempt to modify the permissions is always mapped by pte, so there is no need to attempt to split the mapping. So let's exit early in these cases and avoid attempting to take the mutex. There is one wrinkle to this approach; late-initialized kfence allocates its pool from the buddy which may be block mapped. So we must hook that allocation and convert it to pte-mappings up front. Previously this was done as a side-effect of kfence protecting all the individual pages in its pool at init-time, but this no longer works due to the added early exit path in split_kernel_leaf_mapping(). So instead, do this via the existing arch_kfence_init_pool() arch hook, and reuse the existing linear_map_split_to_ptes() infrastructure. Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/ Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <groeck@google.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: Use load LSE atomics for the non-return per-CPU atomic operationsCatalin Marinas1-4/+11
The non-return per-CPU this_cpu_*() atomic operations are implemented as STADD/STCLR/STSET when FEAT_LSE is available. On many microarchitecture implementations, these instructions tend to be executed "far" in the interconnect or memory subsystem (unless the data is already in the L1 cache). This is in general more efficient when there is contention as it avoids bouncing cache lines between CPUs. The load atomics (e.g. LDADD without XZR as destination), OTOH, tend to be executed "near" with the data loaded into the L1 cache. STADD executed back to back as in srcu_read_{lock,unlock}*() incur an additional overhead due to the default posting behaviour on several CPU implementations. Since the per-CPU atomics are unlikely to be used concurrently on the same memory location, encourage the hardware to execute them "near" by issuing load atomics - LDADD/LDCLR/LDSET - with the destination register unused (but not XZR). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com> [will: Add comment and link to the discussion thread] Signed-off-by: Will Deacon <will@kernel.org>
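A sketch of the difference, assuming FEAT_LSE is available and the code is built with LSE atomics; the helper below is illustrative, not the kernel's actual percpu op implementation:

    static inline void percpu_add_u32(u32 *ptr, u32 val)
    {
            u32 old;

            /* LDADD with a live destination register behaves "near": the
             * cache line is pulled into the local L1. STADD (destination
             * XZR) tends to be posted and executed "far" in the
             * interconnect, which hurts back-to-back callers like SRCU. */
            asm volatile("ldadd %w[val], %w[old], %[ptr]"
                         : [old] "=&r" (old), [ptr] "+Q" (*ptr)
                         : [val] "r" (val)
                         : "memory");
    }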
2025-11-04arm64: uaccess: Use unsafe wrappers for ASM GOTOThomas Gleixner1-2/+2
Clang propagates a provided label, which is outside of a cleanup scope to ASM GOTO despite the fact that __raw_get_mem() has a local label for that purpose: "error: cannot jump from this asm goto statement to one of its possible targets" Using the unsafe wrapper with the extra local label indirection cures that. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-10-23Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds1-1/+2
Pull arm64 fixes from Catalin Marinas: - Do not make a clean PTE dirty in pte_mkwrite() The Arm architecture, for backwards compatibility reasons (ARMv8.0 before in-hardware dirty bit management - DBM), uses the PTE_RDONLY bit to mean !dirty while the PTE_WRITE bit means DBM enabled. The arm64 pte_mkwrite() simply clears the PTE_RDONLY bit and this inadvertently makes the PTE pte_hw_dirty(). Most places making a PTE writable also invoke pte_mkdirty() but do_swap_page() does not and we end up with dirty, freshly swapped in, writeable pages. - Do not warn if the destination page is already MTE-tagged in copy_highpage() In the majority of the cases, a destination page copied into is freshly allocated without the PG_mte_tagged flag set. However, the folio migration may be restarted if __folio_migrate_mapping() failed, triggering the benign WARN_ON_ONCE(). * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not warn if the page is already tagged in copy_highpage() arm64, mm: avoid always making PTE dirty in pte_mkwrite()
2025-10-21arm64, mm: avoid always making PTE dirty in pte_mkwrite()Huang Ying1-1/+2
Current pte_mkwrite_novma() makes PTE dirty unconditionally. This may wrongly mark some pages that are never written as dirty. For example, do_swap_page() may map the exclusive pages with writable and clean PTEs if the VMA is writable and the page fault is for read access. However, current pte_mkwrite_novma() implementation always dirties the PTE. This may cause unnecessary disk writing if the pages are never written before being reclaimed. So, change pte_mkwrite_novma() to clear the PTE_RDONLY bit only if the PTE_DIRTY bit is set to make it possible to make the PTE writable and clean. The current behavior was introduced in commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"). Before that, pte_mkwrite() only sets the PTE_WRITE bit, while set_pte_at() only clears the PTE_RDONLY bit if both the PTE_WRITE and the PTE_DIRTY bits are set. To test the performance impact of the patch, on an arm64 server machine, run 16 redis-server processes on socket 1 and 16 memtier_benchmark processes on socket 0 with mostly get transactions (that is, redis-server will mostly read memory only). The memory footprint of redis-server is larger than the available memory, so swap out/in will be triggered. Test results show that the patch can avoid most swapping out because the pages are mostly clean. And the benchmark throughput improves ~23.9% in the test. Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
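The shape of the fix, per the description above (close to, but not necessarily identical to, the committed diff):

    static inline pte_t pte_mkwrite_novma(pte_t pte)
    {
            pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
            /* Only make the PTE hardware-writable (clear PTE_RDONLY) if it
             * is already dirty; a clean PTE stays write-protected in
             * hardware, so the first real write marks it dirty. */
            if (pte_val(pte) & PTE_DIRTY)
                    pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
            return pte;
    }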
2025-10-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-6/+82
Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the handling of ZCR_EL2 in NV VMs - Pick the correct translation regime when doing a PTW on the back of a SEA - Prevent userspace from injecting an event into a vcpu that isn't initialised yet - Move timer save/restore to the sysreg handling code, fixing EL2 timer access in the process - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug - Fix trapping configuration when the host isn't GICv3 - Improve the detection of HCR_EL2.E2H being RES1 - Drop a spurious 'break' statement in the S1 PTW - Don't try to access SPE when owned by EL3 Documentation updates: - Document the failure modes of event injection - Document that a GICv3 guest can be created on a GICv5 host with FEAT_GCIE_LEGACY Selftest improvements: - Add a selftest for the effective value of HCR_EL2.AMO - Address build warning in the timer selftest when building with clang - Teach irqfd selftests about non-x86 architectures - Add missing sysregs to the set_id_regs selftest - Fix vcpu allocation in the vgic_lpi_stress selftest - Correctly enable interrupts in the vgic_lpi_stress selftest x86: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. guest_memfd: - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. 
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits) arm64: Revamp HCR_EL2.E2H RES1 detection KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available KVM: arm64: Compute per-vCPU FGTs at vcpu_load() KVM: arm64: selftests: Fix misleading comment about virtual timer encoding KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit KVM: arm64: Kill leftovers of ad-hoc timer userspace access KVM: arm64: Fix WFxT handling of nested virt KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure KVM: arm64: Add timer UAPI workaround to sysreg infrastructure KVM: arm64: Make timer_set_offset() generally accessible KVM: arm64: Replace timer context vcpu pointer with timer_id KVM: arm64: Introduce timer_context_to_vcpu() helper KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests Documentation: KVM: Update GICv3 docs for GICv5 hosts KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress KVM: arm64: selftests: Allocate vcpus with correct size ...
2025-10-17arm64/sysreg: Fix GIC CDEOI instruction encodingLorenzo Pieralisi1-1/+10
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111 otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE. Currently, its usage is encoded as a system register write, with a constant 0 value: write_sysreg_s(0, GICV5_OP_GIC_CDEOI) While compiling with GCC, the 0 constant value, through these asm constraints and modifiers ('x' modifier and 'Z' constraint combo): asm volatile(__msr_s(r, "%x0") : : "rZ" (__val)); forces the compiler to issue the XZR register for the MSR operation (i.e. that corresponds to Rt == 0b11111) issuing the right instruction encoding. Unfortunately LLVM does not yet understand that modifier/constraint combo so it ends up issuing a different register from XZR for the MSR source, which in turn means that it encodes the GIC CDEOI instruction wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE that we must prevent. Add a conditional to the write_sysreg_s() macro that detects whether it is passed a constant 0 value and issues an MSR write with XZR as source register - explicitly doing what the asm modifier/constraint is meant to achieve through constraints/modifiers, fixing the LLVM compilation issue. Fixes: 7ec80fb3f025 ("irqchip/gic-v5: Add GICv5 PPI support") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Cc: Sascha Bischoff <sascha.bischoff@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
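The described conditional, sketched; the in-tree macro may differ in detail:

    #define write_sysreg_s(v, r) do {                                   \
            u64 __val = (u64)(v);                                       \
            /* A constant zero must be emitted with XZR as the source   \
             * so that Rt == 0b11111, regardless of the compiler. */    \
            if (__builtin_constant_p(__val) && __val == 0)              \
                    asm volatile(__msr_s(r, "xzr"));                    \
            else                                                        \
                    asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));   \
    } while (0)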
2025-10-14arm64: Revamp HCR_EL2.E2H RES1 detectionMarc Zyngier1-6/+32
We currently have two ways to identify CPUs that only implement FEAT_VHE and not FEAT_E2H0: - either they advertise it via ID_AA64MMFR4_EL1.E2H0, - or the HCR_EL2.E2H bit is RAO/WI However, there is a third category of "cpus" that fall between these two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value is zero. A consequence of this is that on systems such as Neoverse V2, an NV guest cannot reliably detect that it is in a VHE-only configuration (E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's best effort to repaint the id register. Replace the RAO/WI test by a sequence that makes use of the VHE register remapping between EL1 and EL2 to detect this situation, and work out whether we get the VHE behaviour even after having set HCR_EL2.E2H to 0. This solves the NV problem, and provides a more reliable acid test for CPUs that do not completely follow the letter of the architecture while providing a RES1 behaviour for HCR_EL2.E2H. Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com
2025-10-13KVM: arm64: Compute per-vCPU FGTs at vcpu_load()Oliver Upton1-0/+50
To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-07Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds2-1/+7
Pull arm64 fixes from Will Deacon: - Preserve old 'tt_core' UAPI for Hisilicon L3C PMU driver - Ensure linear alias of kprobes instruction page is not writable - Fix kernel stack unwinding from BPF - Fix build warnings from the Fujitsu uncore PMU documentation - Fix hang with deferred 'struct page' initialisation and MTE - Consolidate KPTI page-table re-writing code * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: mte: Do not flag the zero page as PG_mte_tagged docs: perf: Fujitsu: Fix htmldocs build warnings and errors arm64: mm: Move KPTI helpers to mmu.c tracing: Fix the bug where bpf_get_stackid returns -EFAULT on the ARM64 arm64: kprobes: call set_memory_rox() for kprobe page drivers/perf: hisi: Add tt_core_deprecated for compatibility
2025-10-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds7-10/+62
Pull kvm updates from Paolo Bonzini: "This excludes the bulk of the x86 changes, which I will send separately. They have two not complex but relatively unusual conflicts so I will wait for other dust to settle. guest_memfd: - Add support for host userspace mapping of guest_memfd-backed memory for VM types that do NOT use support KVM_MEMORY_ATTRIBUTE_PRIVATE (which isn't precisely the same thing as CoCo VMs, since x86's SEV-MEM and SEV-ES have no way to detect private vs. shared). This lays the groundwork for removal of guest memory from the kernel direct map, as well as for limited mmap() for guest_memfd-backed memory. For more information see: - commit a6ad54137af9 ("Merge branch 'guest-memfd-mmap' into HEAD") - guest_memfd in Firecracker: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hiding - direct map removal: https://lore.kernel.org/all/20250221160728.1584559-1-roypat@amazon.co.uk/ - mmap support: https://lore.kernel.org/all/20250328153133.3504118-1-tabba@google.com/ ARM: - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panic-ing in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature. LoongArch: - Detect page table walk feature on new hardware - Add sign extension with kernel MMIO/IOCSR emulation - Improve in-kernel IPI emulation - Improve in-kernel PCH-PIC emulation - Move kvm_iocsr tracepoint out of generic code RISC-V: - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V - Added SBI v3.0 PMU enhancements in KVM and perf driver s390: - Improve interrupt cpu for wakeup, in particular the heuristic to decide which vCPU to deliver a floating interrupt to. - Clear the PTE when discarding a swapped page because of CMMA; this bug was introduced in 6.16 when refactoring gmap code. x86 selftests: - Add #DE coverage in the fastops test (the only exception that's guest- triggerable in fastop-emulated instructions). - Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF). 
- Minor cleanups and improvements. x86 (guest side): - Force the legacy PCI hole (memory between TOLUD and 4GiB) to UC when overriding guest MTRR for TDX/SNP to fix an issue where ACPI auto-mapping could map devices as WB and prevent the device drivers from mapping their devices with UC/UC-. - Make kvm_async_pf_task_wake() a local static helper and remove its export. - Use native qspinlocks when running in a VM with dedicated vCPU=>pCPU bindings even when PV_UNHALT is unsupported. Generic: - Remove a redundant __GFP_NOWARN from kvm_setup_async_pf() as __GFP_NOWARN is now included in GFP_NOWAIT." * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (178 commits) KVM: s390: Fix to clear PTE when discarding a swapped page KVM: arm64: selftests: Cover ID_AA64ISAR3_EL1 in set_id_regs KVM: arm64: selftests: Remove a duplicate register listing in set_id_regs KVM: arm64: selftests: Cope with arch silliness in EL2 selftest KVM: arm64: selftests: Add basic test for running in VHE EL2 KVM: arm64: selftests: Enable EL2 by default KVM: arm64: selftests: Initialize HCR_EL2 KVM: arm64: selftests: Use the vCPU attr for setting nr of PMU counters KVM: arm64: selftests: Use hyp timer IRQs when test runs at EL2 KVM: arm64: selftests: Select SMCCC conduit based on current EL KVM: arm64: selftests: Provide helper for getting default vCPU target KVM: arm64: selftests: Alias EL1 registers to EL2 counterparts KVM: arm64: selftests: Create a VGICv3 for 'default' VMs KVM: arm64: selftests: Add unsanitised helpers for VGICv3 creation KVM: arm64: selftests: Add helper to check for VGICv3 support KVM: arm64: selftests: Initialize VGICv3 only once KVM: arm64: selftests: Provide kvm_arch_vm_post_create() in library code KVM: selftests: Add ex_str() to print human friendly name of exception vectors selftests/kvm: remove stale TODO in xapic_state_test KVM: selftests: Handle Intel Atom errata that leads to PMU event overcount ...
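A minimal userspace sketch of what the guest_memfd mmap() support above enables, under stated assumptions: the flag name GUEST_MEMFD_FLAG_MMAP (and the matching capability, assumed to be KVM_CAP_GUEST_MEMFD_MMAP) should be checked against the running kernel's <linux/kvm.h>, while KVM_CREATE_GUEST_MEMFD and struct kvm_create_guest_memfd are the pre-existing guest_memfd uAPI. This is an illustration, not code from the series:

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    /* Create a guest_memfd that allows host mmap() and touch it from userspace. */
    int guest_memfd_mmap_demo(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm = ioctl(kvm, KVM_CREATE_VM, 0);
        struct kvm_create_guest_memfd gmem = {
            .size  = 2 * 1024 * 1024,
            .flags = GUEST_MEMFD_FLAG_MMAP,   /* assumed flag name */
        };
        int gfd = ioctl(vm, KVM_CREATE_GUEST_MEMFD, &gmem);
        void *p;

        if (kvm < 0 || vm < 0 || gfd < 0)
            return -1;
        p = mmap(NULL, gmem.size, PROT_READ | PROT_WRITE, MAP_SHARED, gfd, 0);
        if (p == MAP_FAILED)
            return -1;
        memset(p, 0, gmem.size);   /* host userspace now has direct access */
        return gfd;
    }

As described above, only VM types that do NOT support KVM_MEMORY_ATTRIBUTE_PRIVATE accept such mappings.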
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds4-11/+18
Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when performing slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extends DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps (a usage sketch follows this entry) - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionally making the huge_zero_page persistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap, to end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". 
Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zpool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN write-only mode for ARM (only write accesses are checked), using that architecture's memory tagging feature. It is felt that a write-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improve their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. 
This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in the relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implements reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khugepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. 
/proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
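The PROCMAP_QUERY change above is purely a kernel-side locking improvement; the ioctl interface itself is unchanged. As context, a rough sketch of a lookup is shown below, assuming the PROCMAP_QUERY ioctl and struct procmap_query from <linux/fs.h> as merged in earlier releases (the field names used here are recalled from that header and should be treated as assumptions):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    /* Look up the VMA covering 'addr' without parsing /proc/self/maps text. */
    static int query_vma(void *addr)
    {
        struct procmap_query q;
        int fd = open("/proc/self/maps", O_RDONLY);

        if (fd < 0)
            return -1;
        memset(&q, 0, sizeof(q));
        q.size = sizeof(q);   /* lets the kernel cope with older/newer struct layouts */
        q.query_addr = (uint64_t)(uintptr_t)addr;
        if (ioctl(fd, PROCMAP_QUERY, &q) < 0)
            return -1;
        printf("vma [%llx, %llx) flags %llx\n",
               (unsigned long long)q.vma_start,
               (unsigned long long)q.vma_end,
               (unsigned long long)q.vma_flags);
        return 0;
    }

Each such query previously took mmap_lock for read; with the series above it can run under the per-VMA lock, which is what reduces contention during reads of /proc/pid/maps.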
2025-09-30Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-7/+14
Pull VDSO updates from Thomas Gleixner: - Further consolidation of the VDSO infrastructure and the common data store - Simplification of the related Kconfig logic - Improve the VDSO selftest suite * tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests: vDSO: Drop vdso_test_clock_getres selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64() selftests: vDSO: vdso_test_abi: Test CPUTIME clocks selftests: vDSO: vdso_test_abi: Use explicit indices for name array selftests: vDSO: vdso_test_abi: Drop clock availability tests selftests: vDSO: vdso_test_abi: Use ksft_finished() selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper vdso: Add struct __kernel_old_timeval forward declaration to gettime.h vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO vdso: Drop Kconfig GENERIC_VDSO_TIME_NS vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE vdso: Drop kconfig GENERIC_COMPAT_VDSO vdso: Drop kconfig GENERIC_VDSO_32 riscv: vdso: Untangle Kconfig logic time: Build generic update_vsyscall() only with generic time vDSO vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs vdso: Move ENABLE_COMPAT_VDSO from core to arm64 ARM: VDSO: Remove cntvct_ok global variable vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini3-0/+7
KVM/riscv changes for 6.18 - Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features - Added ONE_REG interface for SBI FWFT extension - Added Zicbop and bfloat16 extensions for Guest/VM - Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test - Added SBI v3.0 PMU enhancements in KVM and perf driver
2025-09-30Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini7-10/+62
KVM/arm64 updates for 6.18 - Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload. - Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data. - Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker. - Add the dump of the instruction stream when panicking in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups. - Add 52-bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk. - Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform. - Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct. - Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration. - Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing. - Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.
2025-09-29Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds21-84/+206
Pull arm64 updates from Will Deacon: "There's good stuff across the board, including some nice mm improvements for CPUs with the 'noabort' BBML2 feature and a clever patch to allow ptdump to play nicely with block mappings in the vmalloc area. Confidential computing: - Add support for accepting secrets from firmware (e.g. ACPI CCEL) and mapping them with appropriate attributes. CPU features: - Advertise atomic floating-point instructions to userspace - Extend Spectre workarounds to cover additional Arm CPU variants - Extend list of CPUs that support break-before-make level 2 and guarantee not to generate TLB conflict aborts for changes of mapping granularity (BBML2_NOABORT) - Add GCS support to our uprobes implementation. Documentation: - Remove bogus SME documentation concerning register state when entering/exiting streaming mode. Entry code: - Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY) - Micro-optimise syscall entry path with a compiler branch hint. Memory management: - Enable huge mappings in vmalloc space even when kernel page-table dumping is enabled - Tidy up the types used in our early MMU setup code - Rework rodata= for closer parity with the behaviour on x86 - For CPUs implementing BBML2_NOABORT, utilise block mappings in the linear map even when rodata= applies to virtual aliases - Don't re-allocate the virtual region between '_text' and '_stext', as doing so confused tools parsing /proc/vmcore. Miscellaneous: - Clean-up Kconfig menuconfig text for architecture features - Avoid redundant bitmap_empty() during determination of supported SME vector lengths - Re-enable warnings when building the 32-bit vDSO object - Avoid breaking our eggs at the wrong end. Perf and PMUs: - Support for v3 of the Hisilicon L3C PMU - Support for Hisilicon's MN and NoC PMUs - Support for Fujitsu's Uncore PMU - Support for SPE's extended event filtering feature - Preparatory work to enable data source filtering in SPE - Support for multiple lanes in the DWC PCIe PMU - Support for i.MX94 in the IMX DDR PMU driver - MAINTAINERS update (Thank you, Yicong) - Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets). Selftests: - Add basic LSFE check to the existing hwcaps test - Support nolibc in GCS tests - Extend SVE ptrace test to pass unsupported regsets and invalid vector lengths - Minor cleanups (typos, cosmetic changes). System registers: - Fix ID_PFR1_EL1 definition - Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1 - Sync TCR_EL1 definition with the latest Arm ARM (L.b) - Be stricter about the input fed into our AWK sysreg generator script - Typo fixes and removal of redundant definitions. ACPI, EFI and PSCI: - Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree - Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls - Fix a node refcount imbalance in the PSCI device-tree code. 
CPU Features: - Ensure register sanitisation is applied to fields in ID_AA64MMFR4 - Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM - Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64: cpufeature: Remove duplicate asm/mmu.h header arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN perf/dwc_pcie: Fix use of uninitialized variable arm/syscalls: mark syscall invocation as likely in invoke_syscall Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU Documentation: hisi-pmu: Fix of minor format error drivers/perf: hisi: Add support for L3C PMU v3 drivers/perf: hisi: Refactor the event configuration of L3C PMU drivers/perf: hisi: Extend the field of tt_core drivers/perf: hisi: Extract the event filter check of L3C PMU drivers/perf: hisi: Simplify the probe process of each L3C PMU version drivers/perf: hisi: Export hisi_uncore_pmu_isr() drivers/perf: hisi: Relax the event ID check in the framework perf: Fujitsu: Add the Uncore PMU driver arm64: map [_text, _stext) virtual address range non-executable+read-only arm64/sysreg: Update TCR_EL1 register arm64: Enable vmalloc-huge with ptdump arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list arm64: errata: Apply workarounds for Neoverse-V3AE arm64: cputype: Add Neoverse-V3AE definitions ...
2025-09-25arm64: mm: Move KPTI helpers to mmu.cKevin Brodsky1-1/+6
create_kpti_ng_temp_pgd() is currently defined (as an alias) in mmu.c without a matching declaration in a header; instead, cpufeature.c makes its own declaration. This is clearly not pretty, and as commit ceca927c86e6 ("arm64: mm: Fix CFI failure due to kpti_ng_pgd_alloc function signature") showed, it also makes it very easy for the prototypes to go out of sync. All this would be much simpler if kpti_install_ng_mappings() and associated functions lived in mmu.c, where they logically belong. This is what this patch does: - Move kpti_install_ng_mappings() and associated functions from cpufeature.c to mmu.c, add a declaration to <asm/mmu.h> - Remove create_kpti_ng_temp_pgd() and just call __create_pgd_mapping_locked() directly instead - Mark all these functions __init - Move __initdata after kpti_ng_temp_alloc (as suggested by checkpatch) Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> [will: Fix conflicts with init_idmap_kpti_bbml2_flag()] Signed-off-by: Will Deacon <will@kernel.org>
2025-09-25tracing: Fix the bug where bpf_get_stackid returns -EFAULT on the ARM64Feng Yang1-0/+1
When using bpf_program__attach_kprobe_multi_opts on ARM64 to hook a BPF program that contains the bpf_get_stackid function, the BPF program fails to obtain the stack trace and returns -EFAULT. This is because ftrace_partial_regs omits the configuration of the pstate register, leaving pstate at the default value of 0. When get_perf_callchain executes, it uses user_mode(regs) to determine whether it is in kernel mode. This leads to a misjudgment that the code is in user mode, so perf_callchain_kernel is not executed and the function returns directly. As a result, trace->nr becomes 0, and finally -EFAULT is returned. Therefore, the assignment of the pstate register is added here. Fixes: b9b55c8912ce ("tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs") Closes: https://lore.kernel.org/bpf/20250919071902.554223-1-yangfeng59949@163.com/ Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
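A small standalone illustration of the failure mode described above (this is not the kernel patch itself): on arm64, user_mode() compares the mode bits of pstate against EL0, so a pt_regs whose pstate was left at 0 looks like a user-mode frame and the kernel callchain is skipped. The PSR_MODE_* values mirror arch/arm64/include/uapi/asm/ptrace.h; the exact kernel-mode value the fix stores (PSR_MODE_EL1h below) is an assumption for illustration:

    #include <stdint.h>
    #include <stdio.h>

    #define PSR_MODE_EL0t   0x0u
    #define PSR_MODE_EL1h   0x5u
    #define PSR_MODE_MASK   0xfu

    /* Same check as the kernel's user_mode(regs) on arm64. */
    static int looks_like_user_mode(uint64_t pstate)
    {
        return (pstate & PSR_MODE_MASK) == PSR_MODE_EL0t;
    }

    int main(void)
    {
        uint64_t unset = 0;               /* what ftrace_partial_regs() used to leave behind */
        uint64_t fixed = PSR_MODE_EL1h;   /* a kernel-mode pstate, as the fix assigns */

        printf("pstate=0    -> user_mode()=%d (perf_callchain_kernel skipped, trace->nr stays 0)\n",
               looks_like_user_mode(unset));
        printf("pstate=EL1h -> user_mode()=%d (kernel callchain is walked)\n",
               looks_like_user_mode(fixed));
        return 0;
    }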
2025-09-24Merge branch 'for-next/uprobes' into for-next/coreWill Deacon2-41/+90
* for-next/uprobes: arm64: probes: Fix incorrect bl/blr address and register usage uprobes: uprobe_warn should use passed task arm64: Kconfig: Remove GCS restrictions on UPROBES arm64: uprobes: Add GCS support to uretprobes arm64: probes: Add GCS support to bl/blr/ret arm64: uaccess: Add additional userspace GCS accessors arm64: uaccess: Move existing GCS accessors definitions to gcs.h arm64: probes: Break ret out from bl/blr