aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-02-13KVM: arm64: Convert timer offset VA when accessed in HYP codeMarc Zyngier1-1/+14
Now that EL2 has gained some early timer emulation, it accesses the offsets pointed to by the timer structure, both of which live in the KVM structure. Of course, these are *kernel* pointers, so the dereferencing of these pointers in non-kernel code must be itself be offset. Given switch.h its own version of timer_get_offset() and use that instead. Fixes: b86fc215dc26d ("KVM: arm64: Handle counter access early in non-HYP context") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Anders Roxell <anders.roxell@linaro.org> Link: https://lore.kernel.org/r/20250212173454.2864462-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp()Mark Rutland1-6/+1
At the end of kvm_arch_vcpu_load_fp() we check that no bits are set in SVCR. We only check this for protected mode despite this mattering equally for non-protected mode, and the comment above this is confusing. Remove the comment and simplify the check, moving from WARN_ON() to WARN_ON_ONCE() to avoid spamming the log. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Eagerly switch ZCR_EL{1,2}Mark Rutland6-41/+76
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the CPU, the host's active SVE VL may differ from the guest's maximum SVE VL: * For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained by the guest hypervisor, which may be less than or equal to that guest's maximum VL. Note: in this case the value of ZCR_EL1 is immaterial due to E2H. * For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest, which may be less than or greater than the guest's maximum VL. Note: in this case hyp code traps host SVE usage and lazily restores ZCR_EL2 to the host's maximum VL, which may be greater than the guest's maximum VL. This can be the case between exiting a guest and kvm_arch_vcpu_put_fp(). If a softirq is taken during this period and the softirq handler tries to use kernel-mode NEON, then the kernel will fail to save the guest's FPSIMD/SVE state, and will pend a SIGKILL for the current thread. This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live FPSIMD/SVE state with the guest's maximum SVE VL, and fpsimd_save_user_state() verifies that the live SVE VL is as expected before attempting to save the register state: | if (WARN_ON(sve_get_vl() != vl)) { | force_signal_inject(SIGKILL, SI_KERNEL, 0, 0); | return; | } Fix this and make this a bit easier to reason about by always eagerly switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this happening, there's no need to trap host SVE usage, and the nVHE/nVHE __deactivate_cptr_traps() logic can be simplified to enable host access to all present FPSIMD/SVE/SME features. In protected nVHE/hVHE modes, the host's state is always saved/restored by hyp, and the guest's state is saved prior to exit to the host, so from the host's PoV the guest never has live FPSIMD/SVE/SME state, and the host's ZCR_EL1 is never clobbered by hyp. Fixes: 8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE") Fixes: 2e3cf82063a00ea0 ("KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-9-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Mark some header functions as inlineMark Rutland1-10/+9
The shared hyp switch header has a number of static functions which might not be used by all files that include the header, and when unused they will provoke compiler warnings, e.g. | In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:703:13: warning: 'kvm_hyp_handle_dabt_low' defined but not used [-Wunused-function] | 703 | static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:682:13: warning: 'kvm_hyp_handle_cp15_32' defined but not used [-Wunused-function] | 682 | static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:662:13: warning: 'kvm_hyp_handle_sysreg' defined but not used [-Wunused-function] | 662 | static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:458:13: warning: 'kvm_hyp_handle_fpsimd' defined but not used [-Wunused-function] | 458 | static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:329:13: warning: 'kvm_hyp_handle_mops' defined but not used [-Wunused-function] | 329 | static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code) | | ^~~~~~~~~~~~~~~~~~~ Mark these functions as 'inline' to suppress this warning. This shouldn't result in any functional change. At the same time, avoid the use of __alias() in the header and alias kvm_hyp_handle_iabt_low() and kvm_hyp_handle_watchpt_low() to kvm_hyp_handle_memory_fault() using CPP, matching the style in the rest of the kernel. For consistency, kvm_hyp_handle_memory_fault() is also marked as 'inline'. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-8-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Refactor exit handlersMark Rutland3-41/+26
The hyp exit handling logic is largely shared between VHE and nVHE/hVHE, with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code in the header depends on function definitions provided by arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c when they include the header. This is an unusual header dependency, and prevents the use of arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would result in compiler warnings regarding missing definitions, e.g. | In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8: | ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined | 733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu); | | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined | 735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code); | | ^~~~~~~~~~~~~~~~~ Refactor the logic such that the header doesn't depend on anything from the C files. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-7-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Refactor CPTR trap deactivationMark Rutland3-47/+42
For historical reasons, the VHE and nVHE/hVHE implementations of __activate_cptr_traps() pair with a common implementation of __kvm_reset_cptr_el2(), which ideally would be named __deactivate_cptr_traps(). Rename __kvm_reset_cptr_el2() to __deactivate_cptr_traps(), and split it into separate VHE and nVHE/hVHE variants so that each can be paired with its corresponding implementation of __activate_cptr_traps(). At the same time, fold kvm_write_cptr_el2() into its callers. This makes it clear in-context whether a write is made to the CPACR_EL1 encoding or the CPTR_EL2 encoding, and removes the possibility of confusion as to whether kvm_write_cptr_el2() reformats the sysreg fields as cpacr_clear_set() does. In the nVHE/hVHE implementation of __activate_cptr_traps(), placing the sysreg writes within the if-else blocks requires that the call to __activate_traps_fpsimd32() is moved earlier, but as this was always called before writing to CPTR_EL2/CPACR_EL1, this should not result in a functional change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-6-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Remove VHE host restore of CPACR_EL1.SMENMark Rutland2-22/+0
When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.SMEN (i.e. CPTR_EL2.SMEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic has historically been broken, and is currently redundant. This logic was originally introduced in commit: 861262ab86270206 ("KVM: arm64: Handle SME host state when running guests") At the time, the VHE hyp code would reset CPTR_EL2.SMEN to 0b00 when returning to the host, trapping host access to SME state. Unfortunately, this was unsafe as the host could take a softirq before calling kvm_arch_vcpu_put_fp(), and if a softirq handler were to use kernel mode NEON the resulting attempt to save the live FPSIMD/SVE/SME state would result in a fatal trap. That issue was limited to VHE mode. For nVHE/hVHE modes, KVM always saved/restored the host kernel's CPACR_EL1 value, and configured CPTR_EL2.TSM to 0b0, ensuring that host usage of SME would not be trapped. The issue above was incidentally fixed by commit: 375110ab51dec5dc ("KVM: arm64: Fix resetting SME trap values on reset for (h)VHE") That commit changed the VHE hyp code to configure CPTR_EL2.SMEN to 0b01 when returning to the host, permitting host kernel usage of SME, avoiding the issue described above. At the time, this was not identified as a fix for commit 861262ab86270206. Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SME trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace. Remove the redundant logic. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-5-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Remove VHE host restore of CPACR_EL1.ZENMark Rutland2-17/+0
When KVM is in VHE mode, the host kernel tries to save and restore the configuration of CPACR_EL1.ZEN (i.e. CPTR_EL2.ZEN when HCR_EL2.E2H=1) across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the configuration may be clobbered by hyp when running a vCPU. This logic is currently redundant. The VHE hyp code unconditionally configures CPTR_EL2.ZEN to 0b01 when returning to the host, permitting host kernel usage of SVE. Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME state, there's no need to save/restore the state of the EL0 SVE trap. The kernel can safely save/restore state without trapping, as described above, and will restore userspace state (including trap controls) before returning to userspace. Remove the redundant logic. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-4-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Remove host FPSIMD saving for non-protected KVMMark Rutland7-64/+29
Now that the host eagerly saves its own FPSIMD/SVE/SME state, non-protected KVM never needs to save the host FPSIMD/SVE/SME state, and the code to do this is never used. Protected KVM still needs to save/restore the host FPSIMD/SVE state to avoid leaking guest state to the host (and to avoid revealing to the host whether the guest used FPSIMD/SVE/SME), and that code needs to be retained. Remove the unused code and data structures. To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the VHE hyp code, the nVHE/hVHE version is moved into the shared switch header, where it is only invoked when KVM is in protected mode. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-13KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME stateMark Rutland2-50/+10
There are several problems with the way hyp code lazily saves the host's FPSIMD/SVE state, including: * Host SVE being discarded unexpectedly due to inconsistent configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to result in QEMU crashes where SVE is used by memmove(), as reported by Eric Auger: https://issues.redhat.com/browse/RHEL-68997 * Host SVE state is discarded *after* modification by ptrace, which was an unintentional ptrace ABI change introduced with lazy discarding of SVE state. * The host FPMR value can be discarded when running a non-protected VM, where FPMR support is not exposed to a VM, and that VM uses FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR before unbinding the host's FPSIMD/SVE/SME state, leaving a stale value in memory. Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME state when loading a vCPU such that KVM does not need to save any of the host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is removed and the necessary call to fpsimd_save_and_flush_cpu_state() is placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr' should not be used, they are set to NULL; all uses of these will be removed in subsequent patches. Historical problems go back at least as far as v5.17, e.g. erroneous assumptions about TIF_SVE being clear in commit: 8383741ab2e773a9 ("KVM: arm64: Get rid of host SVE tracking/saving") ... and so this eager save+flush probably needs to be backported to ALL stable trees. Fixes: 93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests") Fixes: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: ef3be86021c3bdf3 ("KVM: arm64: Add save/restore support for FPMR") Reported-by: Eric Auger <eauger@redhat.com> Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Eric Auger <eric.auger@redhat.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Fuad Tabba <tabba@google.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-2-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-10KVM: arm64: Fix __pkvm_host_mkyoung_guest() return valueMarc Zyngier1-2/+1
Don't use an uninitialised stack variable, and just return 0 on the non-error path. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502100911.8c9DbtKD-lkp@intel.com/ Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-09KVM: arm64: Simplify np-guest hypercallsQuentin Perret1-31/+38
When the handling of a guest stage-2 permission fault races with an MMU notifier, the faulting page might be gone from the guest's stage-2 by the point we attempt to call (p)kvm_pgtable_stage2_relax_perms(). In the normal KVM case, this leads to returning -EAGAIN which user_mem_abort() handles correctly by simply re-entering the guest. However, the pKVM hypercall implementation has additional logic to check the page state using __check_host_shared_guest() which gets confused with absence of a page mapped at the requested IPA and returns -ENOENT, hence breaking user_mem_abort() and hilarity ensues. Luckily, several of the hypercalls for managing the stage-2 page-table of NP guests have no effect on the pKVM ownership tracking (wrprotect, test_clear_young, mkyoung, and crucially relax_perms), so the extra state checking logic is in fact not strictly necessary. So, to fix the discrepancy between standard KVM and pKVM, let's just drop the superfluous __check_host_shared_guest() logic from those hypercalls and make the extra state checking a debug assertion dependent on CONFIG_NVHE_EL2_DEBUG as we already do for other transitions. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250207145438.1333475-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-09KVM: arm64: Improve error handling from check_host_shared_guest()Quentin Perret1-2/+2
The check_host_shared_guest() path expects to find a last-level valid PTE in the guest's stage-2 page-table. However, it checks the PTE's level before its validity, which makes it hard for callers to figure out what went wrong. To make error handling simpler, check the PTE's validity first. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250207145438.1333475-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: timer: Don't adjust the EL2 virtual timer offsetMarc Zyngier2-18/+13
The way we deal with the EL2 virtual timer is a bit odd. We try to cope with E2H being flipped, and adjust which offset applies to that timer depending on the current E2H value. But that's a complexity we shouldn't have to worry about. What we have to deal with is either E2H being RES1, in which case there is no offset, or E2H being RES0, and the virtual timer simply does not exist. Drop the adjusting of the timer offset, which makes things a bit simpler. At the same time, make sure that accessing the HV timer when E2H is RES0 results in an UNDEF in the guest. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECVMarc Zyngier1-20/+10
Both Wei-Lin Chang and Volodymyr Babchuk report that the way we handle the emulation of EL1 timers with NV is completely wrong, specially in the case of HCR_EL2.E2H==0. There are three problems in about as many lines of code: - With E2H==0, the EL1 timers are overwritten with the EL1 state, while they should actually contain the EL2 state (as per the timer map) - With E2H==1, we run the full EL1 timer emulation even when ECV is present, hiding a bug in timer_emulate() (see previous patch) - The comments are actively misleading, and say all the wrong things. This is only attributable to the code having been initially written for FEAT_NV, hacked up to handle FEAT_NV2 *in parallel*, and vaguely hacked again to be FEAT_NV2 only. Oh, and yours truly being a gold plated idiot. The fix is obvious: just delete most of the E2H==0 code, have a unified handling of the timers (because they really are E2H agnostic), and make sure we don't execute any of that when FEAT_ECV is present. Fixes: 4bad3068cfa9f ("KVM: arm64: nv: Sync nested timer state with FEAT_NV2") Reported-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Reported-by: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> Link: https://lore.kernel.org/r/fqiqfjzwpgbzdtouu2pwqlu7llhnf5lmy4hzv5vo6ph4v3vyls@jdcfy3fjjc5k Link: https://lore.kernel.org/r/87frl51tse.fsf@epam.com Tested-by: Dmytro Terletskyi <dmytro_terletskyi@epam.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: timer: Always evaluate the need for a soft timerMarc Zyngier1-3/+1
When updating the interrupt state for an emulated timer, we return early and skip the setup of a soft timer that runs in parallel with the guest. While this is OK if we have set the interrupt pending, it is pretty wrong if the guest moved CVAL into the future. In that case, no timer is armed and the guest can wait for a very long time (it will take a full put/load cycle for the situation to resolve). This is specially visible with EDK2 running at EL2, but still using the EL1 virtual timer, which in that case is fully emulated. Any key-press takes ages to be captured, as there is no UART interrupt and EDK2 relies on polling from a timer... The fix is simply to drop the early return. If the timer interrupt is pending, we will still return early, and otherwise arm the soft timer. Fixes: 4d74ecfa6458b ("KVM: arm64: Don't arm a hrtimer for an already pending timer") Cc: stable@vger.kernel.org Tested-by: Dmytro Terletskyi <dmytro_terletskyi@epam.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: Fix nested S2 MMU structures reallocationMarc Zyngier1-4/+5
For each vcpu that userspace creates, we allocate a number of s2_mmu structures that will eventually contain our shadow S2 page tables. Since this is a dynamically allocated array, we reallocate the array and initialise the newly allocated elements. Once everything is correctly initialised, we adjust pointer and size in the kvm structure, and move on. But should that initialisation fail *and* the reallocation triggered a copy to another location, we end-up returning early, with the kvm structure still containing the (now stale) old pointer. Weeee! Cure it by assigning the pointer early, and use this to perform the initialisation. If everything succeeds, we adjust the size. Otherwise, we just leave the size as it was, no harm done, and the new memory is as good as the ol' one (we hope...). Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: Fail protected mode init if no vgic hardware is presentOliver Upton1-0/+13
Protected mode assumes that at minimum vgic-v3 is present, however KVM fails to actually enforce this at the time of initialization. As such, when running protected mode in a half-baked state on GICv2 hardware we see the hyp go belly up at vcpu_load() when it tries to restore the vgic-v3 cpuif: $ ./arch_timer_edge_cases [ 130.599140] kvm [4518]: nVHE hyp panic at: [<ffff800081102b58>] __kvm_nvhe___vgic_v3_restore_vmcr_aprs+0x8/0x84! [ 130.603685] kvm [4518]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE [ 130.611962] kvm [4518]: Hyp Offset: 0xfffeca95ed000000 [ 130.617053] Kernel panic - not syncing: HYP panic: [ 130.617053] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000 [ 130.617053] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000 [ 130.617053] VCPU:0000000000000000 [ 130.638013] CPU: 0 UID: 0 PID: 4518 Comm: arch_timer_edge Tainted: G C 6.13.0-rc3-00009-gf7d03fcbf1f4 #1 [ 130.648790] Tainted: [C]=CRAP [ 130.651721] Hardware name: Libre Computer AML-S905X-CC (DT) [ 130.657242] Call trace: [ 130.659656] show_stack+0x18/0x24 (C) [ 130.663279] dump_stack_lvl+0x38/0x90 [ 130.666900] dump_stack+0x18/0x24 [ 130.670178] panic+0x388/0x3e8 [ 130.673196] nvhe_hyp_panic_handler+0x104/0x208 [ 130.677681] kvm_arch_vcpu_load+0x290/0x548 [ 130.681821] vcpu_load+0x50/0x80 [ 130.685013] kvm_arch_vcpu_ioctl_run+0x30/0x868 [ 130.689498] kvm_vcpu_ioctl+0x2e0/0x974 [ 130.693293] __arm64_sys_ioctl+0xb4/0xec [ 130.697174] invoke_syscall+0x48/0x110 [ 130.700883] el0_svc_common.constprop.0+0x40/0xe0 [ 130.705540] do_el0_svc+0x1c/0x28 [ 130.708818] el0_svc+0x30/0xd0 [ 130.711837] el0t_64_sync_handler+0x10c/0x138 [ 130.716149] el0t_64_sync+0x198/0x19c [ 130.719774] SMP: stopping secondary CPUs [ 130.723660] Kernel Offset: disabled [ 130.727103] CPU features: 0x000,00000800,02800000,0200421b [ 130.732537] Memory Limit: none [ 130.735561] ---[ end Kernel panic - not syncing: HYP panic: [ 130.735561] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000 [ 130.735561] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000 [ 130.735561] VCPU:0000000000000000 ]--- Fix it by failing KVM initialization if the system doesn't implement vgic-v3, as protected mode will never do anything useful on such hardware. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5ca7588c-7bf2-4352-8661-e4a56a9cd9aa@sirena.org.uk/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250203231543.233511-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-01KVM: arm64: Flush/sync debug state in protected modeOliver Upton1-0/+24
The recent changes to debug state management broke self-hosted debug for guests when running in protected mode, since both the debug owner and the debug state itself aren't shared with the hyp's view of the vcpu. Fix it by flushing/syncing the relevant bits with the hyp vcpu. Fixes: beb470d96cec ("KVM: arm64: Use debug_owner to track if debug regs need save/restore") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5f62740f-a065-42d9-9f56-8fb648b9c63f@sirena.org.uk/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250131222922.1548780-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-21KVM: arm64: Flush hyp bss section after initialization of variables in bssLokesh Vutla1-0/+7
To determine CPU features during initialization, the nVHE hypervisor utilizes sanitized values of the host's CPU features registers. These values, stored in u64 idaa64*_el1_sys_val variables are updated by the kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility with the MMU off, the data cache needs to be flushed after these updates. However, individually flushing each variable using kvm_flush_dcache_to_poc() is inefficient. These cpu feature variables would be part of the bss section of the hypervisor. Hence, flush the entire bss section of hypervisor once the initialization is complete. Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers") Suggested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Lokesh Vutla <lokeshvutla@google.com> Link: https://lore.kernel.org/r/20250121044016.2219256-1-lokeshvutla@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-17arm64/sysreg: Get rid of TRFCR_ELx SysregFieldsMarc Zyngier4-20/+16
There is no such thing as TRFCR_ELx in the architecture. What we have is TRFCR_EL1, for which TRFCR_EL12 is an accessor. Rename TRFCR_ELx_* to TRFCR_EL1_*, and fix the bit of code using these names. Similarly, TRFCR_EL12 is redefined as a mapping to TRFCR_EL1. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/87cygsqgkh.wl-maz@kernel.org Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com>
2025-01-16KVM: arm64: nv: Fix doc header layout for timersMarc Zyngier1-3/+2
Stephen reports that 'make htmldocs' spits out a warning ("Documentation/virt/kvm/devices/vcpu.rst:147: WARNING: Definition list ends without a blank line; unexpected unindent."). Fix it by keeping all the timer attributes on a single line. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-14KVM: arm64: nv: Apply RESx settings to sysreg reset valuesMarc Zyngier3-4/+12
While we have sanitisation in place for the guest sysregs, we lack that sanitisation out of reset. So some of the fields could be evaluated and not reflect their RESx status, which sounds like a very bad idea. Apply the RESx masks to the the sysreg file in two situations: - when going via a reset of the sysregs - after having computed the RESx masks Having this separate reset phase from the actual reset handling is a bit grotty, but we need to apply this after the ID registers are final. Tested-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250112165029.1181056-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-14KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessorsMarc Zyngier2-23/+17
A lot of the NV code depends on HCR_EL2.{E2H,TGE}, and we assume in places that at least HCR_EL2.E2H is invariant for a given guest. However, we make a point in *not* using the sanitising accessor that would enforce this, and are at the mercy of the guest doing stupid things. Clearly, that's not good. Rework the HCR_EL2 accessors to use __vcpu_sys_reg() instead, guaranteeing that the RESx settings get applied, specially when HCR_EL2.E2H is evaluated. This results in fewer accessors overall. Huge thanks to Joey who spent a long time tracking this bug down. Reported-by: Joey Gouly <Joey.Gouly@arm.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20250112165029.1181056-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-13KVM: arm64: Fix selftests after sysreg field name updateMarc Zyngier2-2/+2
Fix KVM selftests that check for EL0's 64bit-ness, and use a now removed definition. Kindly point them at the new one. Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12coresight: Pass guest TRFCR value to KVMJames Clark5-17/+56
Currently the userspace and kernel filters for guests are never set, so no trace will be generated for them. Add support for tracing guests by passing the desired TRFCR value to KVM so it can be applied to the guest. By writing either E1TRE or E0TRE, filtering on either guest kernel or guest userspace is also supported. And if both E1TRE and E0TRE are cleared when exclude_guest is set, that option is supported too. This change also brings exclude_host support which is difficult to add as a separate commit without excess churn and resulting in no trace at all. cpu_prohibit_trace() gets moved to TRBE because the ETM driver doesn't need the read, it already has the base TRFCR value. TRBE only needs the read to disable it and then restore. Testing ======= The addresses were counted with the following: $ perf report -D | grep -Eo 'EL2|EL1|EL0' | sort | uniq -c Guest kernel only: $ perf record -e cs_etm//Gk -a -- true 535 EL1 1 EL2 Guest user only (only 5 addresses because the guest runs slowly in the model): $ perf record -e cs_etm//Gu -a -- true 5 EL0 Host kernel only: $ perf record -e cs_etm//Hk -a -- true 3501 EL2 Host userspace only: $ perf record -e cs_etm//Hu -a -- true 408 EL0 1 EL2 Signed-off-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20250106142446.628923-8-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12KVM: arm64: Support trace filtering for guestsJames Clark2-0/+20
For nVHE, switch the filter value in and out if the Coresight driver asks for it. This will support filters for guests when sinks other than TRBE are used. For VHE, just write the filter directly to TRFCR_EL1 where trace can be used even with TRBE sinks. Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250106142446.628923-7-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12KVM: arm64: coresight: Give TRBE enabled state to KVMJames Clark4-28/+79
Currently in nVHE, KVM has to check if TRBE is enabled on every guest switch even if it was never used. Because it's a debug feature and is more likely to not be used than used, give KVM the TRBE buffer status to allow a much simpler and faster do-nothing path in the hyp. Protected mode now disables trace regardless of TRBE (because trfcr_while_in_guest is always 0), which was not previously done. However, it continues to flush whenever the buffer is enabled regardless of the filter status. This avoids the hypothetical case of a host that had disabled the filter but not flushed which would arise if only doing the flush when the filter was enabled. Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250106142446.628923-6-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12coresight: trbe: Remove redundant disable callJames Clark1-1/+1
trbe_drain_and_disable_local() just clears TRBLIMITR and drains. TRBLIMITR is already cleared on the next line after this call, so replace it with only drain. This is so we can make a kvm call that has a preempt enabled warning from set_trbe_disabled() in the next commit, where trbe_reset_local() is called from a preemptible hotplug path. Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250106142446.628923-5-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12arm64/sysreg/tools: Move TRFCR definitions to sysregJames Clark3-24/+36
Convert TRFCR to automatic generation. Add separate definitions for ELx and EL2 as TRFCR_EL1 doesn't have CX. This also mirrors the previous definition so no code change is required. Also add TRFCR_EL12 which will start to be used in a later commit. Unfortunately, to avoid breaking the Perf build with duplicate definition errors, the tools copy of the sysreg.h header needs to be updated at the same time rather than the usual second commit. This is because the generated version of sysreg (arch/arm64/include/generated/asm/sysreg-defs.h), is currently shared and tools/ does not have its own copy. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250106142446.628923-4-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12tools: arm64: Update sysreg.h header filesJames Clark2-8/+405
Created with the following: cp include/linux/kasan-tags.h tools/include/linux/ cp arch/arm64/include/asm/sysreg.h tools/arch/arm64/include/asm/ Update the tools copy of sysreg.h so that the next commit to add a new register doesn't have unrelated changes in it. Because the new version of sysreg.h includes kasan-tags.h, that file also now needs to be copied into tools. Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250106142446.628923-3-james.clark@linaro.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12KVM: arm64: Drop pkvm_mem_transition for host/hyp donationsQuentin Perret1-253/+32
Simplify the __pkvm_host_donate_hyp() and pkvm_hyp_donate_host() paths by not using the pkvm_mem_transition machinery. As the last users of this, also remove all the now unused code. No functional changes intended. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250110121936.1559655-4-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12KVM: arm64: Drop pkvm_mem_transition for host/hyp sharingQuentin Perret1-285/+34
Simplify the __pkvm_host_{un}share_hyp() paths by not using the pkvm_mem_transition machinery. As there are the last users of the do_share()/do_unshare(), remove all the now-unused code as well. No functional changes intended. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250110121936.1559655-3-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-12KVM: arm64: Drop pkvm_mem_transition for FF-AQuentin Perret1-26/+10
Simplify the __pkvm_host_{un}share_ffa() paths by using {check,set}_page_state_range(). No functional changes intended. Signed-off-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20250110121936.1559655-2-qperret@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-11KVM: arm64: Explicitly handle BRBE traps as UNDEFINEDMark Rutland1-0/+11
The Branch Record Buffer Extension (BRBE) adds a number of system registers and instructions which we don't currently intend to expose to guests. Our existing logic handles this safely, but this could be improved with some explicit handling of BRBE. KVM currently hides BRBE from guests: the cpufeature code's ftr_id_aa64dfr0[] table doesn't have an entry for the BRBE field, and so this will be zero in the sanitised value of ID_AA64DFR0 exposed to guests via read_sanitised_id_aa64dfr0_el1(). KVM currently traps BRBE usage from guests: the default configuration of the fine-grained trap controls HDFGRTR_EL2.{nBRBDATA,nBRBCTL,nBRBIDR} and HFGITR_EL2.{nBRBINJ_nBRBIALL} cause these to be trapped to EL2. Well-behaved guests shouldn't try to use the registers or instructions, but badly-behaved guests could use these, resulting in unnecessary warnings from KVM before it injects an UNDEF, e.g. | kvm [197]: Unsupported guest access at: 401c98 | { Op0( 2), Op1( 1), CRn( 9), CRm( 0), Op2( 0), func_read }, | kvm [197]: Unsupported guest access at: 401d04 | { Op0( 2), Op1( 1), CRn( 9), CRm( 0), Op2( 1), func_read }, | kvm [197]: Unsupported guest access at: 401d70 | { Op0( 2), Op1( 1), CRn( 9), CRm( 2), Op2( 0), func_read }, | kvm [197]: Unsupported guest access at: 401ddc | { Op0( 2), Op1( 1), CRn( 9), CRm( 1), Op2( 0), func_read }, | kvm [197]: Unsupported guest access at: 401e48 | { Op0( 2), Op1( 1), CRn( 9), CRm( 1), Op2( 1), func_read }, | kvm [197]: Unsupported guest access at: 401eb4 | { Op0( 2), Op1( 1), CRn( 9), CRm( 1), Op2( 2), func_read }, | kvm [197]: Unsupported guest access at: 401f20 | { Op0( 2), Op1( 1), CRn( 9), CRm( 0), Op2( 2), func_read }, | kvm [197]: Unsupported guest access at: 401f8c | { Op0( 1), Op1( 1), CRn( 7), CRm( 2), Op2( 4), func_write }, | kvm [197]: Unsupported guest access at: 401ff8 | { Op0( 1), Op1( 1), CRn( 7), CRm( 2), Op2( 5), func_write }, As with other features that we know how to handle, these warnings aren't particularly interesting, and we can simply treat these as UNDEFINED without any warning. Add the necessary fine-grained undef configuration to make this happen, as suggested by Marc Zyngier: https://lore.kernel.org/linux-arm-kernel/86r0czk6wd.wl-maz@kernel.org/ At the same time, update read_sanitised_id_aa64dfr0_el1() to hide BRBE from guests, as we do for SPE. This will prevent accidentally exposing BRBE to guests if/when ftr_id_aa64dfr0[] gains a BRBE entry. Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20250109223836.419240-1-robh@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-11KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()Thorsten Blum1-2/+3
Remove hard-coded strings by using the str_enabled_disabled() helper function. Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250110225310.369980-2-thorsten.blum@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-08arm64: kvm: Introduce nvhe stack size constantsKalesh Singh8-30/+33
Refactor nvhe stack code to use NVHE_STACK_SIZE/SHIFT constants, instead of directly using PAGE_SIZE/SHIFT. This makes the code a bit easier to read, without introducing any functional changes. Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Link: https://lore.kernel.org/r/20241112003336.1375584-1-kaleshsingh@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-08KVM: arm64: Fix nVHE stacktrace VA bits maskVincent Donnefort3-1/+7
The hypervisor VA space size depends on both the ID map's (IDMAP_VA_BITS) and the kernel stage-1 (VA_BITS). However, the hypervisor stacktrace decoding is solely relying on VA_BITS. This is especially an issue when VA_BITS < IDMAP_VA_BITS (i.e. VA_BITS is 39-bit): the hypervisor may have addresses bigger than the stacktrace is masking. Align this mask with hyp_va_bits. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/r/20250107112821.416591-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-08KVM: arm64: Fix FEAT_MTE in pKVMVladimir Murzin1-0/+6
Make sure we do not trap access to Allocation Tags. Fixes: b56680de9c64 ("KVM: arm64: Initialize trap register values in hyp in pKVM") Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20250106112421.65355-1-vladimir.murzin@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-08Documentation: Update the behaviour of "kvm-arm.mode"Mostafa Saleh1-6/+10
Commit 5053c3f0519c ("KVM: arm64: Use hVHE in pKVM by default on CPUs with VHE support") modified the behaviour of "kvm-arm.mode=protected" without the updating the kernel parameters doc. Update it to match the current implementation. Also, update required architecture version for nested virtualization as suggested by Marc. Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20241025093259.2216093-1-smostafa@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Document EL2 timer APIMarc Zyngier1-6/+9
Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-13-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosityMarc Zyngier8-8/+90
It appears that on Qualcomm's x1e CPU, CNTVOFF_EL2 doesn't really work, specially with HCR_EL2.E2H=1. A non-zero offset results in a screaming virtual timer interrupt, to the tune of a few 100k interrupts per second on a 4 vcpu VM. This is also evidenced by this CPU's inability to correctly run any of the timer selftests. The only case this doesn't break is when this register is set to 0, which breaks VM migration. When HCR_EL2.E2H=0, the timer seems to behave normally, and does not result in an interrupt storm. As a workaround, use the fact that this CPU implements FEAT_ECV, and trap all accesses to the virtual timer and counter, keeping CNTVOFF_EL2 set to zero, and emulate accesses to CVAL/TVAL/CTL and the counter itself, fixing up the timer to account for the missing offset. And if you think this is disgusting, you'd probably be right. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Sanitise CNTHCTL_EL2Marc Zyngier3-1/+18
Inject some sanity in CNTHCTL_EL2, ensuring that we don't handle more than we advertise to the guest. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Propagate CNTHCTL_EL2.EL1NV{P,V}CT bitsMarc Zyngier1-0/+7
Allow a guest hypervisor to trap accesses to CNT{P,V}CT_EL02 by propagating these trap bits to the host trap configuration. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Add trap routing for CNTHCTL_EL2.EL1{NVPCT,NVVCT,TVT,TVCT}Marc Zyngier1-2/+56
For completeness, fun, and cerebral meltdown, add the virtualisation related traps to the counter and timers. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: Handle counter access early in non-HYP contextMarc Zyngier1-13/+21
We already deal with CNTPCT_EL0 accesses in non-HYP context. Let's add CNTVCT_EL0 as a good measure. This is also an opportunity to simplify things and make it plain that this code is only for non-HYP context handling. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Accelerate EL0 counter accesses from hypervisor contextMarc Zyngier1-0/+8
Similarly to handling the physical timer accesses early when FEAT_ECV causes a trap, we try to handle the physical counter without returning to the general sysreg handling. More surprisingly, we introduce something similar for the virtual counter. Although this isn't necessary yet, it will prove useful on systems that have a broken CNTVOFF_EL2 implementation. Yes, they exist. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV in useMarc Zyngier4-18/+122
Although FEAT_ECV allows us to correctly emulate the timers, it also reduces performances pretty badly. Mitigate this by emulating the CTL/CVAL register reads in the inner run loop, without returning to the general kernel. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-6-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timersMarc Zyngier2-3/+35
Although FEAT_NV2 makes most things fast, it also makes it impossible to correctly emulate the timers, as the sysreg accesses are redirected to memory. FEAT_ECV addresses this by giving a hypervisor the ability to trap the EL02 sysregs as well as the virtual timer. Add the required trap setting to make use of the feature, allowing us to elide the ugly resync in the middle of the run loop. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-02KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory stateMarc Zyngier2-1/+22
With FEAT_NV2, the EL0 timer state is entirely stored in memory, meaning that the hypervisor can only provide a very poor emulation. The only thing we can really do is to publish the interrupt state in the guest view of CNT{P,V}_CTL_EL0, and defer everything else to the next exit. Only FEAT_ECV will allow us to fix it, at the cost of extra trapping. Suggested-by: Chase Conklin <chase.conklin@arm.com> Suggested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>