2024-11-20  KVM: arm64: vgic-its: Add stronger type-checking to the ITS entry sizes  (Marc Zyngier; 2 files, -42/+50)

The ITS ABI infrastructure allows for some pretty lax code, where the size of the data doesn't have to match the size of the entry, potentially leading to a collection of interesting bugs.

Commit 7fe28d7e68f9 ("KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*") added some checks, but starts by implicitly casting all writes to a 64bit value, hiding some of the issues.

Instead, introduce macros that check the data type actually used for dealing with the table entries. The macros take a symbolic entry type that is used to fetch the size of the entry type for the current ABI. This immediately catches a couple of low-impact gotchas (zero values that are implicitly 32bit), easy enough to fix.

Given that we currently only have a single ABI, hardcode a couple of BUILD_BUG_ON()s that will fire if we use anything but a 64bit quantity, and some (currently unreachable) fallback code that may become useful one day.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241117165757.247686-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

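[Editor's note: a minimal sketch of the kind of size-checked write helper the commit describes. The macro body and the its->dev->kvm plumbing are assumptions, not the exact patch code.]

    /*
     * Fetch the entry size for the current ABI from a symbolic type
     * name (ite/dte/cte), and refuse to build if the caller's data
     * isn't the 64bit quantity the single existing ABI expects.
     */
    #define vgic_its_write_entry_lock(its, eaddr, entry, type)          \
    ({                                                                  \
        int __sz = vgic_its_get_abi(its)->type##_esz;                   \
        typeof(entry) __e = (entry);                                    \
                                                                        \
        BUILD_BUG_ON(sizeof(__e) != sizeof(u64));                       \
        vgic_write_guest_lock((its)->dev->kvm, (eaddr), &__e, __sz);    \
    })
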
2024-11-20  KVM: arm64: vgic: Kill VGIC_MAX_PRIVATE definition  (Marc Zyngier; 2 files, -3/+2)

VGIC_MAX_PRIVATE is a pretty useless definition, and is better replaced with VGIC_NR_PRIVATE_IRQS.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241117165757.247686-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-11-20  KVM: arm64: vgic: Make vgic_get_irq() more robust  (Marc Zyngier; 11 files, -57/+71)

vgic_get_irq() has an awkward signature, as it takes both a kvm *and* a vcpu, where the vcpu is allowed to be NULL if the INTID being looked up is a global interrupt (SPI or LPI). This leads to potentially problematic situations where the INTID passed is a private interrupt, but there is no vcpu.

In order to make things less ambiguous, let's have *two* helpers instead:

- vgic_get_irq(struct kvm *kvm, u32 intid), which is only concerned with *global* interrupts, as indicated by the lack of vcpu.

- vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid), which can return *any* interrupt class, but of course requires a non-NULL vcpu.

Most of the code nicely falls under one or the other situation, except for a couple of cases (close to the UABI or in the debug code) where we have to distinguish between the two cases.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241117165757.247686-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

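[Editor's note: sketched signatures for the two lookups, per the commit text. The bodies are illustrative only; __vgic_get_irq() is an assumed internal helper, not necessarily the patch's.]

    /* Global interrupts (SPIs/LPIs) only: no vcpu needed. */
    struct vgic_irq *vgic_get_irq(struct kvm *kvm, u32 intid)
    {
        if (WARN_ON(intid < VGIC_NR_PRIVATE_IRQS))
            return NULL;

        return __vgic_get_irq(kvm, NULL, intid);
    }

    /* Any interrupt class, but the vcpu must be valid. */
    struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
    {
        if (WARN_ON(!vcpu))
            return NULL;

        return __vgic_get_irq(vcpu->kvm, vcpu, intid);
    }
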
2024-11-20  KVM: arm64: vgic-v3: Sanitise guest writes to GICR_INVLPIR  (Marc Zyngier; 1 file, -1/+6)

Make sure we filter out non-LPI invalidation when handling writes to GICR_INVLPIR.

Fixes: 4645d11f4a553 ("KVM: arm64: vgic-v3: Implement MMIO-based LPI invalidation")
Reported-by: Alexander Potapenko <glider@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241117165757.247686-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

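[Editor's note: the filter plausibly amounts to a range check in the MMIO handler; a hedged sketch, not the exact fix.]

    static void vgic_mmio_write_invlpi(struct kvm_vcpu *vcpu,
                                       gpa_t addr, unsigned int len,
                                       unsigned long val)
    {
        /*
         * LPI INTIDs start at 8192 (VGIC_MIN_LPI); anything below is
         * not an LPI and the write must be ignored rather than fed to
         * the invalidation path.
         */
        if (val < VGIC_MIN_LPI)
            return;

        /* ... proceed with the LPI invalidation ... */
    }
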
2024-11-12  KVM: arm64: Pass on SVE mapping failures  (James Clark; 1 file, -2/+1)

This function can fail, but its return value isn't passed on to the caller. Presumably this could result in a broken state.

Fixes: 66d5b53e20a6 ("KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Fuad Tabba <tabba@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241112105604.795809-1-james.clark@linaro.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-11-11  KVM: arm64: vgic-its: Clear ITE when DISCARD frees an ITE  (Kunkun Jiang; 1 file, -1/+5)

When DISCARD frees an ITE, it does not invalidate the corresponding entry in the interrupt translation table. In a scenario of continuous saves and restores, an ITE may therefore be restored even though it was never saved. This is unreasonable and may cause the restore to fail. This patch clears the corresponding ITE when DISCARD frees it.

Cc: stable@vger.kernel.org
Fixes: eff484e0298d ("KVM: arm64: vgic-its: ITT save and restore")
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with entry write helper]
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20241107214137.428439-6-jingzhangos@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-11-11  KVM: arm64: vgic-its: Clear DTE when MAPD unmaps a device  (Kunkun Jiang; 1 file, -2/+4)

vgic_its_save_device_tables() traverses its->device_list to save a DTE for each device. vgic_its_restore_device_tables() traverses each entry of the device table, checks whether it is valid, and restores it if so. But when MAPD unmaps a device, it does not invalidate the corresponding DTE. In a scenario of continuous saves and restores, a device's DTE may therefore be restored even though it was never saved. This is unreasonable and may cause the restore to fail. This patch clears the corresponding DTE when MAPD unmaps a device.

Cc: stable@vger.kernel.org
Fixes: 57a9a117154c ("KVM: arm64: vgic-its: Device table save/restore")
Co-developed-by: Shusen Li <lishusen2@huawei.com>
Signed-off-by: Shusen Li <lishusen2@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with entry write helper]
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20241107214137.428439-5-jingzhangos@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-11-11  KVM: arm64: vgic-its: Add a data length check in vgic_its_save_*  (Jing Zhang; 2 files, -12/+31)

None of the vgic_its_save_*() functions check whether the data length is 8 bytes before calling vgic_write_guest_lock(). This patch adds the check. To prevent the kernel from being blown up when the fault occurs, KVM_BUG_ON() is used, and the other BUG_ON()s are replaced at the same time.

Cc: stable@vger.kernel.org
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
[Jing: Update with the new entry read/write helpers]
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20241107214137.428439-4-jingzhangos@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

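[Editor's note: a sketch of the check as described; the surrounding variables are assumed. KVM_BUG_ON() marks the VM as buggy instead of bringing down the host the way BUG_ON() would.]

    /* Refuse to write anything that isn't a full 8-byte entry. */
    if (KVM_BUG_ON(size != sizeof(u64), kvm))
        return -EINVAL;

    ret = vgic_write_guest_lock(kvm, gpa, &val, size);
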
2024-11-11  KVM: selftests: Don't bother deleting memslots in KVM when freeing VMs  (Sean Christopherson; 1 file, -4/+6)

When freeing a VM, don't call into KVM to manually remove each memslot; simply clean up and free any userspace assets associated with the memory region. KVM is ultimately responsible for ensuring kernel resources are freed when the VM is destroyed, deleting memslots one-by-one is unnecessarily slow, and, unless a test is already leaking the VM fd, the VM will be destroyed when kvm_vm_release() is called.

Not deleting KVM's memslots also allows cleaning up dead VMs without having to care whether the to-be-freed VM is dead or alive.

Reported-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvmarm/Zy0bcM0m-N18gAZz@google.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-11-11  KVM: arm64: Make L1Ip feature in CTR_EL0 writable from userspace  (Shameer Kolothum; 1 file, -4/+34)

Only allow userspace to set VIPT (0b10) or PIPT (0b11) for L1Ip, based on what the hardware reports, as both AIVIVT (0b01) and VPIPT (0b00) are documented as reserved. Using VIPT for the guest where hardware reports PIPT may lead to over-invalidation, but is still correct. Hence, we can allow downgrading PIPT to VIPT, but not the other way around.

Reviewed-by: Sebastian Ott <sebott@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Link: https://lore.kernel.org/r/20241022073943.35764-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

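[Editor's note: the acceptance rule reduces to a small check in the CTR_EL0 write handler. A hedged sketch; the field helpers exist in the arm64 sysreg headers, but the exact shape of the patch is an assumption.]

    u64 ctr = read_sanitised_ftr_reg(SYS_CTR_EL0);
    u8 hw_l1ip = SYS_FIELD_GET(CTR_EL0, L1Ip, ctr);
    u8 new_l1ip = SYS_FIELD_GET(CTR_EL0, L1Ip, val);

    /* AIVIVT (0b01) and VPIPT (0b00) are reserved: never accepted. */
    if (new_l1ip != CTR_EL0_L1Ip_VIPT && new_l1ip != CTR_EL0_L1Ip_PIPT)
        return -EINVAL;

    /* PIPT hardware may be downgraded to VIPT, but never the reverse. */
    if (hw_l1ip == CTR_EL0_L1Ip_VIPT && new_l1ip == CTR_EL0_L1Ip_PIPT)
        return -EINVAL;
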
2024-10-31  KVM: arm64: Get rid of userspace_irqchip_in_use  (Raghavendra Rao Ananta; 3 files, -19/+4)

Improper use of userspace_irqchip_in_use led to syzbot hitting the following WARN_ON() in kvm_timer_update_irq():

WARNING: CPU: 0 PID: 3281 at arch/arm64/kvm/arch_timer.c:459 kvm_timer_update_irq+0x21c/0x394
Call trace:
  kvm_timer_update_irq+0x21c/0x394 arch/arm64/kvm/arch_timer.c:459
  kvm_timer_vcpu_reset+0x158/0x684 arch/arm64/kvm/arch_timer.c:968
  kvm_reset_vcpu+0x3b4/0x560 arch/arm64/kvm/reset.c:264
  kvm_vcpu_set_target arch/arm64/kvm/arm.c:1553 [inline]
  kvm_arch_vcpu_ioctl_vcpu_init arch/arm64/kvm/arm.c:1573 [inline]
  kvm_arch_vcpu_ioctl+0x112c/0x1b3c arch/arm64/kvm/arm.c:1695
  kvm_vcpu_ioctl+0x4ec/0xf74 virt/kvm/kvm_main.c:4658
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:907 [inline]
  __se_sys_ioctl fs/ioctl.c:893 [inline]
  __arm64_sys_ioctl+0x108/0x184 fs/ioctl.c:893
  __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
  invoke_syscall+0x78/0x1b8 arch/arm64/kernel/syscall.c:49
  el0_svc_common+0xe8/0x1b0 arch/arm64/kernel/syscall.c:132
  do_el0_svc+0x40/0x50 arch/arm64/kernel/syscall.c:151
  el0_svc+0x54/0x14c arch/arm64/kernel/entry-common.c:712
  el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
  el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598

The following sequence led to the scenario:

- Userspace creates a VM and a vCPU.

- The vCPU is initialized with KVM_ARM_VCPU_PMU_V3 during KVM_ARM_VCPU_INIT.

- Without any other setup, such as vGIC or vPMU, userspace issues KVM_RUN on the vCPU. Since the vPMU is requested, but not set up, kvm_arm_pmu_v3_enable() fails in kvm_arch_vcpu_run_pid_change(). As a result, KVM_RUN returns after enabling the timer, but before incrementing 'userspace_irqchip_in_use':

    kvm_arch_vcpu_run_pid_change()
        ret = kvm_arm_pmu_v3_enable()
            if (!vcpu->arch.pmu.created)
                return -EINVAL;
        if (ret)
            return ret;
        [...]
        if (!irqchip_in_kernel(kvm))
            static_branch_inc(&userspace_irqchip_in_use);

- Userspace ignores the error and issues KVM_ARM_VCPU_INIT again. Since the timer is already enabled, control moves through the following flow, ultimately hitting the WARN_ON():

    kvm_timer_vcpu_reset()
        if (timer->enabled)
            kvm_timer_update_irq()
                if (!userspace_irqchip())
                    ret = kvm_vgic_inject_irq()
                        ret = vgic_lazy_init()
                            if (unlikely(!vgic_initialized(kvm)))
                                if (kvm->arch.vgic.vgic_model != KVM_DEV_TYPE_ARM_VGIC_V2)
                                    return -EBUSY;
                    WARN_ON(ret);

Theoretically, since userspace_irqchip_in_use's functionality can be simply replaced by '!irqchip_in_kernel()', get rid of the static key to avoid the mismanagement, which also helps with the syzbot issue.

Cc: <stable@vger.kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

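[Editor's note: the replacement is mechanical; a sketch of one call site before and after, assuming the kvm_timer_sync_user() site in arm.c as a representative example.]

    /* Before: a static key that had to be carefully balanced. */
    if (static_branch_unlikely(&userspace_irqchip_in_use))
        kvm_timer_sync_user(vcpu);

    /* After: derive the same condition from the VM itself. */
    if (!irqchip_in_kernel(vcpu->kvm))
        kvm_timer_sync_user(vcpu);
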
2024-10-31  KVM: arm64: nv: Reprogram PMU events affected by nested transition  (Oliver Upton; 3 files, -0/+36)

Start reprogramming PMU events at nested boundaries now that everything is in place to handle the EL2 event filter. Only repaint events where the filter differs between EL1 and EL2 as a slight optimization. PMU now 'works' for nested VMs, albeit slowly.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182559.3364829-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

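[Editor's note: a sketch of the boundary hook; helper names follow the adjacent commits but are assumptions about the exact patch.]

    void kvm_pmu_nested_transition(struct kvm_vcpu *vcpu)
    {
        unsigned long mask = kvm_pmu_implemented_counter_mask(vcpu);
        unsigned int i;

        for_each_set_bit(i, &mask, 32) {
            struct kvm_pmc *pmc = kvm_vcpu_idx_to_pmc(vcpu, i);

            /* Only repaint counters whose EL1/EL2 filters differ. */
            if (kvm_pmc_counts_at_el1(pmc) == kvm_pmc_counts_at_el2(pmc))
                continue;

            kvm_pmu_create_perf_event(pmc);
        }
    }
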
2024-10-31  KVM: arm64: nv: Apply EL2 event filtering when in hyp context  (Oliver Upton; 1 file, -1/+20)

It hopefully comes as no surprise when I say that vEL2 actually runs at EL1. So, the guest hypervisor's EL2 event filter (NSH) needs to actually be applied to EL1 in the perf event. In addition to this, the disable bit for the guest counter range (HPMD) needs to have the effect of stopping the affected counters.

Do exactly that by stuffing ::exclude_kernel with the combined effect of these controls. This isn't quite enough yet, as the backing perf events need to be reprogrammed upon nested ERET/exception entry to remap the effective filter onto ::exclude_kernel.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-18-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Honor MDCR_EL2.HLP  (Oliver Upton; 1 file, -1/+5)

Counters that fall in the hypervisor range (i.e. N >= HPMN) have a separate control for enabling 64 bit overflow. Take it into account.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-17-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Honor MDCR_EL2.HPME  (Oliver Upton; 1 file, -2/+9)

When the PMU is configured with split counter ranges, HPME becomes the enable bit for the counters reserved for EL2.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-16-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

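[Editor's note: with split ranges, the enable test picks the control matching the counter's owner. A hedged sketch; kvm_pmu_counter_is_hyp() is an assumed helper, and the real check also involves PMCNTENSET.]

    static bool kvm_pmu_counter_is_enabled(struct kvm_pmc *pmc)
    {
        struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
        u64 mdcr = __vcpu_sys_reg(vcpu, MDCR_EL2);

        /* Counters at or above HPMN belong to EL2 and obey HPME... */
        if (kvm_pmu_counter_is_hyp(vcpu, pmc->idx))
            return mdcr & MDCR_EL2_HPME;

        /* ...everything else obeys the usual PMCR_EL0.E. */
        return __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E;
    }
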
2024-10-31  KVM: arm64: Add helpers to determine if PMC counts at a given EL  (Oliver Upton; 1 file, -12/+28)

Checking the exception level filters for a PMC is a minor annoyance to open code. Add helpers to check if an event counts at EL0 and EL1, which will prove useful in a subsequent change.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-15-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

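[Editor's note: the helpers fold the PMEVTYPER exclude bits into a single question. A sketch; kvm_pmc_read_evtreg() and the NS_EL0 define are assumptions about this series, the U/NSU bit semantics are architectural.]

    static bool kvm_pmc_counts_at_el0(struct kvm_pmc *pmc)
    {
        u64 evtreg = kvm_pmc_read_evtreg(pmc);
        bool nsu = evtreg & ARMV8_PMU_EXCLUDE_NS_EL0;
        bool u = evtreg & ARMV8_PMU_EXCLUDE_EL0;

        /* Counting at EL0 requires the U and NSU filters to agree. */
        return u == nsu;
    }
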
2024-10-31  KVM: arm64: nv: Adjust range of accessible PMCs according to HPMN  (Oliver Upton; 3 files, -7/+24)

The value of MDCR_EL2.HPMN controls the number of event counters made visible to EL0 and EL1. This means it is possible for the guest hypervisor to allow direct access to event counters to the L2.

Rework KVM's PMU register emulation to take the effects of HPMN into account when handling a trap. For bitmask-style registers, writes only affect accessible registers.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-14-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

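[Editor's note: a sketch of the visibility rule applied during trap handling; helper names are assumed, and HPMN == 0 needs extra care once FEAT_HPMN0 is in play.]

    static u64 kvm_pmu_accessible_counter_mask(struct kvm_vcpu *vcpu)
    {
        u64 mask = kvm_pmu_implemented_counter_mask(vcpu);
        u64 hpmn = SYS_FIELD_GET(MDCR_EL2, HPMN,
                                 __vcpu_sys_reg(vcpu, MDCR_EL2));

        /* Outside of (v)EL2, only the first HPMN counters are visible. */
        if (!vcpu_is_el2(vcpu))
            mask &= GENMASK(hpmn - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX);

        return mask;
    }
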
2024-10-31  KVM: arm64: Rename kvm_pmu_valid_counter_mask()  (Oliver Upton; 3 files, -12/+12)

Nested PMU support requires dynamically changing the visible range of PMU counters based on the exception level and value of MDCR_EL2.HPMN. At the same time, the PMU emulation code needs to know the absolute number of implemented counters, regardless of context. Rename the existing helper to make it obvious that it returns the number of implemented counters and not anything else.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-13-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Advertise support for FEAT_HPMN0  (Oliver Upton; 1 file, -2/+3)

Everything is in place now for KVM to actually handle MDCR_EL2.HPMN. Not only that, the emulation is capable of doing FEAT_HPMN0. Advertise support for the feature in the VM's ID registers. It is possible to emulate FEAT_HPMN0 on hardware that doesn't support it since KVM currently traps all PMU registers. Having said that, let's only advertise the feature on supporting hardware in case KVM ever provides 'direct' PMU support to VMs w/o involving host perf.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMN  (Oliver Upton; 3 files, -64/+120)

MDCR_EL2.HPMN splits the PMU event counters into two ranges: the first range is accessible from all ELs, and the second range is accessible only to EL2/3. Supposing the guest hypervisor allows direct access to the PMU counters from the L2, KVM needs to locally handle those accesses.

Add a new complex trap configuration for HPMN that checks if the counter index is accessible to the current context. As written, the architecture suggests HPMN only causes PMEVCNTR<n>_EL0 to trap, though intuition (and the pseudocode) suggests that the trap applies to PMEVTYPER<n>_EL0 as well.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-11-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

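[Editor's note: the new complex trap boils down to an index/context check. A hedged sketch; kvm_pmu_hpmn() is an assumed helper, and the index decoding from the trapped encoding is omitted.]

    static enum trap_behaviour check_mdcr_hpmn(struct kvm_vcpu *vcpu,
                                               unsigned int idx)
    {
        /* vEL2 sees every implemented counter... */
        if (vcpu_is_el2(vcpu))
            return BEHAVE_HANDLE_LOCALLY;

        /* ...but EL0/EL1 only get the first HPMN of them. */
        if (idx >= kvm_pmu_hpmn(vcpu))
            return BEHAVE_FORWARD_RW;

        return BEHAVE_HANDLE_LOCALLY;
    }
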
2024-10-31  KVM: arm64: nv: Honor MDCR_EL2.{TPM, TPMCR} in Host EL0  (Oliver Upton; 1 file, -2/+4)

TPM and TPMCR trap bits also affect Host EL0. How fun. Mark these two trap bits as such and take advantage of the new infrastructure for dealing w/ EL0 traps.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Reinject traps that take effect in Host EL0  (Oliver Upton; 2 files, -4/+30)

Wire up the other end of traps that affect host EL0 by actually injecting them into the guest hypervisor. Skip over FGT entirely, as a cursory glance suggests no FGT is effective in host EL0. Note that kvm_inject_nested() is already equipped for handling exceptions while the VM is already in a host context.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Rename BEHAVE_FORWARD_ANY  (Oliver Upton; 1 file, -46/+47)

BEHAVE_FORWARD_ANY is slightly ambiguous, especially since we're about to cram some more information into the enum. Rephrase it.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-8-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: nv: Allow coarse-grained trap combos to use complex traps  (Oliver Upton; 1 file, -1/+2)

KVM uses a sanity check to avoid infinite recursion in trap combinations that could potentially depend on themselves. Narrow the scope of this sanity check to the exact CGT IDs that correspond w/ trap combos, opening the door to using 'complex' traps as part of a combination.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Describe RES0/RES1 bits of MDCR_EL2  (Oliver Upton; 2 files, -1/+38)

Add support for sanitising MDCR_EL2 and describe the RES0/RES1 bits according to the feature set exposed to the VM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64: sysreg: Add new definitions for ID_AA64DFR0_EL1  (Oliver Upton; 1 file, -3/+12)

Align the field definitions w/ DDI0601 2024-09 and opportunistically declare MTPMU as a signed field.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64: sysreg: Migrate MDCR_EL2 definition to table  (Oliver Upton; 2 files, -29/+35)

Migrate MDCR_EL2 over to the sysreg table and align definitions with DDI0601 2024-09.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64: sysreg: Describe ID_AA64DFR2_EL1 fields  (Oliver Upton; 1 file, -0/+26)

Describe the new ID register in line with DDI0601 2024-09.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241025182354.3364124-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Initialize trap register values in hyp in pKVM  (Fuad Tabba; 2 files, -1/+38)

Handle the initialization of trap registers at the hypervisor in pKVM, even for non-protected guests. The host is not trusted with the values of the trap registers, regardless of the VM type. Therefore, when switching between the host and the guests, only flush the HCR_EL2 TWI and TWE bits. The host is allowed to configure these for opportunistic scheduling, as neither affects the protection of VMs or the hypervisor.

Reported-by: Will Deacon <will@kernel.org>
Fixes: 814ad8f96e92 ("KVM: arm64: Drop trapping of PAuth instructions/keys")
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241018074833.2563674-5-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Initialize the hypervisor's VM state at EL2  (Fuad Tabba; 1 file, -0/+77)

Do not trust the state of the VM as provided by the host when initializing the hypervisor's view of the VM state. Initialize it instead at EL2 to a known good and safe state, as pKVM already does with hypervisor VCPU states.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241018074833.2563674-4-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use  (Fuad Tabba; 2 files, -5/+4)

Move kvm_vcpu_enable_ptrauth() to a shared header to be used by hypervisor code in protected mode. No functional change intended.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241018074833.2563674-3-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Move pkvm_vcpu_init_traps() to init_pkvm_hyp_vcpu()  (Fuad Tabba; 5 files, -20/+3)

Move pkvm_vcpu_init_traps() to the initialization of the hypervisor's vcpu state in init_pkvm_hyp_vcpu(), and remove the associated hypercall. In protected mode, traps need to be initialized whenever a VCPU is initialized anyway, and not only for protected VMs. This also saves an unnecessary hypercall.

Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20241018074833.2563674-2-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: selftests: Test ID_AA64PFR0.MPAM isn't completely ignored  (James Morse; 1 file, -1/+98)

The ID_AA64PFR0.MPAM bit was previously accidentally exposed to guests, and is ignored by KVM. KVM will always present the guest with 0 here, and trap the MPAM system registers to inject an undef. But this value is still needed to prevent migration when it is incompatible with the target hardware.

Add a KVM selftest that tries to write multiple values to ID_AA64PFR0.MPAM. Only the hardware value previously exposed should be ignored; all other values should be rejected.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-8-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Disable MPAM visibility by default and ignore VMM writes  (James Morse; 1 file, -3/+42)

Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. A previous patch supplied the missing trap handling.

Existing VMs that have the MPAM field of ID_AA64PFR0_EL1 set need to be migratable, but there is little point enabling the MPAM CPU interface on new VMs until there is something a guest can do with it. Clear the MPAM field from the guest's ID_AA64PFR0_EL1, and on hardware that supports MPAM, politely ignore the VMM's attempts to set this bit.

Guests exposed to this bug have the sanitised value of the MPAM field, so only the correct value needs to be ignored. This means the field can continue to be used to block migration to incompatible hardware (between MPAM=1 and MPAM=5), and the VMM can't rely on the field being ignored.

Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-7-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

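[Editor's note: a sketch of the write filter; the mask and helper names exist in KVM's sys_regs code, but the exact shape of the patch is an assumption.]

    static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu,
                                   const struct sys_reg_desc *rd,
                                   u64 user_val)
    {
        u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
        u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK;

        /*
         * Buggy kernels leaked the sanitised MPAM value to guests;
         * politely ignore a VMM writing that exact value back, but
         * reject anything else.
         */
        if ((hw_val & mpam_mask) == (user_val & mpam_mask))
            user_val &= ~mpam_mask;

        return set_id_reg(vcpu, rd, user_val);
    }
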
2024-10-31  KVM: arm64: Add a macro for creating filtered sys_reg_descs entries  (James Morse; 1 file, -28/+38)

The sys_reg_descs array holds function pointers and reset values for managing the user-space and guest view of system registers. These are mostly created by a set of macros, as only some combinations of behaviour are needed. If a register needs special treatment, its sys_reg_descs entry is open-coded. This is true of some ID registers where the value provided by user-space is validated by some helpers.

Before adding another one of these, add a helper that covers the existing special cases. ID_FILTERED() expects helpers to set the user-space value and retrieve the modified reset value. Like ID_WRITABLE(), this uses id_visibility(), which should mean no functional change for the registers converted to use ID_FILTERED().

read_sanitised_id_aa64dfr0_el1() and read_sanitised_id_aa64pfr0_el1() have been refactored to be called from kvm_read_sanitised_id_reg(), to try to be consistent with ID_WRITABLE().

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-6-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

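[Editor's note: a sketch of what such a macro looks like next to ID_WRITABLE(); the exact fields are assumptions based on the existing sys_reg_desc layout.]

    /* sys_reg_descs entry whose user-space writes go through a filter. */
    #define ID_FILTERED(sysreg, name, mask) {       \
        ID_DESC(sysreg),                            \
        .set_user = set_##name,                     \
        .visibility = id_visibility,                \
        .reset = kvm_read_sanitised_id_reg,         \
        .val = (mask),                              \
    }
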
2024-10-31  KVM: arm64: Fix missing traps of guest accesses to the MPAM registers  (James Morse; 4 files, -1/+47)

Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to guests, but didn't add trap handling. If you are unlucky, this results in an MPAM-aware guest being delivered an undef during boot. The host prints:

| kvm [97]: Unsupported guest sys_reg access at: ffff800080024c64 [00000005]
| { Op0( 3), Op1( 0), CRn(10), CRm( 5), Op2( 0), func_read },

Which results in:

| Internal error: Oops - Undefined instruction: 0000000002000000 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc7-00559-gd89c186d50b2 #14616
| Hardware name: linux,dummy-virt (DT)
| pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : test_has_mpam+0x18/0x30
| lr : test_has_mpam+0x10/0x30
| sp : ffff80008000bd90
...
| Call trace:
|  test_has_mpam+0x18/0x30
|  update_cpu_capabilities+0x7c/0x11c
|  setup_cpu_features+0x14/0xd8
|  smp_cpus_done+0x24/0xb8
|  smp_init+0x7c/0x8c
|  kernel_init_freeable+0xf8/0x280
|  kernel_init+0x24/0x1e0
|  ret_from_fork+0x10/0x20
| Code: 910003fd 97ffffde 72001c00 54000080 (d538a500)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
| ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Add the support to enable the traps, and handle the three guest-accessible registers by injecting an UNDEF. This stops KVM from spamming the host log, but doesn't yet hide the feature from the id registers.

With MPAM v1.0 we can trap the MPAMIDR_EL1 register only if ARM64_HAS_MPAM_HCR; with v1.1 an additional MPAM2_EL2.TIDR bit traps MPAMIDR_EL1 on platforms that don't have MPAMHCR_EL2. Enable one of these if either is supported. If neither is supported, the guest can discover that the CPU has MPAM support, and how many PARTIDs etc. the host has, but it can't influence anything, so it's harmless.

Fixes: 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")
CC: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20200925160102.118858-1-james.morse@arm.com/
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-5-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64: cpufeature: discover CPU support for MPAM  (James Morse; 7 files, -0/+126)

ARMv8.4 adds support for 'Memory Partitioning And Monitoring' (MPAM), which describes an interface to cache and bandwidth controls wherever they appear in the system.

Add support to detect MPAM. Like SVE, MPAM has an extra id register that describes some more properties, including the virtualisation support, which is optional. Detect this separately so we can detect mismatched/insane systems, but still use MPAM on the host even if the virtualisation support is missing.

MPAM needs enabling at the highest implemented exception level, otherwise the register accesses trap. The 'enabled' flag is accessible to lower exception levels, but it's in a register that traps when MPAM isn't enabled. The cpufeature 'matches' hook is extended to test this on one of the CPUs, so that firmware can emulate MPAM as disabled if it is reserved for use by secure world.

Secondary CPUs that appear late could trip cpufeature's 'lower safe' behaviour after the MPAM properties have been advertised to user-space. Add a verify call to ensure late secondaries match the existing CPUs. (If you have a boot failure that bisects here, it's likely your CPUs advertise MPAM in the id registers, but firmware failed to either enable MPAM or emulate the trap as if it were disabled.)

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-4-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64: head.S: Initialise MPAM EL2 registers and disable traps  (James Morse; 1 file, -0/+14)

Add code to head.S's el2_setup to detect MPAM and disable any EL2 traps. This register resets to an unknown value, so setting it to the default partitions/pmg before we enable the MMU is the best thing to do. Kexec/kdump will depend on this if the previous kernel left the CPU configured with a restrictive configuration.

If Linux is booted at the highest implemented exception level, el2_setup will clear the enable bit, disabling MPAM.

This code can't be enabled until a subsequent patch adds the Kconfig and cpufeature boilerplate.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-3-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64/sysreg: Convert existing MPAM sysregs and add the remaining entries  (James Morse; 2 files, -12/+161)

Move the existing MPAM system register defines from sysreg.h to tools/sysreg and add the remaining system registers.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241030160317.2528209-2-joey.gouly@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate  (David Woodhouse; 2 files, -1/+49)

The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function which is analogous to ACPI S4 state. This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and handle that state appropriately on subsequent launches.

Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and poweroff") the EFI shutdown method is deliberately preferred over PSCI or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which *only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as a last resort via the legacy pm_power_off function pointer.

The hibernation code already exports a system_entering_hibernation() function which is used by the higher-priority handler to check for hibernation. That existing function just returns the value of a static boolean variable from hibernate.c, which was previously only set in the hibernation_platform_enter() code path. Set the same flag in the simpler code path around the call to kernel_power_off() too.

An alternative way to hook SYSTEM_OFF2 into the hibernation code would be to register a platform_hibernation_ops structure with an ->enter() method which makes the new SYSTEM_OFF2 call. But that would have the unwanted side effect of making hibernation take a completely different code path in hibernation_platform_enter(), invoking a lot of special dpm callbacks.

Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to indicate whether the power off is for hibernation. But this version works and is relatively simple.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20241019172459.2241939-7-dwmw2@infradead.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

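[Editor's note: a sketch of the registration described above; register_sys_off_handler() and system_entering_hibernation() are existing kernel APIs, while the SMC wrapper name is an assumption.]

    static int psci_sys_off2_hibernate(struct sys_off_data *data)
    {
        /* Only claim the power-off if we are entering hibernation. */
        if (system_entering_hibernation())
            psci_system_off2();  /* assumed SYSTEM_OFF2(HIBERNATE_OFF) wrapper */

        /* Otherwise fall through to the legacy SYSTEM_OFF path. */
        return NOTIFY_DONE;
    }

    register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_FIRMWARE + 1,
                             psci_sys_off2_hibernate, NULL);
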
2024-10-31  KVM: arm64: Handle WXN attribute  (Marc Zyngier; 1 file, -0/+45)

Until now, we didn't really care about WXN, as it didn't have an effect on the R/W permissions (only the execution could be dropped), and was therefore not of interest for AT.

However, with S1POE, WXN can revoke the Write permission if an overlay is active and execution is allowed. This *is* relevant to AT. Add full handling of WXN so that we correctly handle this case.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-38-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Handle stage-1 permission overlays  (Marc Zyngier; 1 file, -0/+53)

We now have the infrastructure in place to emulate S1POE:

- direct permissions are always overlay-capable
- indirect permissions are overlay-capable if the permissions are in the 0b0xxx range
- the overlays are strictly subtractive

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-37-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

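[Editor's note: the subtractive rule, as a sketch. The POE_* single-bit masks are from the arm64 por.h header; the function and parameter names are assumptions.]

    /*
     * Each POR_ELx index selects a 4-bit nibble of R/X/W overlay bits;
     * the overlay can only remove base permissions, never add them.
     */
    static void s1_apply_overlay(u64 por, u8 pov_idx,
                                 bool *pr, bool *pw, bool *px)
    {
        u8 perm = (por >> (pov_idx * 4)) & 0xf;

        *pr &= !!(perm & POE_R);
        *pw &= !!(perm & POE_W);
        *px &= !!(perm & POE_X);
    }
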
2024-10-31  KVM: arm64: Make PAN conditions part of the S1 walk context  (Marc Zyngier; 1 file, -10/+11)

Move the conditions describing PAN into the s1_walk_info structure, in an effort to declutter the permission processing.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-36-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Disable hierarchical permissions when POE is enabled  (Marc Zyngier; 1 file, -0/+36)

The hierarchical permissions must be disabled when POE is enabled in the translation regime used for a given table walk. We store the two enable bits in the s1_walk_info structure so that they can be retrieved down the line, as they will be useful.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241023145345.1613824-35-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Add POE save/restore for AT emulation fast-path  (Marc Zyngier; 1 file, -0/+14)

Just like the other extensions affecting address translation, we must save/restore POE so that an out-of-context translation context can be restored and used with the AT instructions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241023145345.1613824-34-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Add save/restore support for POR_EL2  (Marc Zyngier; 1 file, -0/+6)

POR_EL2 needs saving when the guest is VHE, and restoring in any case.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-33-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Add basic support for POR_EL2  (Marc Zyngier; 2 files, -0/+12)

S1POE support implies support for POR_EL2, which we provide by:

- adding it to the vcpu_sysreg enum
- advertising it as mapped to its EL1 counterpart in get_el2_to_el1_mapping
- wiring it in the sys_reg_desc table with the correct visibility
- handling POR_EL1 in __vcpu_{read,write}_sys_reg_from_cpu()

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-32-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Add kvm_has_s1poe() helper  (Marc Zyngier; 4 files, -5/+8)

Just like we have kvm_has_s1pie(), add its S1POE counterpart, making the code slightly more readable.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-31-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

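[Editor's note: presumably mirroring the kvm_has_s1pie() pattern; a one-line sketch, with the exact field name assumed.]

    #define kvm_has_s1poe(k)    \
        (kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))
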
2024-10-31  KVM: arm64: Subject S1PIE/S1POE registers to HCR_EL2.{TVM,TRVM}  (Marc Zyngier; 1 file, -0/+4)

All the EL0/EL1 S1PIE/S1POE system registers are caught by the HCR_EL2 TVM and TRVM bits. Reflect this in the coarse-grained trap table.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-30-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>

2024-10-31  KVM: arm64: Drop bogus CPTR_EL2.E0POE trap routing  (Marc Zyngier; 1 file, -8/+0)

It took me some time to realise it, but CPTR_EL2.E0POE does not apply to a guest, only to EL0 when InHost(). And when InHost(), CPTR_EL2 is mapped to CPACR_EL1, meaning that the E0POE bit naturally takes effect without any trap.

To sum it up, this trap bit is better left ignored; we will never have to handle it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241023145345.1613824-29-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
