Age | Commit message (Collapse) | Author | Files | Lines |
|
Aishwarya reports that KVM selftests for arm64 fail with the following
error:
| make[4]: Entering directory '/tmp/kci/linux/tools/testing/selftests/kvm'
| Makefile:270: warning: overriding recipe for target
| '/tmp/kci/linux/build/kselftest/kvm/get-reg-list'
| Makefile:265: warning: ignoring old recipe for target
| '/tmp/kci/linux/build/kselftest/kvm/get-reg-list'
| make -C ../../../../tools/arch/arm64/tools/
| make[5]: Entering directory '/tmp/kci/linux/tools/arch/arm64/tools'
| Makefile:10: ../tools/scripts/Makefile.include: No such file or directory
| make[5]: *** No rule to make target '../tools/scripts/Makefile.include'.
| Stop.
It would appear that this only affects builds from the top-level
Makefile (e.g. make kselftest-all), as $(srctree) is set to ".". Work
around the issue by shadowing the kselftest naming scheme for the source
tree variable.
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Fixes: 0359c946b131 ("tools headers arm64: Update sysreg.h with kernel sources")
Link: https://lore.kernel.org/r/20231027005439.3142015-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
It is a pretty well known fact that KVM does not support MMIO emulation
without valid instruction syndrome information (ESR_EL2.ISV == 0). The
current kvm_pr_unimpl() is pretty useless, as it contains zero context
to relate the event to a vCPU.
Replace it with a precise tracepoint that dumps the relevant context
so the user can make sense of what the guest is doing.
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231026205306.3045075-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
It looks like a mistake to issue ISB *after* reading PAR_EL1, we should
instead perform it between the AT instruction and the reads of PAR_EL1.
As according to DDI0487J.a IJTYVP,
"When an address translation instruction is executed, explicit
synchronization is required to guarantee the result is visible to
subsequent direct reads of PAR_EL1."
Otherwise all guest_at testcases fail on my box with
==== Test Assertion Failure ====
aarch64/page_fault_test.c:142: par & 1 == 0
pid=1355864 tid=1355864 errno=4 - Interrupted system call
1 0x0000000000402853: vcpu_run_loop at page_fault_test.c:681
2 0x0000000000402cdb: run_test at page_fault_test.c:730
3 0x0000000000403897: for_each_guest_mode at guest_modes.c:100
4 0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1105
5 (inlined by) main at page_fault_test.c:1131
6 0x0000ffffb153c03b: ?? ??:0
7 0x0000ffffb153c113: ?? ??:0
8 0x0000000000401aaf: _start at ??:?
0x1 != 0x0 (par & 1 != 0)
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-2-yuzenghui@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Running page_fault_test on a Cortex A72 fails with
Test: ro_memslot_no_syndrome_guest_cas
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
Testing memory backing src type: anonymous
==== Test Assertion Failure ====
aarch64/page_fault_test.c:117: guest_check_lse()
pid=1944087 tid=1944087 errno=4 - Interrupted system call
1 0x00000000004028b3: vcpu_run_loop at page_fault_test.c:682
2 0x0000000000402d93: run_test at page_fault_test.c:731
3 0x0000000000403957: for_each_guest_mode at guest_modes.c:100
4 0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1108
5 (inlined by) main at page_fault_test.c:1134
6 0x0000ffff868e503b: ?? ??:0
7 0x0000ffff868e5113: ?? ??:0
8 0x0000000000401aaf: _start at ??:?
guest_check_lse()
because we don't have a guest_prepare stage to check the presence of
FEAT_LSE and skip the related guest_cas testing, and we end-up failing in
GUEST_ASSERT(guest_check_lse()).
Add the missing .guest_prepare() where it's indeed required.
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231007124043.626-1-yuzenghui@huawei.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
It is possible for multiple vCPUs to fault on the same IPA and attempt
to resolve the fault. One of the page table walks will actually update
the PTE and the rest will return -EAGAIN per our race detection scheme.
KVM elides the TLB invalidation on the racing threads as the return
value is nonzero.
Before commit a12ab1378a88 ("KVM: arm64: Use local TLBI on permission
relaxation") KVM always used broadcast TLB invalidations when handling
permission faults, which had the convenient property of making the
stage-2 updates visible to all CPUs in the system. However now we do a
local invalidation, and TLBI elision leads to the vCPU thread faulting
again on the stale entry. Remember that the architecture permits the TLB
to cache translations that precipitate a permission fault.
Invalidate the TLB entry responsible for the permission fault if the
stage-2 descriptor has been relaxed, regardless of which thread actually
did the job.
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230922223229.1608155-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When trapping accesses from a NV guest that tries to access
SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI,
as if AArch32 wasn't implemented.
This involves a bit of repainting to make the visibility
handler more generic.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
DBGVCR32_EL2, DACR32_EL2, IFSR32_EL2 and FPEXC32_EL2 are required to
UNDEF when AArch32 isn't implemented, which is definitely the case when
running NV.
Given that this is the only case where these registers can trap,
unconditionally inject an UNDEF exception.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20231023095444.1587322-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Implement a fine grained approach in the _EL2 sysreg range instead of
the current wide cast trap. This ensures that we don't mistakenly
inject the wrong exception into the guest.
[maz: commit message massaging, dropped secure and AArch32 registers
from the list]
Fixes: d0fc0a2519a6 ("KVM: arm64: nv: Add trap forwarding for HCR_EL2")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Some _EL2 encodings are missing. Add them.
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[maz: dropped secure encodings]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Some _EL12 encodings are missing. Add them.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Suzuki noticed that KVM's PMU emulation is oblivious to the NSU and NSK
event filter bits. On systems that have EL3 these bits modify the
filter behavior in non-secure EL0 and EL1, respectively. Even though the
kernel doesn't use these bits, it is entirely possible some other guest
OS does. Additionally, it would appear that these and the M bit are
required by the architecture if EL3 is implemented.
Allow the EL3 event filter bits to be set if EL3 is advertised in the
guest's ID register. Implement the behavior of NSU and NSK according to
the pseudocode, and entirely ignore the M bit for perf event creation.
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20231019185618.3442949-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The NSH bit, which filters event counting at EL2, is required by the
architecture if an implementation has EL2. Even though KVM doesn't
support nested virt yet, it makes no effort to hide the existence of EL2
from the ID registers. Userspace can, however, change the value of PFR0
to hide EL2. Align KVM's sysreg emulation with the architecture and make
NSH RES0 if EL2 isn't advertised. Keep in mind the bit is ignored when
constructing the backing perf event.
While at it, build the event type mask using explicit field definitions
instead of relying on ARMV8_PMU_EVTYPE_MASK. KVM probably should've been
doing this in the first place, as it avoids changes to the
aforementioned mask affecting sysreg emulation.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20231019185618.3442949-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
We currently have a global VTCR_EL2 value for each guest, even
if the guest uses NV. This implies that the guest's own S2 must
fit in the host's. This is odd, for multiple reasons:
- the PARange values and the number of IPA bits don't necessarily
match: you can have 33 bits of IPA space, and yet you can only
describe 32 or 36 bits of PARange
- When userspace set the IPA space, it creates a contract with the
kernel saying "this is the IPA space I'm prepared to handle".
At no point does it constraint the guest's own IPA space as
long as the guest doesn't try to use a [I]PA outside of the
IPA space set by userspace
- We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange.
And then there is the consequence of the above: if a guest tries
to create a S2 that has for input address something that is larger
than the IPA space defined by the host, we inject a fatal exception.
This is no good. For all intent and purposes, a guest should be
able to have the S2 it really wants, as long as the *output* address
of that S2 isn't outside of the IPA space.
For that, we need to have a per-s2_mmu VTCR_EL2 setting, which
allows us to represent the full PARange. Move the vctr field into
the s2_mmu structure, which has no impact whatsoever, except for NV.
Note that once we are able to override ID_AA64MMFR0_EL1.PARange
from userspace, we'll also be able to restrict the size of the
shadow S2 that NV uses.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
To date the VHE code has aggressively reloaded the stage-2 MMU context
on every guest entry, despite the fact that this isn't necessary. This
was probably done for consistency with the nVHE code, which needs to
switch in/out the stage-2 MMU context as both the host and guest run at
EL1.
Hoist __load_stage2() into kvm_vcpu_load_vhe(), thus avoiding a reload
on every guest entry/exit. This is likely to be beneficial to systems
with one of the speculative AT errata, as there is now one fewer context
synchronization event on the guest entry path. Additionally, it is
possible that implementations have hitched correctness mitigations on
writes to VTTBR_EL2, which are now elided on guest re-entry.
Note that __tlb_switch_to_guest() is deliberately left untouched as it
can be called outside the context of a running vCPU.
Link: https://lore.kernel.org/r/20231018233212.2888027-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The names for the helpers we expose to the 'generic' KVM code are a bit
imprecise; we switch the EL0 + EL1 sysreg context and setup trap
controls that do not need to change for every guest entry/exit. Rename +
shuffle things around a bit in preparation for loading the stage-2 MMU
context on vcpu_load().
Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Naturally, a change to the VMID for an MMU implies a new value for
VTTBR. Reload on VMID change in anticipation of loading stage-2 on
vcpu_load() instead of every guest entry.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
An MMU notifier could cause us to clobber the stage-2 context loaded on
a CPU when we switch to another VM's context to invalidate. This isn't
an issue right now as the stage-2 context gets reloaded on every guest
entry, but is disastrous when moving __load_stage2() into the
vcpu_load() path.
Restore the previous stage-2 context on the way out of a TLB
invalidation if we installed something else. Deliberately do this after
TGE=1 is synchronized to keep things safe in light of the speculative AT
errata.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231018233212.2888027-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
HCR_EL2.TGE=0 is sufficient to disable stage-2 translation, so there's
no need to explicitly zero VTTBR_EL2.
Link: https://lore.kernel.org/r/20231018233212.2888027-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add tests to verify setting ID registers from userspace is handled
correctly by KVM. Also add a test case to use ioctl
KVM_ARM_GET_REG_WRITABLE_MASKS to get writable masks.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The users of sysreg.h (perf, KVM selftests) are now generating the
necessary sysreg-defs.h; sync sysreg.h with the kernel sources and
fix the KVM selftests that use macros which suffered a rename.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Start generating sysreg-defs.h for arm64 builds in anticipation of
updating sysreg.h to a version that depends on it.
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Start generating sysreg-defs.h in anticipation of updating sysreg.h to a
version that needs the generated output.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Use a common Makefile for generating sysreg-defs.h, which will soon be
needed by perf and KVM selftests. The naming scheme of the generated
macros is not expected to change, so just refer to the canonical
script/data in the kernel source rather than copying to tools.
Co-developed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231011195740.3349631-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Expose the Armv8.8 FEAT_MOPS feature to guests in the ID register and
allow the MOPS instructions to be run in a guest. Only expose MOPS if
the whole system supports it.
Note, it is expected that guests do not use these instructions on MMIO,
similarly to other instructions where ESR_EL2.ISV==0 such as LDP/STP.
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230922112508.1774352-3-kristina.martsenko@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception
if executed on a CPU with a different MOPS implementation option (A or
B) than the CPU where the preceding prologue instruction ran. In this
case the OS exception handler is expected to reset the registers and
restart execution from the prologue instruction.
A KVM guest may use the instructions at EL1 at times when the guest is
not able to handle the exception, expecting that the instructions will
only run on one CPU (e.g. when running UEFI boot services in the guest).
As KVM may reschedule the guest between different types of CPUs at any
time (on an asymmetric system), it needs to also handle the resulting
exception itself in case the guest is not able to. A similar situation
will also occur in the future when live migrating a guest from one type
of CPU to another.
Add handling for the MOPS exception to KVM. The handling can be shared
with the EL0 exception handler, as the logic and register layouts are
the same. The exception can be handled right after exiting a guest,
which avoids the cost of returning to the host exit handler.
Similarly to the EL0 exception handler, in case the main or epilogue
instruction is being single stepped, it makes sense to finish the step
before executing the prologue instruction, so advance the single step
state machine.
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230922112508.1774352-2-kristina.martsenko@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The smccc_filter maple tree is only populated if userspace attempted to
configure it. Use the state of the maple tree to determine if the filter
has been configured, eliminating the VM flag.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231004234947.207507-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The reserved ranges are only useful for preventing userspace from
adding a rule that intersects with functions we must handle in KVM. If
userspace never writes to the SMCCC filter than this is all just wasted
work/memory.
Insert reserved ranges on the first call to KVM_ARM_VM_SMCCC_FILTER.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231004234947.207507-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Eventually we can drop the VM flag, move around the existing
implementation for now.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231004234947.207507-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
KVM/arm64 has a couple schemes for handling vCPU feature selection now,
which is a lot to put on userspace. Add some documentation about how
these interact and provide some recommendations for how to use the
writable ID register scheme.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-11-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
All known fields in ID_AA64ZFR0_EL1 describe the unprivileged
instructions supported by the PE's SVE implementation. Allow userspace
to pick and choose the advertised feature set, though nothing stops the
guest from using undisclosed instructions.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Allow userspace to change the guest-visible value of the register with
some severe limitation:
- No changes to features not virtualized by KVM (AMU, MPAM, RAS)
- Short of full GICv2 emulation in kernel, hiding GICv3 from the guest
makes absolutely no sense.
- FP is effectively assumed for KVM VMs.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
[oliver: restrict features that are illogical to change]
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Allow userspace to modify the guest-visible values of these ID
registers. Prevent changes to any of the virtualization features until
KVM picks up support for nested and we have a handle on managing NV
features.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
[oliver: prevent changes to EL2 features for now]
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-8-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Almost all of the features described by the ISA registers have no KVM
involvement. Allow userspace to change the value of these registers with
a couple exceptions:
- MOPS is not writable as KVM does not currently virtualize FEAT_MOPS.
- The PAuth fields are not writable as KVM requires both address and
generic authentication be enabled.
Co-developed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Since ID_AA64DFR0_EL1 and ID_DFR0_EL1 are now writable from userspace,
it is safe to bump up the default KVM sanitised debug version to v8p8.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The debug architecture is mandatory in ARMv8, so KVM should not allow
userspace to configure a vCPU with less than that. Of course, this isn't
handled elegantly by the generic ID register plumbing, as the respective
ID register fields have a nonzero starting value.
Add an explicit check for debug versions less than v8 of the
architecture.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Much like we do for other fields, extract the Debug architecture version
from the ID register to populate the corresponding field in DBGDIDR.
Rewrite the existing sysreg field extractors to use SYS_FIELD_GET() for
consistency.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Since KVM now supports per-VM ID registers, use per-VM ID register
values for the sake of emulation for DBGDIDR and LORegion.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Add some basic documentation on how to get feature ID register writable
masks from userspace.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
While the Feature ID range is well defined and pretty large, it isn't
inconceivable that the architecture will eventually grow some other
ranges that will need to similarly be described to userspace.
Add a VM ioctl to allow userspace to get writable masks for feature ID
registers in below system register space:
op0 = 3, op1 = {0, 1, 3}, CRn = 0, CRm = {0 - 7}, op2 = {0 - 7}
This is used to support mix-and-match userspace and kernels for writable
ID registers, where userspace may want to know upfront whether it can
actually tweak the contents of an idreg or not.
Add a new capability (KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES) that
returns a bitmap of the valid ranges, which can subsequently be
retrieved, one at a time by setting the index of the set bit as the
range identifier.
Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231003230408.3405722-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
It goes without saying, but it is probably better to spell it out:
If userspace tries to restore and VM, but creates vcpus and/or RDs
in a different order, the vcpu/RD mapping will be different. Yes,
our API is an ugly piece of crap and I can't believe that we missed
this.
If we want to relax the above, we'll need to define a new userspace
API that allows the mapping to be specified, rather than relying
on the kernel to perform the mapping on its own.
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-12-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Our affinity-based SGI injection code is a bit daft. We iterate
over all the CPUs trying to match the set of affinities that the
guest is trying to reach, leading to some very bad behaviours
if the selected targets are at a high vcpu index.
Instead, we can now use the fact that we have an optimised
MPIDR to vcpu mapping, and only look at the relevant values.
This results in a much faster injection for large VMs, and
in a near constant time, irrespective of the position in the
vcpu index space.
As a bonus, this is mostly deleting a lot of hard-to-read
code. Nobody will complain about that.
Suggested-by: Xu Zhao <zhaoxu.35@bytedance.com>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
If our fancy little table is present when calling kvm_mpidr_to_vcpu(),
use it to recover the corresponding vcpu.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The MPIDR_EL1 register contains a unique value that identifies
the CPU. The only problem with it is that it is stupidly large
(32 bits, once the useless stuff is removed).
Trying to obtain a vcpu from an MPIDR value is a fairly common,
yet costly operation: we iterate over all the vcpus until we
find the correct one. While this is cheap for small VMs, it is
pretty expensive on large ones, specially if you are trying to
get to the one that's at the end of the list...
In order to help with this, it is important to realise that
the MPIDR values are actually structured, and that implementations
tend to use a small number of significant bits in the 32bit space.
We can use this fact to our advantage by computing a small hash
table that uses the "compression" of the significant MPIDR bits
as an index, giving us the vcpu index as a result.
Given that the MPIDR values can be supplied by userspace, and
that an evil VMM could decide to make *all* bits significant,
resulting in a 4G-entry table, we only use this method if the
resulting table fits in a single page. Otherwise, we fallback
to the good old iterative method.
Nothing uses that table just yet, but keep your eyes peeled.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
By definition, MPIDR_EL1 cannot be modified by the guest. This
means it is pointless to check whether this is loaded on the CPU.
Simplify the kvm_vcpu_get_mpidr_aff() helper to directly access
the in-memory value.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
While vcpu_id isn't necessarily a bad choice as an identifier for
the currently running vcpu, it is provided by userspace, and there
is close to no guarantee that it would be unique.
Switch it to vcpu_idx instead, for which we have much stronger
guarantees.
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When dumping the debug information, use vcpu_idx instead of vcpu_id,
as this is independent of any userspace influence.
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When parsing a GICv2 attribute that contains a cpuid, handle this
as the vcpu_id, not a vcpu_idx, as userspace cannot really know
the mapping between the two. For this, use kvm_get_vcpu_by_id()
instead of kvm_get_vcpu().
Take this opportunity to get rid of the pointless check against
online_vcpus, which doesn't make much sense either, and switch
to FIELD_GET as a way to extract the vcpu_id.
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
As we're about to change the way SGIs are sent, start by splitting
out some of the basic functionnality: instead of intermingling
the broadcast and non-broadcast cases with the actual SGI generation,
perform the following cleanups:
- move the SGI queuing into its own helper
- split the broadcast code from the affinity-driven code
- replace the mask/shift combinations with FIELD_GET()
- fix the confusion between vcpu_id and vcpu when handling
the broadcast case
The result is much more readable, and paves the way for further
optimisations.
Tested-by: Joey Gouly <joey.gouly@arm.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Since our emulated ITS advertises GITS_TYPER.PTA=0, the target
address associated to a collection is a PE number and not
an address. So far, so good. However, the PE number is what userspace
has provided given us (aka the vcpu_id), and not the internal vcpu
index.
Make sure we consistently retrieve the vcpu by ID rather than
by index, adding a helper that deals with most of the cases.
We also get rid of the pointless (and bogus) comparisons to
online_vcpus, which don't really make sense.
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Passing a vcpu_id to kvm_vgic_inject_irq() is silly for two reasons:
- we often confuse vcpu_id and vcpu_idx
- we eventually have to convert it back to a vcpu
- we can't count
Instead, pass a vcpu pointer, which is unambiguous. A NULL vcpu
is also allowed for interrupts that are not private to a vcpu
(such as SPIs).
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230927090911.3355209-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|