aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm/include/asm/kvm_host.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-06-12arm64: KVM: Switch vgic save/restore to alternative_insnMarc Zyngier1-5/+0
So far, we configured the world-switch by having a small array of pointers to the save and restore functions, depending on the GIC used on the platform. Loading these values each time is a bit silly (they never change), and it makes sense to rely on the instruction patching instead. This leads to a nice cleanup of the code. Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-03-12arm/arm64: KVM: Implement Stage-2 page agingMarc Zyngier1-11/+2
Until now, KVM/arm didn't care much for page aging (who was swapping anyway?), and simply provided empty hooks to the core KVM code. With server-type systems now being available, things are quite different. This patch implements very simple support for page aging, by clearing the Access flag in the Stage-2 page tables. On access fault, the current fault handling will write the PTE or PMD again, putting the Access flag back on. It should be possible to implement a much faster handling for Access faults, but that's left for a later patch. With this in place, performance in VMs is degraded much more gracefully. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-03-12KVM: arm/arm64: implement kvm_arch_intc_initializedEric Auger1-0/+2
On arm/arm64 the VGIC is dynamically instantiated and it is useful to expose its state, especially for irqfd setup. This patch defines __KVM_HAVE_ARCH_INTC_INITIALIZED and implements kvm_arch_intc_initialized. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+6
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-06kvm: add halt_poll_ns module parameterPaolo Bonzini1-0/+1
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29arm/arm64: KVM: Use set/way op trapping to track the state of the cachesMarc Zyngier1-3/+0
Trying to emulate the behaviour of set/way cache ops is fairly pointless, as there are too many ways we can end-up missing stuff. Also, there is some system caches out there that simply ignore set/way operations. So instead of trying to implement them, let's convert it to VA ops, and use them as a way to re-enable the trapping of VM ops. That way, we can detect the point when the MMU/caches are turned off, and do a full VM flush (which is what the guest was trying to do anyway). This allows a 32bit zImage to boot on the APM thingy, and will probably help bootloaders in general. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20arm/arm64: KVM: make the maximum number of vCPUs a per-VM valueAndre Przywara1-0/+1
Currently the maximum number of vCPUs supported is a global value limited by the used GIC model. GICv3 will lift this limit, but we still need to observe it for guests using GICv2. So the maximum number of vCPUs is per-VM value, depending on the GIC model the guest uses. Store and check the value in struct kvm_arch, but keep it down to 8 for now. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20arm/arm64: KVM: rework MPIDR assignment and add accessorsAndre Przywara1-0/+2
The virtual MPIDR registers (containing topology information) for the guest are currently mapped linearily to the vcpu_id. Improve this mapping for arm64 by using three levels to not artificially limit the number of vCPUs. To help this, change and rename the kvm_vcpu_get_mpidr() function to mask off the non-affinity bits in the MPIDR register. Also add an accessor to later allow easier access to a vCPU with a given MPIDR. Use this new accessor in the PSCI emulation. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-16KVM: arm/arm64: Enable Dirty Page logging for ARMv8Mario Smarduch1-12/+0
This patch enables ARMv8 ditry page logging support. Plugs ARMv8 into generic layer through Kconfig symbol, and drops earlier ARM64 constraints to enable logging at architecture layer. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2015-01-16KVM: arm: Add initial dirty page locking supportMario Smarduch1-0/+2
Add support for initial write protection of VM memslots. This patch series assumes that huge PUDs will not be used in 2nd stage tables, which is always valid on ARMv7 Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2015-01-16KVM: arm: Add ARMv7 API to flush TLBsMario Smarduch1-0/+12
This patch adds ARMv7 architecture TLB Flush function. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2014-12-13arm/arm64: KVM: Clarify KVM_ARM_VCPU_INIT ABIChristoffer Dall1-2/+0
It is not clear that this ioctl can be called multiple times for a given vcpu. Userspace already does this, so clarify the ABI. Also specify that userspace is expected to always make secondary and subsequent calls to the ioctl with the same parameters for the VCPU as the initial call (which userspace also already does). Add code to check that userspace doesn't violate that ABI in the future, and move the kvm_vcpu_set_target() function which is currently duplicated between the 32-bit and 64-bit versions in guest.c to a common static function in arm.c, shared between both architectures. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-09-27Merge tag 'kvm-arm-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-nextPaolo Bonzini1-1/+1
Changes for KVM for arm/arm64 for 3.18 This includes a bunch of changes: - Support read-only memory slots on arm/arm64 - Various changes to fix Sparse warnings - Correctly detect write vs. read Stage-2 faults - Various VGIC cleanups and fixes - Dynamic VGIC data strcuture sizing - Fix SGI set_clear_pend offset bug - Fix VTTBR_BADDR Mask - Correctly report the FSC on Stage-2 faults Conflicts: virt/kvm/eventfd.c [duplicate, different patch where the kvm-arm version broke x86. The kvm tree instead has the right one]
2014-09-24kvm: Add arch specific mmu notifier for page invalidationTang Chen1-0/+5
This will be used to let the guest run while the APIC access page is not pinned. Because subsequent patches will fill in the function for x86, place the (still empty) x86 implementation in the x86.c file instead of adding an inline function in kvm_host.h. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24kvm: Fix page ageing bugsAndres Lagar-Cavilla1-1/+2
1. We were calling clear_flush_young_notify in unmap_one, but we are within an mmu notifier invalidate range scope. The spte exists no more (due to range_start) and the accessed bit info has already been propagated (due to kvm_pfn_set_accessed). Simply call clear_flush_young. 2. We clear_flush_young on a primary MMU PMD, but this may be mapped as a collection of PTEs by the secondary MMU (e.g. during log-dirty). This required expanding the interface of the clear_flush_young mmu notifier, so a lot of code has been trivially touched. 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate the access bit by blowing the spte. This requires proper synchronizing with MMU notifier consumers, like every other removal of spte's does. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-18Merge remote-tracking branch 'kvm/next' into queueChristoffer Dall1-5/+8
Conflicts: arch/arm64/include/asm/kvm_host.h virt/kvm/arm/vgic.c
2014-08-29KVM: remove garbage arg to *hardware_{en,dis}ableRadim Krčmář1-1/+1
In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: static inline empty kvm_arch functionsRadim Krčmář1-0/+6
Using static inline is going to save few bytes and cycles. For example on powerpc, the difference is 700 B after stripping. (5 kB before) This patch also deals with two overlooked empty functions: kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c 2df72e9bc KVM: split kvm_arch_flush_shadow and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c. e790d9ef6 KVM: add kvm_arch_sched_in Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29KVM: forward declare structs in kvm_types.hPaolo Bonzini1-5/+2
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid "'struct foo' declared inside parameter list" warnings (and consequent breakage due to conflicting types). Move them from individual files to a generic place in linux/kvm_types.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-27KVM: ARM/arm64: fix non-const declaration of function returning constWill Deacon1-1/+1
Sparse kicks up about a type mismatch for kvm_target_cpu: arch/arm64/kvm/guest.c:271:25: error: symbol 'kvm_target_cpu' redeclared with different type (originally declared at ./arch/arm64/include/asm/kvm_host.h:45) - different modifiers so fix this by adding the missing const attribute to the function declaration. Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-07-11arm64: KVM: split GICv2 world switch from hyp codeMarc Zyngier1-0/+5
Move the GICv2 world switch code into its own file, and add the necessary indirection to the arm64 switch code. Also introduce a new type field to the vgic_params structure. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-07-11arm64: KVM: allow export and import of generic timer regsAlex Bennée1-3/+0
For correct guest suspend/resume behaviour we need to ensure we include the generic timer registers for 64 bit guests. As CONFIG_KVM_ARM_TIMER is always set for arm64 we don't need to worry about null implementations. However I have re-jigged the kvm_arm_timer_set/get_reg declarations to be in the common include/kvm/arm_arch_timer.h headers. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-04-30ARM/ARM64: KVM: Add base for PSCI v0.2 emulationAnup Patel1-1/+1
Currently, the in-kernel PSCI emulation provides PSCI v0.1 interface to VCPUs. This patch extends current in-kernel PSCI emulation to provide PSCI v0.2 interface to VCPUs. By default, ARM/ARM64 KVM will always provide PSCI v0.1 interface for keeping the ABI backward-compatible. To select PSCI v0.2 interface for VCPUs, the user space (i.e. QEMU or KVMTOOL) will have to set KVM_ARM_VCPU_PSCI_0_2 feature when doing VCPU init using KVM_ARM_VCPU_INIT ioctl. Signed-off-by: Anup Patel <anup.patel@linaro.org> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-03-03ARM: KVM: introduce per-vcpu HYP Configuration RegisterMarc Zyngier1-3/+6
So far, KVM/ARM used a fixed HCR configuration per guest, except for the VI/VF/VA bits to control the interrupt in absence of VGIC. With the upcoming need to dynamically reconfigure trapping, it becomes necessary to allow the HCR to be changed on a per-vcpu basis. The fix here is to mimic what KVM/arm64 already does: a per vcpu HCR field, initialized at setup time. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-21ARM/KVM: save and restore generic timer registersAndre Przywara1-0/+3
For migration to work we need to save (and later restore) the state of each core's virtual generic timer. Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export the three needed registers (control, counter, compare value). Though they live in cp15 space, we don't use the existing list, since they need special accessor functions and the arch timer is optional. Acked-by: Marc Zynger <marc.zyngier@arm.com> Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-10-16Merge tag 'kvm-arm-for-3.13-1' of git://git.linaro.org/people/cdall/linux-kvm-arm into nextGleb Natapov1-0/+1
Updates for KVM/ARM including cpu=host and Cortex-A7 support
2013-10-14KVM: ARM: Get rid of KVM_HPAGE definesChristoffer Dall1-5/+0
The KVM_HPAGE_DEFINES are a little artificial on ARM, since the huge page size is statically defined at compile time and there is only a single huge page size. Now when the main kvm code relying on these defines has been moved to the x86 specific part of the world, we can get rid of these. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-02ARM: KVM: Implement kvm_vcpu_preferred_target() functionAnup Patel1-0/+1
This patch implements kvm_vcpu_preferred_target() function for KVM ARM which will help us implement KVM_ARM_PREFERRED_TARGET ioctl for user space. Signed-off-by: Anup Patel <anup.patel@linaro.org> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-06-26arm/kvm: Cleanup KVM_ARM_MAX_VCPUS logicGeoff Levand1-0/+5
Commit d21a1c83c7595e387545632e44cd7797b76e19cc (ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally) changed the Kconfig logic for KVM_ARM_MAX_VCPUS to work around a build error arising from the use of KVM_ARM_MAX_VCPUS when CONFIG_KVM=n. The resulting Kconfig logic is a bit awkward and leaves a KVM_ARM_MAX_VCPUS always defined in the kernel config file. This change reverts the Kconfig logic back and adds a simple preprocessor conditional in kvm_host.h to handle when CONFIG_KVM_ARM_MAX_VCPUS is undefined. Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2013-06-26ARM: KVM: use phys_addr_t instead of unsigned long long for HYP PGDsMarc Zyngier1-2/+2
HYP PGDs are passed around as phys_addr_t, except just before calling into the hypervisor init code, where they are cast to a rather weird unsigned long long. Just keep them around as phys_addr_t, which is what makes the most sense. Reported-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-05-19ARM: KVM: move GIC/timer code to a common locationMarc Zyngier1-2/+2
As KVM/arm64 is looming on the horizon, it makes sense to move some of the common code to a single location in order to reduce duplication. The code could live anywhere. Actually, most of KVM is already built with a bunch of ugly ../../.. hacks in the various Makefiles, so we're not exactly talking about style here. But maybe it is time to start moving into a less ugly direction. The include files must be in a "public" location, as they are accessed from non-KVM files (arch/arm/kernel/asm-offsets.c). For this purpose, introduce two new locations: - virt/kvm/arm/ : x86 and ia64 already share the ioapic code in virt/kvm, so this could be seen as a (very ugly) precedent. - include/kvm/ : there is already an include/xen, and while the intent is slightly different, this seems as good a location as any Eventually, we should probably have independant Makefiles at every levels (just like everywhere else in the kernel), but this is just the first step. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-28ARM: KVM: promote vfp_host pointer to generic host cpu contextMarc Zyngier1-3/+5
We use the vfp_host pointer to store the host VFP context, should the guest start using VFP itself. Actually, we can use this pointer in a more generic way to store CPU speficic data, and arm64 is using it to dump the whole host state before switching to the guest. Simply rename the vfp_host field to host_cpu_context, and the corresponding type to kvm_cpu_context_t. No change in functionnality. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: add architecture specific hook for capabilitiesMarc Zyngier1-0/+5
Most of the capabilities are common to both arm and arm64, but we still need to handle the exceptions. Introduce kvm_arch_dev_ioctl_check_extension, which both architectures implement (in the 32bit case, it just returns 0). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: switch to a dual-step HYP init codeMarc Zyngier1-12/+19
Our HYP init code suffers from two major design issues: - it cannot support CPU hotplug, as we tear down the idmap very early - it cannot perform a TLB invalidation when switching from init to runtime mappings, as pages are manipulated from PL1 exclusively The hotplug problem mandates that we keep two sets of page tables (boot and runtime). The TLB problem mandates that we're able to transition from one PGD to another while in HYP, invalidating the TLBs in the process. To be able to do this, we need to share a page between the two page tables. A page that will have the same VA in both configurations. All we need is a VA that has the following properties: - This VA can't be used to represent a kernel mapping. - This VA will not conflict with the physical address of the kernel text The vectors page seems to satisfy this requirement: - The kernel never maps anything else there - The kernel text being copied at the beginning of the physical memory, it is unlikely to use the last 64kB (I doubt we'll ever support KVM on a system with something like 4MB of RAM, but patches are very welcome). Let's call this VA the trampoline VA. Now, we map our init page at 3 locations: - idmap in the boot pgd - trampoline VA in the boot pgd - trampoline VA in the runtime pgd The init scenario is now the following: - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd, runtime stack, runtime vectors - Enable the MMU with the boot pgd - Jump to a target into the trampoline page (remember, this is the same physical page!) - Now switch to the runtime pgd (same VA, and still the same physical page!) - Invalidate TLBs - Set stack and vectors - Profit! (or eret, if you only care about the code). Note that we keep the boot mapping permanently (it is not strictly an idmap anymore) to allow for CPU hotplug in later patches. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-04-28ARM: KVM: add support for minimal host vs guest profilingMarc Zyngier1-0/+3
In order to be able to correctly profile what is happening on the host, we need to be able to identify when we're running on the guest, and log these events differently. Perf offers a simple way to register callbacks into KVM. Mimic what x86 does and enjoy being able to profile your KVM host. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-03-06ARM: KVM: use kvm_kernel_vfp_t as an abstract type for VFP containersMarc Zyngier1-2/+4
In order to keep the VFP allocation code common, use an abstract type for the VFP containers. Maps onto struct vfp_hard_struct on ARM. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-03-06ARM: KVM: move hyp init to kvm_host.hMarc Zyngier1-0/+19
Make the split of the pgd_ptr an implementation specific thing by moving the init call to an inline function. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-03-06ARM: KVM: move exit handler selection to a separate fileMarc Zyngier1-0/+3
The exit handler selection code cannot be shared with arm64 (two different modes, more exception classes...). Move it to a separate file (handle_exit.c). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-03-06ARM: KVM: abstract fault register accessesMarc Zyngier1-6/+8
Instead of directly accessing the fault registers, use proper accessors so the core code can be shared. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-02-25ARM: KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTSMarc Zyngier1-1/+1
Commit bbacc0c (KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS) broke KVM/ARM by changing a global #define. Apply the same change to fix the compilation breakage. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-11ARM: KVM: arch_timers: Add guest timer core supportMarc Zyngier1-0/+5
Add some the architected timer related infrastructure, and support timer interrupt injection, which can happen as a resultof three possible events: - The virtual timer interrupt has fired while we were still executing the guest - The timer interrupt hasn't fired, but it expired while we were doing the world switch - A hrtimer we programmed earlier has fired Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-02-11ARM: KVM: Initial VGIC infrastructure codeMarc Zyngier1-0/+8
Wire the basic framework code for VGIC support and the initial in-kernel MMIO support code for the VGIC, used for the distributor emulation. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-02-11ARM: KVM: Keep track of currently running vcpusMarc Zyngier1-0/+10
When an interrupt occurs for the guest, it is sometimes necessary to find out which vcpu was running at that point. Keep track of which vcpu is being run in kvm_arch_vcpu_ioctl_run(), and allow the data to be retrieved using either: - kvm_arm_get_running_vcpu(): returns the vcpu running at this point on the current CPU. Can only be used in a non-preemptible context. - kvm_arm_get_running_vcpus(): returns the per-CPU variable holding the running vcpus, usable for per-CPU interrupts. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2013-01-23KVM: ARM: Power State Coordination Interface implementationMarc Zyngier1-1/+4
Implement the PSCI specification (ARM DEN 0022A) to control virtual CPUs being "powered" on or off. PSCI/KVM is detected using the KVM_CAP_ARM_PSCI capability. A virtual CPU can now be initialized in a "powered off" state, using the KVM_ARM_VCPU_POWER_OFF feature flag. The guest can use either SMC or HVC to execute a PSCI function. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-23KVM: ARM: Handle I/O abortsChristoffer Dall1-0/+4
When the guest accesses I/O memory this will create data abort exceptions and they are handled by decoding the HSR information (physical address, read/write, length, register) and forwarding reads and writes to QEMU which performs the device emulation. Certain classes of load/store operations do not support the syndrome information provided in the HSR. We don't support decoding these (patches are available elsewhere), so we report an error to user space in this case. This requires changing the general flow somewhat since new calls to run the VCPU must check if there's a pending MMIO load and perform the write after userspace has made the data available. Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-23KVM: ARM: User space API for getting/setting co-proc registersChristoffer Dall1-0/+4
The following three ioctls are implemented: - KVM_GET_REG_LIST - KVM_GET_ONE_REG - KVM_SET_ONE_REG Now we have a table for all the cp15 registers, we can drive a generic API. The register IDs carry the following encoding: ARM registers are mapped using the lower 32 bits. The upper 16 of that is the register group type, or coprocessor number: ARM 32-bit CP15 registers have the following id bit patterns: 0x4002 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3> ARM 64-bit CP15 registers have the following id bit patterns: 0x4003 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3> For futureproofing, we need to tell QEMU about the CP15 registers the host lets the guest access. It will need this information to restore a current guest on a future CPU or perhaps a future KVM which allow some of these to be changed. We use a separate table for these, as they're only for the userspace API. Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-23KVM: ARM: Emulation framework and CP15 emulationChristoffer Dall1-0/+4
Adds a new important function in the main KVM/ARM code called handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns from guest execution. This function examines the Hyp-Syndrome-Register (HSR), which contains information telling KVM what caused the exit from the guest. Some of the reasons for an exit are CP15 accesses, which are not allowed from the guest and this commit handles these exits by emulating the intended operation in software and skipping the guest instruction. Minor notes about the coproc register reset: 1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our table, at cost of 4 bytes per vcpu. 2) Added comments on the table indicating how we handle each register, for simplicity of understanding. Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-23KVM: ARM: World-switch implementationChristoffer Dall1-0/+13
Provides complete world-switch implementation to switch to other guests running in non-secure modes. Includes Hyp exception handlers that capture necessary exception information and stores the information on the VCPU and KVM structures. The following Hyp-ABI is also documented in the code: Hyp-ABI: Calling HYP-mode functions from host (in SVC mode): Switching to Hyp mode is done through a simple HVC #0 instruction. The exception vector code will check that the HVC comes from VMID==0 and if so will push the necessary state (SPSR, lr_usr) on the Hyp stack. - r0 contains a pointer to a HYP function - r1, r2, and r3 contain arguments to the above function. - The HYP function will be called with its arguments in r0, r1 and r2. On HYP function return, we return directly to SVC. A call to a function executing in Hyp mode is performed like the following: <svc code> ldr r0, =BSYM(my_hyp_fn) ldr r1, =my_param hvc #0 ; Call my_hyp_fn(my_param) from HYP mode <svc code> Otherwise, the world-switch is pretty straight-forward. All state that can be modified by the guest is first backed up on the Hyp stack and the VCPU values is loaded onto the hardware. State, which is not loaded, but theoretically modifiable by the guest is protected through the virtualiation features to generate a trap and cause software emulation. Upon guest returns, all state is restored from hardware onto the VCPU struct and the original state is restored from the Hyp-stack onto the hardware. SMP support using the VMPIDR calculated on the basis of the host MPIDR and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier. Reuse of VMIDs has been implemented by Antonios Motakis and adapated from a separate patch into the appropriate patches introducing the functionality. Note that the VMIDs are stored per VM as required by the ARM architecture reference manual. To support VFP/NEON we trap those instructions using the HPCTR. When we trap, we switch the FPU. After a guest exit, the VFP state is returned to the host. When disabling access to floating point instructions, we also mask FPEXC_EN in order to avoid the guest receiving Undefined instruction exceptions before we have a chance to switch back the floating point state. We are reusing vfp_hard_struct, so we depend on VFPv3 being enabled in the host kernel, if not, we still trap cp10 and cp11 in order to inject an undefined instruction exception whenever the guest tries to use VFP/NEON. VFP/NEON developed by Antionios Motakis and Rusty Russell. Aborts that are permission faults, and not stage-1 page table walk, do not report the faulting address in the HPFAR. We have to resolve the IPA, and store it just like the HPFAR register on the VCPU struct. If the IPA cannot be resolved, it means another CPU is playing with the page tables, and we simply restart the guest. This quirk was fixed by Marc Zyngier. Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-23KVM: ARM: Memory virtualization setupChristoffer Dall1-0/+18
This commit introduces the framework for guest memory management through the use of 2nd stage translation. Each VM has a pointer to a level-1 table (the pgd field in struct kvm_arch) which is used for the 2nd stage translations. Entries are added when handling guest faults (later patch) and the table itself can be allocated and freed through the following functions implemented in arch/arm/kvm/arm_mmu.c: - kvm_alloc_stage2_pgd(struct kvm *kvm); - kvm_free_stage2_pgd(struct kvm *kvm); Each entry in TLBs and caches are tagged with a VMID identifier in addition to ASIDs. The VMIDs are assigned consecutively to VMs in the order that VMs are executed, and caches and tlbs are invalidated when the VMID space has been used to allow for more than 255 simultaenously running guests. The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is freed in kvm_arch_destroy_vm(). Both functions are called from the main KVM code. We pre-allocate page table memory to be able to synchronize using a spinlock and be called under rcu_read_lock from the MMU notifiers. We steal the mmu_memory_cache implementation from x86 and adapt for our specific usage. We support MMU notifiers (thanks to Marc Zyngier) through kvm_unmap_hva and kvm_set_spte_hva. Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA, which is used by VGIC support to map the virtual CPU interface registers to the guest. This support is added by Marc Zyngier. Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
2013-01-23KVM: ARM: Hypervisor initializationChristoffer Dall1-0/+1
Sets up KVM code to handle all exceptions taken to Hyp mode. When the kernel is booted in Hyp mode, calling an hvc instruction with r0 pointing to the new vectors, the HVBAR is changed to the the vector pointers. This allows subsystems (like KVM here) to execute code in Hyp-mode with the MMU disabled. We initialize other Hyp-mode registers and enables the MMU for Hyp-mode from the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode to perform a world-switch into a KVM guest. Also provides memory mapping code to map required code pages, data structures, and I/O regions accessed in Hyp mode at the same virtual address as the host kernel virtual addresses, but which conforms to the architectural requirements for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c and comprises: - create_hyp_mappings(from, to); - create_hyp_io_mappings(from, to, phys_addr); - free_hyp_pmds(); Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>