aboutsummaryrefslogtreecommitdiffstats
path: root/virt/kvm
AgeCommit message (Collapse)AuthorFilesLines
2019-10-24Merge remote-tracking branch 'kvmarm/kvm-arm64/stolen-time' into kvmarm-master/nextMarc Zyngier5-85/+218
2019-10-22KVM: Add separate helper for putting borrowed reference to kvmSean Christopherson1-2/+14
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22Merge tag 'kvmarm-fixes-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini1-13/+35
KVM/arm fixes for 5.4, take #2 Special PMU edition: - Fix cycle counter truncation - Fix cycle counter overflow limit on pure 64bit system - Allow chained events to be actually functional - Correct sample period after overflow
2019-10-22KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabledWanpeng Li1-13/+16
Don't waste cycles to shrink/grow vCPU halt_poll_ns if host side polling is disabled. Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-21KVM: arm64: Provide VCPU attributes for stolen timeSteven Price1-0/+59
Allow user space to inform the KVM host where in the physical memory map the paravirtualized time structures should be located. User space can set an attribute on the VCPU providing the IPA base address of the stolen time structure for that VCPU. This must be repeated for every VCPU in the VM. The address is given in terms of the physical address visible to the guest and must be 64 byte aligned. The guest will discover the address via a hypercall. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: Allow kvm_device_ops to be constSteven Price1-3/+3
Currently a kvm_device_ops structure cannot be const without triggering compiler warnings. However the structure doesn't need to be written to and, by marking it const, it can be read-only in memory. Add some more const keywords to allow this. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Support stolen time reporting via shared structureSteven Price3-0/+69
Implement the service call for configuring a shared structure between a VCPU and the hypervisor in which the hypervisor can write the time stolen from the VCPU's execution time by other tasks on the host. User space allocates memory which is placed at an IPA also chosen by user space. The hypervisor then updates the shared structure using kvm_put_guest() to ensure single copy atomicity of the 64-bit value reporting the stolen time in nanoseconds. Whenever stolen time is enabled by the guest, the stolen time counter is reset. The stolen time itself is retrieved from the sched_info structure maintained by the Linux scheduler code. We enable SCHEDSTATS when selecting KVM Kconfig to ensure this value is meaningful. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm64: Implement PV_TIME_FEATURES callSteven Price2-1/+27
This provides a mechanism for querying which paravirtualized time features are available in this hypervisor. Also add the header file which defines the ABI for the paravirtualized time features we're about to add. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm/arm64: Factor out hypercall handling from PSCI codeChristoffer Dall2-82/+61
We currently intertwine the KVM PSCI implementation with the general dispatch of hypercall handling, which makes perfect sense because PSCI is the only category of hypercalls we support. However, as we are about to support additional hypercalls, factor out this functionality into a separate hypercall handler file. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [steven.price@arm.com: rebased] Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm/arm64: Allow user injection of external data abortsChristoffer Dall1-0/+1
In some scenarios, such as buggy guest or incorrect configuration of the VMM and firmware description data, userspace will detect a memory access to a portion of the IPA, which is not mapped to any MMIO region. For this purpose, the appropriate action is to inject an external abort to the guest. The kernel already has functionality to inject an external abort, but we need to wire up a signal from user space that lets user space tell the kernel to do this. It turns out, we already have the set event functionality which we can perfectly reuse for this. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: arm/arm64: Allow reporting non-ISV data aborts to userspaceChristoffer Dall2-1/+29
For a long time, if a guest accessed memory outside of a memslot using any of the load/store instructions in the architecture which doesn't supply decoding information in the ESR_EL2 (the ISV bit is not set), the kernel would print the following message and terminate the VM as a result of returning -ENOSYS to userspace: load/store instruction decoding not implemented The reason behind this message is that KVM assumes that all accesses outside a memslot is an MMIO access which should be handled by userspace, and we originally expected to eventually implement some sort of decoding of load/store instructions where the ISV bit was not set. However, it turns out that many of the instructions which don't provide decoding information on abort are not safe to use for MMIO accesses, and the remaining few that would potentially make sense to use on MMIO accesses, such as those with register writeback, are not used in practice. It also turns out that fetching an instruction from guest memory can be a pretty horrible affair, involving stopping all CPUs on SMP systems, handling multiple corner cases of address translation in software, and more. It doesn't appear likely that we'll ever implement this in the kernel. What is much more common is that a user has misconfigured his/her guest and is actually not accessing an MMIO region, but just hitting some random hole in the IPA space. In this scenario, the error message above is almost misleading and has led to a great deal of confusion over the years. It is, nevertheless, ABI to userspace, and we therefore need to introduce a new capability that userspace explicitly enables to change behavior. This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) which does exactly that, and introduces a new exit reason to report the event to userspace. User space can then emulate an exception to the guest, restart the guest, suspend the guest, or take any other appropriate action as per the policy of the running system. Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-20KVM: arm64: pmu: Reset sample period on overflow handlingMarc Zyngier1-0/+20
The PMU emulation code uses the perf event sample period to trigger the overflow detection. This works fine for the *first* overflow handling, but results in a huge number of interrupts on the host, unrelated to the number of interrupts handled in the guest (a x20 factor is pretty common for the cycle counter). On a slow system (such as a SW model), this can result in the guest only making forward progress at a glacial pace. It turns out that the clue is in the name. The sample period is exactly that: a period. And once the an overflow has occured, the following period should be the full width of the associated counter, instead of whatever the guest had initially programed. Reset the sample period to the architected value in the overflow handler, which now results in a number of host interrupts that is much closer to the number of interrupts in the guest. Fixes: b02386eb7dac ("arm64: KVM: Add PMU overflow interrupt routing") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-20KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel eventMarc Zyngier1-3/+3
The current convention for KVM to request a chained event from the host PMU is to set bit[0] in attr.config1 (PERF_ATTR_CFG1_KVM_PMU_CHAINED). But as it turns out, this bit gets set *after* we create the kernel event that backs our virtual counter, meaning that we never get a 64bit counter. Moving the setting to an earlier point solves the problem. Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-20KVM: arm64: pmu: Fix cycle counter truncationMarc Zyngier1-10/+12
When a counter is disabled, its value is sampled before the event is being disabled, and the value written back in the shadow register. In that process, the value gets truncated to 32bit, which is adequate for any counter but the cycle counter (defined as a 64bit counter). This obviously results in a corrupted counter, and things like "perf record -e cycles" not working at all when run in a guest... A similar, but less critical bug exists in kvm_pmu_get_counter_value. Make the truncation conditional on the counter not being the cycle counter, which results in a minor code reorganisation. Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reported-by: Julien Thierry <julien.thierry.kdev@gmail.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-03Merge tag 'kvmarm-fixes-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini1-1/+1
KVM/arm fixes for 5.4, take #1 - Remove the now obsolete hyp_alternate_select construct - Fix the TRACE_INCLUDE_PATH macro in the vgic code
2019-09-30kvm: x86, powerpc: do not allow clearing largepages debugfs entryPaolo Bonzini1-3/+7
The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds10-85/+305
Pull KVM updates from Paolo Bonzini: "s390: - ioctl hardening - selftests ARM: - ITS translation cache - support for 512 vCPUs - various cleanups and bugfixes PPC: - various minor fixes and preparation x86: - bugfixes all over the place (posted interrupts, SVM, emulation corner cases, blocked INIT) - some IPI optimizations" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (75 commits) KVM: X86: Use IPI shorthands in kvm guest when support KVM: x86: Fix INIT signal handling in various CPU states KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode KVM: VMX: Stop the preemption timer during vCPU reset KVM: LAPIC: Micro optimize IPI latency kvm: Nested KVM MMUs need PAE root too KVM: x86: set ctxt->have_exception in x86_decode_insn() KVM: x86: always stop emulation on page fault KVM: nVMX: trace nested VM-Enter failures detected by H/W KVM: nVMX: add tracepoint for failed nested VM-Enter x86: KVM: svm: Fix a check in nested_svm_vmrun() KVM: x86: Return to userspace with internal error on unexpected exit reason KVM: x86: Add kvm_emulate_{rd,wr}msr() to consolidate VXM/SVM code KVM: x86: Refactor up kvm_{g,s}et_msr() to simplify callers doc: kvm: Fix return description of KVM_SET_MSRS KVM: X86: Tune PLE Window tracepoint KVM: VMX: Change ple_window type to unsigned int KVM: X86: Remove tailing newline for tracepoints KVM: X86: Trace vcpu_id for vmexit KVM: x86: Manually calculate reserved bits when loading PDPTRS ...
2019-09-18KVM: coalesced_mmio: add bounds checkingMatt Delco1-8/+11
The first/last indexes are typically shared with a user app. The app can change the 'last' index that the kernel uses to store the next result. This change sanity checks the index before using it for writing to a potentially arbitrary address. This fixes CVE-2019-14821. Cc: stable@vger.kernel.org Fixes: 5f94c1741bdc ("KVM: Add coalesced MMIO support (common part)") Signed-off-by: Matt Delco <delco@chromium.org> Signed-off-by: Jim Mattson <jmattson@google.com> Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com [Use READ_ONCE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-11KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATHZenghui Yu1-1/+1
Commit 49dfe94fe5ad ("KVM: arm/arm64: Fix TRACE_INCLUDE_PATH") fixes TRACE_INCLUDE_PATH to the correct relative path to the define_trace.h and explains why did the old one work. The same fix should be applied to virt/kvm/arm/vgic/trace.h. Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-09-09KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINEMarc Zyngier1-0/+2
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited. One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field. Since the irq_type subfield (8 bit wide) is currently only taking the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier: vcpu_id = 256 * vcpu2_index + vcpu_index With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't. Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditionned by KVM_CAP_IRQCHIP. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-28Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds4-2/+33
Pull arm64 fixes from Will Deacon: "Hot on the heels of our last set of fixes are a few more for -rc7. Two of them are fixing issues with our virtual interrupt controller implementation in KVM/arm, while the other is a longstanding but straightforward kallsyms fix which was been acked by Masami and resolves an initialisation failure in kprobes observed on arm64. - Fix GICv2 emulation bug (KVM) - Fix deadlock in virtual GIC interrupt injection code (KVM) - Fix kprobes blacklist init failure due to broken kallsyms lookup" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WI KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is long kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol
2019-08-28KVM: arm/arm64: vgic-v2: Handle SGI bits in GICD_I{S,C}PENDR0 as WIMarc Zyngier3-2/+26
A guest is not allowed to inject a SGI (or clear its pending state) by writing to GICD_ISPENDR0 (resp. GICD_ICPENDR0), as these bits are defined as WI (as per ARM IHI 0048B 4.3.7 and 4.3.8). Make sure we correctly emulate the architecture. Fixes: 96b298000db4 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers") Cc: stable@vger.kernel.org # 4.7+ Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-27KVM: arm/arm64: vgic: Fix potential deadlock when ap_list is longHeyi Guo1-0/+7
If the ap_list is longer than 256 entries, merge_final() in list_sort() will call the comparison callback with the same element twice, causing a deadlock in vgic_irq_cmp(). Fix it by returning early when irqa == irqb. Cc: stable@vger.kernel.org # 4.7+ Fixes: 8e4447457965 ("KVM: arm/arm64: vgic-new: Add IRQ sorting") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Heyi Guo <guoheyi@huawei.com> [maz: massaged commit log and patch, added Fixes and Cc-stable] Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-08-25KVM: arm/arm64: vgic: Use a single IO device per redistributorEric Auger2-58/+24
At the moment we use 2 IO devices per GICv3 redistributor: one one for the RD_base frame and one for the SGI_base frame. Instead we can use a single IO device per redistributor (the 2 frames are contiguous). This saves slots on the KVM_MMIO_BUS which is currently limited to NR_IOBUS_DEVS (1000). This change allows to instantiate up to 512 redistributors and may speed the guest boot with a large number of VCPUs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-25KVM: arm/arm64: vgic: Remove spurious semicolonsMarc Zyngier1-1/+1
Detected by Coccinelle (and Will Deacon) using scripts/coccinelle/misc/semicolon.cocci. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-24Merge tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm/fixesWill Deacon2-10/+27
Pull KVM/arm fixes from Marc Zyngier as per Paulo's request at: https://lkml.kernel.org/r/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com "One (hopefully last) set of fixes for KVM/arm for 5.3: an embarassing MMIO emulation regression, and a UBSAN splat. Oh well... - Don't overskip instructions on MMIO emulation - Fix UBSAN splat when initializing PPI priorities" * tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm: KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity KVM: arm/arm64: Only skip MMIO insn once
2019-08-23KVM: arm/arm64: VGIC: Properly initialise private IRQ affinityAndre Przywara1-10/+20
At the moment we initialise the target *mask* of a virtual IRQ to the VCPU it belongs to, even though this mask is only defined for GICv2 and quickly runs out of bits for many GICv3 guests. This behaviour triggers an UBSAN complaint for more than 32 VCPUs: ------ [ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21 [ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int' ------ Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs dump is wrong, due to this very same problem. Because there is no requirement to create the VGIC device before the VCPUs (and QEMU actually does it the other way round), we can't safely initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch every private IRQ for each VCPU anyway later (in vgic_init()), we can just move the initialisation of those fields into there, where we definitely know the VGIC type. On the way make sure we really have either a VGICv2 or a VGICv3 device, since the existing code is just checking for "VGICv3 or not", silently ignoring the uninitialised case. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reported-by: Dave Martin <dave.martin@arm.com> Tested-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-22KVM: arm/arm64: Only skip MMIO insn onceAndrew Jones1-0/+7
If after an MMIO exit to userspace a VCPU is immediately run with an immediate_exit request, such as when a signal is delivered or an MMIO emulation completion is needed, then the VCPU completes the MMIO emulation and immediately returns to userspace. As the exit_reason does not get changed from KVM_EXIT_MMIO in these cases we have to be careful not to complete the MMIO emulation again, when the VCPU is eventually run again, because the emulation does an instruction skip (and doing too many skips would be a waste of guest code :-) We need to use additional VCPU state to track if the emulation is complete. As luck would have it, we already have 'mmio_needed', which even appears to be used in this way by other architectures already. Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation") Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: Call kvm_arch_vcpu_blocking early into the blocking sequenceMarc Zyngier1-4/+3
When a vpcu is about to block by calling kvm_vcpu_block, we call back into the arch code to allow any form of synchronization that may be required at this point (SVN stops the AVIC, ARM synchronises the VMCR and enables GICv4 doorbells). But this synchronization comes in quite late, as we've potentially waited for halt_poll_ns to expire. Instead, let's move kvm_arch_vcpu_blocking() to the beginning of kvm_vcpu_block(), which on ARM has several benefits: - VMCR gets synchronised early, meaning that any interrupt delivered during the polling window will be evaluated with the correct guest PMR - GICv4 doorbells are enabled, which means that any guest interrupt directly injected during that window will be immediately recognised Tang Nianyao ran some tests on a GICv4 machine to evaluate such change, and reported up to a 10% improvement for netperf: <quote> netperf result: D06 as server, intel 8180 server as client with change: package 512 bytes - 5500 Mbits/s package 64 bytes - 760 Mbits/s without change: package 512 bytes - 5000 Mbits/s package 64 bytes - 710 Mbits/s </quote> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic: Make function comments match function declarationsAlexandru Elisei2-6/+8
Since commit 503a62862e8f ("KVM: arm/arm64: vgic: Rely on the GIC driver to parse the firmware tables"), the vgic_v{2,3}_probe functions stopped using a DT node. Commit 909777324588 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init") changed the functions again, and now they require exactly one argument, a struct gic_kvm_info populated by the GIC driver. Unfortunately the comments regressed and state that a DT node is used instead. Change the function comments to reflect the current prototypes. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-irqfd: Implement kvm_arch_set_irq_inatomicMarc Zyngier1-6/+30
Now that we have a cache of MSI->LPI translations, it is pretty easy to implement kvm_arch_set_irq_inatomic (this cache can be parsed without sleeping). Hopefully, this will improve some LPI-heavy workloads. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Check the LPI translation cache on MSI injectionMarc Zyngier2-0/+37
When performing an MSI injection, let's first check if the translation is already in the cache. If so, let's inject it quickly without going through the whole translation process. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Cache successful MSI->LPI translationMarc Zyngier1-0/+86
On a successful translation, preserve the parameters in the LPI translation cache. Each translation is reusing the last slot in the list, naturally evicting the least recently used entry. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on vgic teardownMarc Zyngier1-0/+2
In order to avoid leaking vgic_irq structures on teardown, we need to drop all references to LPIs before deallocating the cache itself. This is done by invalidating the cache on vgic teardown. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on ITS disableMarc Zyngier1-0/+2
If an ITS gets disabled, we need to make sure that further interrupts won't hit in the cache. For that, we invalidate the translation cache when the ITS is disabled. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on disabling LPIsMarc Zyngier1-1/+3
If a vcpu disables LPIs at its redistributor level, we need to make sure we won't pend more interrupts. For this, we need to invalidate the LPI translation cache. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on specific commandsMarc Zyngier1-0/+9
The LPI translation cache needs to be discarded when an ITS command may affect the translation of an LPI (DISCARD, MAPC and MAPD with V=0) or the routing of an LPI to a redistributor with disabled LPIs (MOVI, MOVALL). We decide to perform a full invalidation of the cache, irrespective of the LPI that is affected. Commands are supposed to be rare enough that it doesn't matter. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic-its: Add MSI-LPI translation cache invalidationMarc Zyngier2-0/+24
There's a number of cases where we need to invalidate the caching of translations, so let's add basic support for that. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic: Add __vgic_put_lpi_locked primitiveMarc Zyngier2-9/+18
Our LPI translation cache needs to be able to drop the refcount on an LPI whilst already holding the lpi_list_lock. Let's add a new primitive for this. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-18KVM: arm/arm64: vgic: Add LPI translation cache definitionMarc Zyngier3-0/+56
Add the basic data structure that expresses an MSI to LPI translation as well as the allocation/release hooks. The size of the cache is arbitrarily defined as 16*nr_vcpus. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-09Merge tag 'kvmarm-fixes-for-5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini6-2/+54
KVM/arm fixes for 5.3, take #2 - Fix our system register reset so that we stop writing non-sensical values to them, and track which registers get reset instead. - Sync VMCR back from the GIC on WFI so that KVM has an exact vue of PMR. - Reevaluate state of HW-mapped, level triggered interrupts on enable.
2019-08-09Merge tag 'kvmarm-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini3-3/+25
KVM/arm fixes for 5.3 - A bunch of switch/case fall-through annotation, fixing one actual bug - Fix PMU reset bug - Add missing exception class debug strings
2019-08-09kvm: remove unnecessary PageReserved checkPaolo Bonzini1-2/+1
The same check is already done in kvm_is_reserved_pfn. Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-09KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enableAlexandru Elisei1-0/+16
A HW mapped level sensitive interrupt asserted by a device will not be put into the ap_list if it is disabled at the VGIC level. When it is enabled again, it will be inserted into the ap_list and written to a list register on guest entry regardless of the state of the device. We could argue that this can also happen on real hardware, when the command to enable the interrupt reached the GIC before the device had the chance to de-assert the interrupt signal; however, we emulate the distributor and redistributors in software and we can do better than that. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to blockMarc Zyngier5-2/+38
Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-08-05KVM: no need to check return value of debugfs_create functionsGreg KH1-16/+5
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also, when doing this, change kvm_arch_create_vcpu_debugfs() to return void instead of an integer, as we should not care at all about if this function actually does anything or not. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <x86@kernel.org> Cc: <kvm@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05KVM: remove kvm_arch_has_vcpu_debugfs()Paolo Bonzini2-8/+2
There is no need for this function as all arches have to implement kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol let us actually simplify the code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05KVM: Fix leak vCPU's VMCS value into other pCPUWanpeng Li1-1/+24
After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting in the VMs after stress testing: INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073) Call Trace: flush_tlb_mm_range+0x68/0x140 tlb_flush_mmu.part.75+0x37/0xe0 tlb_finish_mmu+0x55/0x60 zap_page_range+0x142/0x190 SyS_madvise+0x3cd/0x9c0 system_call_fastpath+0x1c/0x21 swait_active() sustains to be true before finish_swait() is called in kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin() loop greatly increases the probability condition kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv is enabled the yield-candidate vCPU's VMCS RVI field leaks(by vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current VMCS. This patch fixes it by checking conservatively a subset of events. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Marc Zyngier <Marc.Zyngier@arm.com> Cc: stable@vger.kernel.org Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop) Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-05KVM: Check preempted_in_kernel for involuntary preemptionWanpeng Li1-3/+4
preempted_in_kernel is updated in preempt_notifier when involuntary preemption ocurrs, it can be stale when the voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin() loop. This patch lets it just check preempted_in_kernel for involuntary preemption. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-26KVM: arm: vgic-v3: Mark expected switch fall-throughAnders Roxell1-0/+8
When fall-through warnings was enabled by default the following warnings was starting to show up: ../virt/kvm/arm/hyp/vgic-v3-sr.c: In function ‘__vgic_v3_save_aprs’: ../virt/kvm/arm/hyp/vgic-v3-sr.c:351:24: warning: this statement may fall through [-Wimplicit-fallthrough=] cpu_if->vgic_ap0r[2] = __vgic_v3_read_ap0rn(2); ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ ../virt/kvm/arm/hyp/vgic-v3-sr.c:352:2: note: here case 6: ^~~~ ../virt/kvm/arm/hyp/vgic-v3-sr.c:353:24: warning: this statement may fall through [-Wimplicit-fallthrough=] cpu_if->vgic_ap0r[1] = __vgic_v3_read_ap0rn(1); ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ ../virt/kvm/arm/hyp/vgic-v3-sr.c:354:2: note: here default: ^~~~~~~ Rework so that the compiler doesn't warn about fall-through. Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>