aboutsummaryrefslogtreecommitdiffstats
path: root/virt (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-27KVM: Move initialization of preempt notifier to kvm_vcpu_init()Sean Christopherson1-2/+1
Initialize the preempt notifier immediately in kvm_vcpu_init() to pave the way for removing kvm_arch_vcpu_setup(), i.e. to allow arch specific code to call vcpu_load() during kvm_arch_vcpu_create(). Back when preemption support was added, the location of the call to init the preempt notifier was perfectly sane. The overall vCPU creation flow featured a single arch specific hook and the preempt notifer was used immediately after its initialization (by vcpu_load()). E.g.: vcpu = kvm_arch_ops->vcpu_create(kvm, n); if (IS_ERR(vcpu)) return PTR_ERR(vcpu); preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops); vcpu_load(vcpu); r = kvm_mmu_setup(vcpu); vcpu_put(vcpu); if (r < 0) goto free_vcpu; Today, the call to preempt_notifier_init() is sandwiched between two arch specific calls, kvm_arch_vcpu_create() and kvm_arch_vcpu_setup(), which needlessly forces x86 (and possibly others?) to split its vCPU creation flow. Init the preempt notifier prior to any arch specific call so that each arch can independently decide how best to organize its creation flow. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Unexport kvm_vcpu_cache and kvm_vcpu_{un}init()Sean Christopherson1-6/+3
Unexport kvm_vcpu_cache and kvm_vcpu_{un}init() and make them static now that they are referenced only in kvm_main.c. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27KVM: Move vcpu alloc and init invocation to common codeSean Christopherson2-30/+20
Now that all architectures tightly couple vcpu allocation/free with the mandatory calls to kvm_{un}init_vcpu(), move the sequences verbatim to common KVM code. Move both allocation and initialization in a single patch to eliminate thrash in arch specific code. The bisection benefits of moving the two pieces in separate patches is marginal at best, whereas the odds of introducing a transient arch specific bug are non-zero. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: Introduce kvm_vcpu_destroy()Sean Christopherson2-1/+7
Add kvm_vcpu_destroy() and wire up all architectures to call the common function instead of their arch specific implementation. The common destruction function will be used by future patches to move allocation and initialization of vCPUs to common KVM code, i.e. to free resources that are allocated by arch agnostic code. No functional change intended. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: Add kvm_arch_vcpu_precreate() to handle pre-allocation issuesSean Christopherson2-10/+15
Add a pre-allocation arch hook to handle checks that are currently done by arch specific code prior to allocating the vCPU object. This paves the way for moving the allocation to common KVM code. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24KVM: arm: Drop kvm_arch_vcpu_free()Sean Christopherson1-7/+2
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-23KVM: async_pf: drop kvm_arch_async_page_present wrappersPaolo Bonzini1-17/+4
The wrappers make it less clear that the position of the call to kvm_arch_async_page_present depends on the architecture, and that only one of the two call sites will actually be active. Remove them. Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21kvm: Refactor handling of VM debugfs filesMilan Pandurov1-71/+71
We can store reference to kvm_stats_debugfs_item instead of copying its values to kvm_stat_data. This allows us to remove duplicated code and usage of temporary kvm_stat_data inside vm_stat_get et al. Signed-off-by: Milan Pandurov <milanpa@amazon.de> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21KVM: Fix some writing mistakesMiaohe Lin1-1/+1
Fix some writing mistakes in the comments. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21KVM: Fix some grammar mistakesMiaohe Lin1-1/+1
Fix some grammar mistakes in the comments. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21KVM: Fix some wrong function names in commentMiaohe Lin1-1/+1
Fix some wrong function names in comment. mmu_check_roots is a typo for mmu_check_root, vmcs_read_any should be vmcs12_read_any and so on. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVMSean Christopherson1-5/+5
Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: lib: use jump label to handle resource release in irq_bypass_register_producer()Miaohe Lin1-9/+10
Use out_err jump label to handle resource release. It's a good practice to release resource in one place and help eliminate some duplicated code. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: lib: use jump label to handle resource release in irq_bypass_register_consumer()Miaohe Lin1-9/+10
Use out_err jump label to handle resource release. It's a good practice to release resource in one place and help eliminate some duplicated code. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: get rid of var page in kvm_set_pfn_dirty()Miaohe Lin1-5/+2
We can get rid of unnecessary var page in kvm_set_pfn_dirty() , thus make code style similar with kvm_set_pfn_accessed(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-22Merge tag 'kvm-ppc-fixes-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-masterPaolo Bonzini2-14/+22
PPC KVM fix for 5.5 - Fix a bug where we try to do an ultracall on a system without an ultravisor.
2019-12-18Merge tag 'kvmarm-fixes-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-masterPaolo Bonzini14-166/+327
KVM/arm fixes for .5.5, take #1 - Fix uninitialised sysreg accessor - Fix handling of demand-paged device mappings - Stop spamming the console on IMPDEF sysregs - Relax mappings of writable memslots - Assorted cleanups
2019-12-12KVM: arm/arm64: Properly handle faulting of device mappingsMarc Zyngier1-4/+17
A device mapping is normally always mapped at Stage-2, since there is very little gain in having it faulted in. Nonetheless, it is possible to end-up in a situation where the device mapping has been removed from Stage-2 (userspace munmaped the VFIO region, and the MMU notifier did its job), but present in a userspace mapping (userpace has mapped it back at the same address). In such a situation, the device mapping will be demand-paged as the guest performs memory accesses. This requires to be careful when dealing with mapping size, cache management, and to handle potential execution of a device mapping. Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191211165651.7889-2-maz@kernel.org
2019-12-06KVM: arm/arm64: Remove excessive permission check in kvm_arch_prepare_memory_regionJia He1-9/+0
In kvm_arch_prepare_memory_region, arm kvm regards the memory region as writable if the flag has no KVM_MEM_READONLY, and the vm is readonly if !VM_WRITE. But there is common usage for setting kvm memory region as follows: e.g. qemu side (see the PROT_NONE flag) 1. mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); memory_region_init_ram_ptr() 2. re mmap the above area with read/write authority. Such example is used in virtio-fs qemu codes which hasn't been upstreamed [1]. But seems we can't forbid this example. Without this patch, it will cause an EPERM during kvm_set_memory_region() and cause qemu boot crash. As told by Ard, "the underlying assumption is incorrect, i.e., that the value of vm_flags at this point in time defines how the VMA is used during its lifetime. There may be other cases where a VMA is created with VM_READ vm_flags that are changed to VM_READ|VM_WRITE later, and we are currently rejecting this use case as well." [1] https://gitlab.com/virtio-fs/qemu/blob/5a356e/hw/virtio/vhost-user-fs.c#L488 Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191206020802.196108-1-justin.he@arm.com
2019-12-06KVM: arm/arm64: vgic: Use wrapper function to lock/unlock all vcpus in kvm_vgic_create()Miaohe Lin1-15/+4
Use wrapper function lock_all_vcpus()/unlock_all_vcpus() in kvm_vgic_create() to remove duplicated code dealing with locking and unlocking all vcpus in a vm. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/1575081918-11401-1-git-send-email-linmiaohe@huawei.com
2019-12-06KVM: arm/arm64: vgic: Fix potential double free dist->spis in __kvm_vgic_destroy()Miaohe Lin1-0/+1
In kvm_vgic_dist_init() called from kvm_vgic_map_resources(), if dist->vgic_model is invalid, dist->spis will be freed without set dist->spis = NULL. And in vgicv2 resources clean up path, __kvm_vgic_destroy() will be called to free allocated resources. And dist->spis will be freed again in clean up chain because we forget to set dist->spis = NULL in kvm_vgic_dist_init() failed path. So double free would happen. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/1574923128-19956-1-git-send-email-linmiaohe@huawei.com
2019-12-06KVM: arm/arm64: Get rid of unused arg in cpu_init_hyp_mode()Miaohe Lin1-2/+2
As arg dummy is not really needed, there's no need to pass NULL when calling cpu_init_hyp_mode(). So clean it up. Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug") Reviewed-by: Steven Price <steven.price@arm.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1574320559-5662-1-git-send-email-linmiaohe@huawei.com
2019-11-23KVM: Fix jump label out_free_* in kvm_init()Miaohe Lin1-4/+3
The jump label out_free_1 and out_free_2 deal with the same stuff, so git rid of one and rename the label out_free_0a to retain the label name order. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21Merge branch 'kvm-tsx-ctrl' into HEADPaolo Bonzini1-30/+179
Conflicts: arch/x86/kvm/vmx/vmx.c
2019-11-21Merge tag 'kvmarm-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini13-136/+303
KVM/arm updates for Linux 5.5: - Allow non-ISV data aborts to be reported to userspace - Allow injection of data aborts from userspace - Expose stolen time to guests - GICv4 performance improvements - vgic ITS emulation fixes - Simplify FWB handling - Enable halt pool counters - Make the emulated timer PREEMPT_RT compliant Conflicts: include/uapi/linux/kvm.h
2019-11-15KVM: remember position in kvm->vcpus arrayRadim Krčmář1-2/+3
Fetching an index for any vcpu in kvm->vcpus array by traversing the entire array everytime is costly. This patch remembers the position of each vcpu in kvm->vcpus array by storing it in vcpus_idx under kvm_vcpu structure. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: MMIO: get rid of odd out_err label in kvm_coalesced_mmio_initMiaohe Lin1-6/+2
The out_err label and var ret is unnecessary, clean them up. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-15KVM: Add a comment describing the /dev/kvm no_compat handlingMarc Zyngier1-0/+7
Add a comment explaining the rational behind having both no_compat open and ioctl callbacks to fend off compat tasks. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-13KVM: Forbid /dev/kvm being opened by a compat task when CONFIG_KVM_COMPAT=nMarc Zyngier1-1/+7
On a system without KVM_COMPAT, we prevent IOCTLs from being issued by a compat task. Although this prevents most silly things from happening, it can still confuse a 32bit userspace that is able to open the kvm device (the qemu test suite seems to be pretty mad with this behaviour). Take a more radical approach and return a -ENODEV to the compat task. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-14/+34
Pull kvm fixes from Paolo Bonzini: "Fix unwinding of KVM_CREATE_VM failure, VT-d posted interrupts, DAX/ZONE_DEVICE, and module unload/reload" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved KVM: VMX: Introduce pi_is_pir_empty() helper KVM: VMX: Do not change PID.NDST when loading a blocked vCPU KVM: VMX: Consider PID.PIR to determine if vCPU has pending interrupts KVM: VMX: Fix comment to specify PID.ON instead of PIR.ON KVM: X86: Fix initialization of MSR lists KVM: fix placement of refcount initialization KVM: Fix NULL-ptr deref after kvm_create_vm fails
2019-11-12KVM: MMU: Do not treat ZONE_DEVICE pages as being reservedSean Christopherson1-3/+23
Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages. This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup(). Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned. [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl Reported-by: Adam Borowski <kilobyte@angband.pl> Analyzed-by: David Hildenbrand <david@redhat.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-11KVM: fix placement of refcount initializationPaolo Bonzini1-2/+2
Reported by syzkaller: ============================= WARNING: suspicious RCU usage ----------------------------- ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by repro_11/12688. stack backtrace: Call Trace: dump_stack+0x7d/0xc5 lockdep_rcu_suspicious+0x123/0x170 kvm_dev_ioctl+0x9a9/0x1260 [kvm] do_vfs_ioctl+0x1a1/0xfb0 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x73/0xb0 do_syscall_64+0x108/0xaa0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Commit a97b0e773e4 (kvm: call kvm_arch_destroy_vm if vm creation fails) sets users_count to 1 before kvm_arch_init_vm(), however, if kvm_arch_init_vm() fails, we need to decrease this count. By moving it earlier, we can push the decrease to out_err_no_arch_destroy_vm without introducing yet another error label. syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=15209b84e00000 Reported-by: syzbot+75475908cd0910f141ee@syzkaller.appspotmail.com Fixes: a97b0e773e49 ("kvm: call kvm_arch_destroy_vm if vm creation fails") Cc: Jim Mattson <jmattson@google.com> Analyzed-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-11KVM: Fix NULL-ptr deref after kvm_create_vm failsPaolo Bonzini1-9/+9
Reported by syzkaller: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0 RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121 Call Trace: kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline] kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696 ksys_ioctl+0x62/0x90 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718 do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Commit 9121923c457d ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm") moves memslots and buses allocations around, however, if kvm->srcu/irq_srcu fails initialization, NULL will be returned instead of error code, NULL will not be intercepted in kvm_dev_ioctl_create_vm() and be dereferenced by kvm_coalesced_mmio_init(), this patch fixes it. Moving the initialization is required anyway to avoid an incorrect synchronize_srcu that was also reported by syzkaller: wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 __synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921 synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline] synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997 kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212 kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828 kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579 kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline] so do it. Reported-by: syzbot+89a8060879fa0bd2db4f@syzkaller.appspotmail.com Reported-by: syzbot+e27e7027eb2b80e44225@syzkaller.appspotmail.com Fixes: 9121923c457d ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm") Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-08Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/nextMarc Zyngier8-50/+55
2019-11-08KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injectionMarc Zyngier1-2/+2
Just like we do for WFE trapping, it can be useful to turn off WFI trapping when the physical CPU is not oversubscribed (that is, the vcpu is the only runnable process on this CPU) *and* that we're using direct injection of interrupts. The conditions are reevaluated on each vcpu_load(), ensuring that we don't switch to this mode on a busy system. On a GICv4 system, this has the effect of reducing the generation of doorbell interrupts to zero when the right conditions are met, which is a huge improvement over the current situation (where the doorbells are screaming if the CPU ever hits a blocking WFI). Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org
2019-11-08KVM: vgic-v4: Track the number of VLPIs per vcpuMarc Zyngier3-0/+6
In order to find out whether a vcpu is likely to be the target of VLPIs (and to further optimize the way we deal with those), let's track the number of VLPIs a vcpu can receive. This gets implemented with an atomic variable that gets incremented or decremented on map, unmap and move of a VLPI. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191107160412.30301-2-maz@kernel.org
2019-11-07KVM: arm/arm64: Let the timer expire in hardirq context on RTThomas Gleixner1-4/+4
The timers are canceled from an preempt-notifier which is invoked with disabled preemption which is not allowed on PREEMPT_RT. The timer callback is short so in could be invoked in hard-IRQ context on -RT. Let the timer expire on hard-IRQ context even on -RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Julien Grall <julien.grall@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191107095424.16647-1-bigeasy@linutronix.de
2019-11-04kvm: x86: mmu: Recovery of shattered NX large pagesJunaid Shahid1-0/+28
The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered, if it is not longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-11-04kvm: Add helper function for creating VM worker threadsJunaid Shahid1-0/+84
Add a function to create a kernel thread associated with a given VM. In particular, it ensures that the worker thread inherits the priority and cgroups of the calling thread. Signed-off-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-10-31kvm: call kvm_arch_destroy_vm if vm creation failsJim Mattson1-5/+7
In kvm_create_vm(), if we've successfully called kvm_arch_init_vm(), but then fail later in the function, we need to call kvm_arch_destroy_vm() so that it can do any necessary cleanup (like freeing memory). Fixes: 44a95dae1d229a ("KVM: x86: Detect and Initialize AVIC support") Signed-off-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Junaid Shahid <junaids@google.com> [Remove dependency on "kvm: Don't clear reference count on kvm_create_vm() error path" which was not committed. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-29KVM: arm/arm64: vgic: Don't rely on the wrong pending tableZenghui Yu1-3/+3
It's possible that two LPIs locate in the same "byte_offset" but target two different vcpus, where their pending status are indicated by two different pending tables. In such a scenario, using last_byte_offset optimization will lead KVM relying on the wrong pending table entry. Let us use last_ptr instead, which can be treated as a byte index into a pending table and also, can be vcpu specific. Fixes: 280771252c1b ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES") Cc: stable@vger.kernel.org Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20191029071919.177-4-yuzenghui@huawei.com
2019-10-29KVM: arm/arm64: vgic: Fix some comments typoZenghui Yu2-2/+2
Fix various comments, including wrong function names, grammar mistakes and specification references. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com
2019-10-28KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/putMarc Zyngier5-39/+38
When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
2019-10-25kvm: Allocate memslots and buses before calling kvm_arch_init_vmJim Mattson1-19/+21
This reorganization will allow us to call kvm_arch_destroy_vm in the event that kvm_create_vm fails after calling kvm_arch_init_vm. Suggested-by: Junaid Shahid <junaids@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Junaid Shahid <junaids@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-24Merge remote-tracking branch 'kvmarm/kvm-arm64/stolen-time' into kvmarm-master/nextMarc Zyngier5-85/+218
2019-10-22KVM: Add separate helper for putting borrowed reference to kvmSean Christopherson1-2/+14
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22Merge tag 'kvmarm-fixes-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini1-13/+35
KVM/arm fixes for 5.4, take #2 Special PMU edition: - Fix cycle counter truncation - Fix cycle counter overflow limit on pure 64bit system - Allow chained events to be actually functional - Correct sample period after overflow
2019-10-22KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabledWanpeng Li1-13/+16
Don't waste cycles to shrink/grow vCPU halt_poll_ns if host side polling is disabled. Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-21KVM: arm64: Provide VCPU attributes for stolen timeSteven Price1-0/+59
Allow user space to inform the KVM host where in the physical memory map the paravirtualized time structures should be located. User space can set an attribute on the VCPU providing the IPA base address of the stolen time structure for that VCPU. This must be repeated for every VCPU in the VM. The address is given in terms of the physical address visible to the guest and must be 64 byte aligned. The guest will discover the address via a hypercall. Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2019-10-21KVM: Allow kvm_device_ops to be constSteven Price1-3/+3
Currently a kvm_device_ops structure cannot be const without triggering compiler warnings. However the structure doesn't need to be written to and, by marking it const, it can be read-only in memory. Add some more const keywords to allow this. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>