2020-01-24  KVM: x86: Invoke kvm_vcpu_uninit() immediately prior to freeing vcpu  (Sean Christopherson, 1 file, -2/+2)
Move the call to kvm_vcpu_uninit() in kvm_arch_vcpu_destroy() down a few lines so that it is invoked immediately prior to freeing the vCPU. This paves the way for moving the uninit and free sequence to common KVM code without an associated functional change. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
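To make the intended end state concrete, here is a minimal sketch (not the actual diff; the vendor teardown step is a placeholder) of the destroy ordering this series converges on, with the common uninit call sitting immediately before the free so the pair can later be hoisted into common KVM code:

    void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
    {
            /* ... arch/vendor teardown that may still touch the vCPU ... */

            kvm_vcpu_uninit(vcpu);                  /* common uninit runs last ...  */
            kmem_cache_free(kvm_vcpu_cache, vcpu);  /* ... immediately before free  */
    }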
2020-01-24  KVM: MIPS: Invoke kvm_vcpu_uninit() immediately prior to freeing vcpu  (Sean Christopherson, 1 file, -2/+2)
Move the call to kvm_vcpu_uninit() in kvm_arch_vcpu_destroy() down a few lines so that it is invoked immediately prior to freeing the vCPU. This paves the way for moving the uninit and free sequence to common KVM code without an associated functional change. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: s390: Invoke kvm_vcpu_init() before allocating sie_page  (Sean Christopherson, 1 file, -8/+10)
Now that s390's implementation of kvm_arch_vcpu_init() is empty, move the call to kvm_vcpu_init() above the allocation of the sie_page. This paves the way for moving vcpu allocation and initialization into common KVM code without any associated functional change. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: s390: Move guts of kvm_arch_vcpu_init() into kvm_arch_vcpu_create()  (Sean Christopherson, 1 file, -28/+34)
Move all of kvm_arch_vcpu_init(), which is invoked at the very end of kvm_vcpu_init(), into kvm_arch_vcpu_create() in preparation of moving the call to kvm_vcpu_init(). Moving kvm_vcpu_init() is itself a preparatory step for moving allocation and initialization to common KVM code. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: Add kvm_arch_vcpu_precreate() to handle pre-allocation issues  (Sean Christopherson, 7 files, -19/+43)
Add a pre-allocation arch hook to handle checks that are currently done by arch specific code prior to allocating the vCPU object. This paves the way for moving the allocation to common KVM code. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
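A hedged sketch of how such a hook typically slots into common vCPU creation; the exact call site, locking, and error handling in virt/kvm/kvm_main.c may differ from what is shown here:

    static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
    {
            struct kvm_vcpu *vcpu;
            int r;

            mutex_lock(&kvm->lock);
            r = kvm_arch_vcpu_precreate(kvm, id);   /* arch checks before any allocation */
            mutex_unlock(&kvm->lock);
            if (r)
                    return r;

            vcpu = kvm_arch_vcpu_create(kvm, id);   /* allocation can later move here too */
            if (IS_ERR(vcpu))
                    return PTR_ERR(vcpu);

            /* ... remainder of creation (init, fd installation) unchanged ... */
            return r;
    }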
2020-01-24  KVM: Remove kvm_arch_vcpu_free() declaration  (Sean Christopherson, 1 file, -1/+0)
Remove KVM's declaration of kvm_arch_vcpu_free() now that the function is gone from all architectures (several architectures were relying on the forward declaration). Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: x86: Drop kvm_arch_vcpu_free()  (Sean Christopherson, 1 file, -16/+11)
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from common KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: x86: Remove spurious clearing of async #PF MSR  (Sean Christopherson, 1 file, -2/+0)
Remove a bogus clearing of apf.msr_val from kvm_arch_vcpu_destroy(). apf.msr_val is only set to a non-zero value by kvm_pv_enable_async_pf(), which is only reachable by kvm_set_msr_common(), i.e. by writing MSR_KVM_ASYNC_PF_EN. KVM does not autonomously write said MSR, i.e. can only be written via KVM_SET_MSRS or KVM_RUN. Since KVM_SET_MSRS and KVM_RUN are vcpu ioctls, they require a valid vcpu file descriptor. kvm_arch_vcpu_destroy() is only called if KVM_CREATE_VCPU fails, and KVM declares KVM_CREATE_VCPU successful once the vcpu fd is installed and thus visible to userspace. Ergo, apf.msr_val cannot be non-zero when kvm_arch_vcpu_destroy() is called. Fixes: 344d9588a9df0 ("KVM: Add PV MSR to enable asynchronous page faults delivery.") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: x86: Remove spurious kvm_mmu_unload() from vcpu destruction path  (Sean Christopherson, 1 file, -4/+0)
x86 does not load its MMU until KVM_RUN, which cannot be invoked until after vCPU creation succeeds. Given that kvm_arch_vcpu_destroy() is called if and only if vCPU creation fails, it is impossible for the MMU to be loaded. Note, the bogus kvm_mmu_unload() call was added during an unrelated refactoring of vCPU allocation, i.e. was presumably added as an opportunistic "fix" for a perceived leak. Fixes: fb3f0f51d92d1 ("KVM: Dynamically allocate vcpus") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: arm: Drop kvm_arch_vcpu_free()  (Sean Christopherson, 1 file, -7/+2)
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from common KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: Drop kvm_arch_vcpu_free()  (Sean Christopherson, 1 file, -7/+2)
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from common KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: MIPS: Drop kvm_arch_vcpu_free()  (Sean Christopherson, 1 file, -7/+2)
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from common KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: MIPS: Use kvm_vcpu_cache to allocate vCPUs  (Sean Christopherson, 1 file, -3/+3)
For reasons unknown, MIPS configures the vCPU allocation cache but allocates vCPUs via kzalloc(). Allocate from the vCPU cache in preparation for moving vCPU allocation to common KVM code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
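For illustration, the change amounts to something like the following (error label and variable names are illustrative, not the exact arch/mips/kvm code):

    /* was: vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL); */
    vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
    if (!vcpu) {
            err = -ENOMEM;
            goto out;
    }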
2020-01-24  KVM: PPC: Move kvm_vcpu_init() invocation to common code  (Sean Christopherson, 8 files, -55/+29)
Move the kvm_cpu_{un}init() calls to common PPC code as an intermediate step towards removing kvm_cpu_{un}init() altogether. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: e500mc: Move reset of oldpir below call to kvm_vcpu_init()  (Sean Christopherson, 1 file, -3/+3)
Move the initialization of oldpir so that the call to kvm_vcpu_init() is at the top of kvmppc_core_vcpu_create_e500mc(). oldpir is only used when loading/putting a vCPU, which currently cannot be done until after kvm_arch_vcpu_create() completes. Reordering the call to kvm_vcpu_init() paves the way for moving the invocation to common PPC code. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: Book3S PR: Allocate book3s and shadow vcpu after common init  (Sean Christopherson, 1 file, -11/+11)
Call kvm_vcpu_init() in kvmppc_core_vcpu_create_pr() prior to allocating the book3s and shadow_vcpu objects in preparation of moving said call to common PPC code. Although kvm_vcpu_init() has an arch callback, the callback is empty for Book3S PR, i.e. barring unseen black magic, moving the allocation has no real functional impact. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: Allocate vcpu struct in common PPC code  (Sean Christopherson, 8 files, -77/+51)
Move allocation of all flavors of PPC vCPUs to common PPC code. All variants either allocate 'struct kvm_vcpu' directly, or require that the embedded 'struct kvm_vcpu' member be located at offset 0, i.e. guarantee that the allocation can be directly interpreted as a 'struct kvm_vcpu' object. Remove the message from the build-time assertion regarding placement of the struct, as compatibility with the arch usercopy region is no longer the only thing that depends on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: e500mc: Add build-time assert that vcpu is at offset 0  (Sean Christopherson, 1 file, -0/+2)
In preparation for moving vcpu allocation to common PPC code, add an explicit, albeit redundant, build-time assert to ensure the vcpu member is located at offset 0. The assert is redundant in the sense that kvmppc_core_vcpu_create_e500() contains a functionally identical assert. The motivation for adding the extra assert is to provide visual confirmation of the correctness of moving vcpu allocation to common code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
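The assert in question is a one-liner along these lines (struct name written from memory, so treat it as illustrative):

    /* common-code allocation relies on the embedded kvm_vcpu being the first member */
    BUILD_BUG_ON(offsetof(struct kvmppc_vcpu_e500, vcpu) != 0);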
2020-01-24  KVM: x86: Move kvm_vcpu_init() invocation to common code  (Sean Christopherson, 4 files, -29/+25)
Move the kvm_cpu_{un}init() calls to common x86 code as an intermediate step to removing kvm_cpu_{un}init() altogether. Note, VMX's alloc_apic_access_page() and init_rmode_identity_map() are per-VM allocations and are intentionally kept if vCPU creation fails. They are freed by kvm_arch_destroy_vm(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: x86: Move allocation of pio_data page down a few lines  (Sean Christopherson, 1 file, -20/+13)
Allocate the pio_data page after creating the MMU and local APIC so that all direct memory allocations are grouped together. This allows setting the return value to -ENOMEM prior to starting the allocations instead of setting it in the fail path for every allocation. The pio_data page is only consumed when KVM_RUN is invoked, i.e. moving its allocation has no real functional impact. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
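The resulting pattern looks roughly like this (labels simplified; not the exact x86.c error paths): set the error code once, then let every grouped allocation share it:

    r = -ENOMEM;

    /* ... MMU and local APIC allocations, each jumping to its own fail_* label ... */

    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
    if (!page)
            goto fail_free_lapic;
    vcpu->arch.pio_data = page_address(page);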
2020-01-24  KVM: x86: Move FPU allocation to common x86 code  (Sean Christopherson, 3 files, -48/+23)
The allocation of FPU structs is identical across VMX and SVM, move it to common x86 code. Somewhat arbitrarily place the allocation so that it resides directly above the associated initialization via fx_init(), e.g. instead of retaining its position with respect to the overall vcpu creation flow. Although the names kvm_arch_vcpu_create() and kvm_arch_vcpu_init() might suggest otherwise, x86 does not have a clean split between 'create' and 'init'. Allocating the struct immediately prior to the first use arguably improves readability *now*, and will yield even bigger improvements when kvm_arch_vcpu_init() is removed in a future patch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: x86: Allocate vcpu struct in common x86 code  (Sean Christopherson, 4 files, -40/+30)
Move allocation of VMX and SVM vcpus to common x86. Although the struct being allocated is technically a VMX/SVM struct, it can be interpreted directly as a 'struct kvm_vcpu' because of the pre-existing requirement that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu struct. Remove the message from the build-time assertions regarding placement of the struct, as compatibility with the arch usercopy region is no longer the only thing that depends on 'struct kvm_vcpu' being at offset zero. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: SVM: Use direct vcpu pointer during vCPU create/free  (Sean Christopherson, 1 file, -14/+16)
Capture the vcpu pointer in a local variable and replace '&svm->vcpu' references with a direct reference to the pointer in anticipation of moving bits of the code to common x86 and passing the vcpu pointer into svm_create_vcpu(), i.e. eliminate unnecessary noise from future patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: VMX: Use direct vcpu pointer during vCPU create/free  (Sean Christopherson, 1 file, -20/+22)
Capture the vcpu pointer in a local variable and replace '&vmx->vcpu' references with a direct reference to the pointer in anticipation of moving bits of the code to common x86 and passing the vcpu pointer into vmx_create_vcpu(), i.e. eliminate unnecessary noise from future patches. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: VMX: Allocate VPID after initializing VCPU  (Sean Christopherson, 1 file, -3/+3)
Do VPID allocation after calling the common kvm_vcpu_init() as a step towards doing vCPU allocation (via kmem_cache_zalloc()) and calling kvm_vcpu_init() back-to-back. Squishing allocation and initialization together will eventually allow the sequence to be moved to arch-agnostic creation code. Note, the VPID is not consumed until KVM_RUN, so slightly delaying its allocation should have no real functional impact. VPID allocation was arbitrarily placed in the original patch, commit 2384d2b326408 ("KVM: VMX: Enable Virtual Processor Identification (VPID)"). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: x86: Free wbinvd_dirty_mask if vCPU creation fails  (Sean Christopherson, 1 file, -1/+1)
Free the vCPU's wbinvd_dirty_mask if vCPU creation fails after kvm_arch_vcpu_init(), e.g. when installing the vCPU's file descriptor. Do the freeing by calling kvm_arch_vcpu_free() instead of open coding the freeing. This adds a likely superfluous, but ultimately harmless, call to kvmclock_reset(), which only clears vcpu->arch.pv_time_enabled. Using kvm_arch_vcpu_free() allows for additional cleanup in the future. Fixes: f5f48ee15c2ee ("KVM: VMX: Execute WBINVD to keep data consistency with assigned devices") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: Book3S PR: Free shared page if mmu initialization fails  (Sean Christopherson, 1 file, -1/+3)
Explicitly free the shared page if kvmppc_mmu_init() fails during kvmppc_core_vcpu_create(), as the page is freed only in kvmppc_core_vcpu_free(), which is not reached via kvm_vcpu_uninit(). Fixes: 96bc451a15329 ("KVM: PPC: Introduce shared page") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24  KVM: PPC: Book3S HV: Uninit vCPU if vcore creation fails  (Sean Christopherson, 1 file, -1/+3)
Call kvm_vcpu_uninit() if vcore creation fails to avoid leaking any resources allocated by kvm_vcpu_init(), i.e. the vcpu->run page. Fixes: 371fefd6f2dc4 ("KVM: PPC: Allow book3s_hv guests to use SMT processor modes") Cc: stable@vger.kernel.org Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
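Conceptually the fix adds one unwind step to the error path, along these lines (label names are illustrative rather than the exact ones in book3s_hv.c):

    err = -ENOMEM;
    if (!vcore)
            goto uninit_vcpu;   /* was: goto free_vcpu, which leaked the vcpu->run page */
    /* ... */

    uninit_vcpu:
            kvm_vcpu_uninit(vcpu);                  /* undoes kvm_vcpu_init(), frees vcpu->run */
    free_vcpu:
            kmem_cache_free(kvm_vcpu_cache, vcpu);
            return ERR_PTR(err);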
2020-01-24  KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL  (Paolo Bonzini, 4 files, -10/+29)
If the guest is configured to have SPEC_CTRL but the host does not (which is a nonsensical configuration but these are not explicitly forbidden) then a host-initiated MSR write can write vmx->spec_ctrl (respectively svm->spec_ctrl) and trigger a #GP when KVM tries to restore the host value of the MSR. Add a more comprehensive check for valid bits of SPEC_CTRL, covering host CPUID flags and, since we are at it and it is more correct that way, guest CPUID flags too. For AMD, remove the unnecessary is_guest_mode check around setting the MSR interception bitmap, so that the code looks the same as for Intel. Cc: Jim Mattson <jmattson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
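A sketch of the kind of check described, written from memory rather than copied from the patch (the exact helper name and the set of CPUID feature flags consulted may differ, and AMD feature aliases are omitted): a SPEC_CTRL bit is treated as valid only if both the guest's CPUID and the host CPU advertise the corresponding feature, and writes with any other bit set are rejected.

    static u64 kvm_spec_ctrl_valid_bits(struct kvm_vcpu *vcpu)
    {
            u64 bits = SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD;

            /* drop bits whose feature is missing on either the guest or the host */
            if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
                !boot_cpu_has(X86_FEATURE_SPEC_CTRL))
                    bits &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP);
            if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL_SSBD) ||
                !boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
                    bits &= ~SPEC_CTRL_SSBD;

            return bits;
    }

    /* in the vendor MSR_IA32_SPEC_CTRL write handler (sketch): */
    if (data & ~kvm_spec_ctrl_valid_bits(vcpu))
            return 1;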
2020-01-23  KVM: async_pf: drop kvm_arch_async_page_present wrappers  (Paolo Bonzini, 1 file, -17/+4)
The wrappers make it less clear that the position of the call to kvm_arch_async_page_present depends on the architecture, and that only one of the two call sites will actually be active. Remove them. Cc: Andy Lutomirski <luto@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-23  KVM: x86: list MSR_IA32_UCODE_REV as an emulated MSR  (Paolo Bonzini, 1 file, -0/+1)
Even if it's read-only, it can still be written to by userspace. Let them know by adding it to KVM_GET_MSR_INDEX_LIST. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-23  tools/kvm_stat: Fix kvm_exit filter name  (Gavin Shan, 1 file, -2/+6)
The filter name is fixed to "exit_reason" for some kvm_exit events, no matter what architecture we have. Actually, the filter name ("exit_reason") is only applicable to x86, meaning it's broken on other architectures including aarch64. This fixes the issue by providing various kvm_exit filter names, depending on the architecture we're on. Afterwards, the variable filter name is picked and applied through ioctl(fd, SET_FILTER). Reported-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-23  KVM: x86: fix overlap between SPTE_MMIO_MASK and generation  (Paolo Bonzini, 1 file, -5/+5)
The SPTE_MMIO_MASK overlaps with the bits used to track MMIO generation number. A high enough generation number would overwrite the SPTE_SPECIAL_MASK region and cause the MMIO SPTE to be misinterpreted. Likewise, setting bits 52 and 53 would also cause an incorrect generation number to be read from the PTE, though this was partially mitigated by the (useless if it weren't for the bug) removal of SPTE_SPECIAL_MASK from the spte in get_mmio_spte_generation. Drop that removal, and replace it with a compile-time assertion. Fixes: 6eeb4ef049e7 ("KVM: x86: assign two bits to track SPTE kinds") Reported-by: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
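The compile-time assertion amounts to checking that the two bit ranges are disjoint; a conceptual one-liner is shown below, where SPTE_MMIO_GEN_BITS is a hypothetical name for the in-place mask of generation bits (the real macro names in mmu.c differ):

    /* the generation bits stored in an MMIO SPTE must never overlap the
     * SPTE_SPECIAL_MASK region used to mark the SPTE as special/MMIO */
    BUILD_BUG_ON(SPTE_MMIO_GEN_BITS & SPTE_SPECIAL_MASK);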
2020-01-21  kvm: Refactor handling of VM debugfs files  (Milan Pandurov, 2 files, -73/+76)
We can store a reference to kvm_stats_debugfs_item instead of copying its values into kvm_stat_data. This allows us to remove duplicated code and the usage of a temporary kvm_stat_data inside vm_stat_get et al. Signed-off-by: Milan Pandurov <milanpa@amazon.de> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86/mmu: Apply max PA check for MMIO sptes to 32-bit KVM  (Sean Christopherson, 1 file, -1/+1)
Remove the bogus 64-bit only condition from the check that disables MMIO spte optimization when the system supports the max PA, i.e. doesn't have any reserved PA bits. 32-bit KVM always uses PAE paging for the shadow MMU, and per Intel's SDM: PAE paging translates 32-bit linear addresses to 52-bit physical addresses. The kernel's restrictions on max physical addresses are limits on how much memory the kernel can reasonably use, not what physical addresses are supported by hardware. Fixes: ce88decffd17 ("KVM: MMU: mmio page fault support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: nVMX: vmread should not set rflags to specify success in case of #PF  (Miaohe Lin, 1 file, -1/+3)
In case writing to vmread destination operand result in a #PF, vmread should not call nested_vmx_succeed() to set rflags to specify success. Similar to as done in VMPTRST (See handle_vmptrst()). Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86/mmu: Micro-optimize nEPT's bad memptype/XWR checks  (Sean Christopherson, 2 files, -14/+31)
Rework the handling of nEPT's bad memtype/XWR checks to micro-optimize the checks as much as possible. Move the check to a separate helper, __is_bad_mt_xwr(), which allows the guest_rsvd_check usage in paging_tmpl.h to omit the check entirely for paging32/64 (bad_mt_xwr is always zero for non-nEPT) while retaining the bitwise-OR of the current code for the shadow_zero_check in walk_shadow_page_get_mmio_spte(). Add a comment for the bitwise-OR usage in the mmio spte walk to avoid future attempts to "fix" the code, which is what prompted this optimization in the first place[*]. Opportunistically remove the superfluous '!= 0' and parentheses, and use BIT_ULL() instead of open coding its equivalent. The net effect is that code generation is largely unchanged for walk_shadow_page_get_mmio_spte(), marginally better for ept_prefetch_invalid_gpte(), and significantly improved for paging32/64_prefetch_invalid_gpte(). Note, walk_shadow_page_get_mmio_spte() can't use a templated version of the memtype/XWR check as it works on the host's shadow PTEs, e.g. checks that KVM hasn't borked its EPT tables. Even if it could be templated, the benefits of having a single implementation far outweigh the few uops that would be saved for NPT or non-TDP paging, e.g. most compilers inline it all the way up to kvm_mmu_page_fault(). [*] https://lkml.kernel.org/r/20200108001859.25254-1-sean.j.christopherson@intel.com Cc: Jim Mattson <jmattson@google.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
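The helper itself is tiny; a sketch from memory (field and helper names may not match mmu.c exactly) is shown below. Note the deliberate bitwise-OR at the call site, which lets both checks execute without an extra branch:

    static inline bool __is_bad_mt_xwr(struct rsvd_bits_validate *rsvd_check, u64 pte)
    {
            /* bad_mt_xwr is indexed by the low 6 bits: EPT memtype plus XWR permissions */
            return rsvd_check->bad_mt_xwr & BIT_ULL(pte & 0x3f);
    }

    /* shadow-walk caller keeps the bitwise-OR on purpose (see the comment mentioned above): */
    return __is_bad_mt_xwr(rsvd_check, spte) |
           __is_rsvd_bits_set(rsvd_check, spte, level);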
2020-01-21  KVM: x86/mmu: Reorder the reserved bit check in prefetch_invalid_gpte()  (Sean Christopherson, 1 file, -3/+3)
Move the !PRESENT and !ACCESSED checks in FNAME(prefetch_invalid_gpte) above the call to is_rsvd_bits_set(). For a well behaved guest, the !PRESENT and !ACCESSED checks are far more likely to evaluate true than the reserved bit checks, and they do not require additional memory accesses.

Before:

    Dump of assembler code for function paging32_prefetch_invalid_gpte:
       0x0000000000044240 <+0>:   callq  0x44245 <paging32_prefetch_invalid_gpte+5>
       0x0000000000044245 <+5>:   mov    %rcx,%rax
       0x0000000000044248 <+8>:   shr    $0x7,%rax
       0x000000000004424c <+12>:  and    $0x1,%eax
       0x000000000004424f <+15>:  lea    0x0(,%rax,4),%r8
       0x0000000000044257 <+23>:  add    %r8,%rax
       0x000000000004425a <+26>:  mov    %rcx,%r8
       0x000000000004425d <+29>:  and    0x120(%rsi,%rax,8),%r8
       0x0000000000044265 <+37>:  mov    0x170(%rsi),%rax
       0x000000000004426c <+44>:  shr    %cl,%rax
       0x000000000004426f <+47>:  and    $0x1,%eax
       0x0000000000044272 <+50>:  or     %rax,%r8
       0x0000000000044275 <+53>:  jne    0x4427c <paging32_prefetch_invalid_gpte+60>
       0x0000000000044277 <+55>:  test   $0x1,%cl
       0x000000000004427a <+58>:  jne    0x4428a <paging32_prefetch_invalid_gpte+74>
       0x000000000004427c <+60>:  mov    %rdx,%rsi
       0x000000000004427f <+63>:  callq  0x44080 <drop_spte>
       0x0000000000044284 <+68>:  mov    $0x1,%eax
       0x0000000000044289 <+73>:  retq
       0x000000000004428a <+74>:  xor    %eax,%eax
       0x000000000004428c <+76>:  and    $0x20,%ecx
       0x000000000004428f <+79>:  jne    0x44289 <paging32_prefetch_invalid_gpte+73>
       0x0000000000044291 <+81>:  mov    %rdx,%rsi
       0x0000000000044294 <+84>:  callq  0x44080 <drop_spte>
       0x0000000000044299 <+89>:  mov    $0x1,%eax
       0x000000000004429e <+94>:  jmp    0x44289 <paging32_prefetch_invalid_gpte+73>
    End of assembler dump.

After:

    Dump of assembler code for function paging32_prefetch_invalid_gpte:
       0x0000000000044240 <+0>:   callq  0x44245 <paging32_prefetch_invalid_gpte+5>
       0x0000000000044245 <+5>:   test   $0x1,%cl
       0x0000000000044248 <+8>:   je     0x4424f <paging32_prefetch_invalid_gpte+15>
       0x000000000004424a <+10>:  test   $0x20,%cl
       0x000000000004424d <+13>:  jne    0x4425d <paging32_prefetch_invalid_gpte+29>
       0x000000000004424f <+15>:  mov    %rdx,%rsi
       0x0000000000044252 <+18>:  callq  0x44080 <drop_spte>
       0x0000000000044257 <+23>:  mov    $0x1,%eax
       0x000000000004425c <+28>:  retq
       0x000000000004425d <+29>:  mov    %rcx,%rax
       0x0000000000044260 <+32>:  mov    (%rsi),%rsi
       0x0000000000044263 <+35>:  shr    $0x7,%rax
       0x0000000000044267 <+39>:  and    $0x1,%eax
       0x000000000004426a <+42>:  lea    0x0(,%rax,4),%r8
       0x0000000000044272 <+50>:  add    %r8,%rax
       0x0000000000044275 <+53>:  mov    %rcx,%r8
       0x0000000000044278 <+56>:  and    0x120(%rsi,%rax,8),%r8
       0x0000000000044280 <+64>:  mov    0x170(%rsi),%rax
       0x0000000000044287 <+71>:  shr    %cl,%rax
       0x000000000004428a <+74>:  and    $0x1,%eax
       0x000000000004428d <+77>:  mov    %rax,%rcx
       0x0000000000044290 <+80>:  xor    %eax,%eax
       0x0000000000044292 <+82>:  or     %rcx,%r8
       0x0000000000044295 <+85>:  je     0x4425c <paging32_prefetch_invalid_gpte+28>
       0x0000000000044297 <+87>:  mov    %rdx,%rsi
       0x000000000004429a <+90>:  callq  0x44080 <drop_spte>
       0x000000000004429f <+95>:  mov    $0x1,%eax
       0x00000000000442a4 <+100>: jmp    0x4425c <paging32_prefetch_invalid_gpte+28>
    End of assembler dump.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: SVM: Override default MMIO mask if memory encryption is enabled  (Tom Lendacky, 1 file, -0/+43)
The KVM MMIO support uses bit 51 as the reserved bit to cause nested page faults when a guest performs MMIO. The AMD memory encryption support uses a CPUID function to define the encryption bit position. Given this, it is possible that these bits can conflict. Use svm_hardware_setup() to override the MMIO mask if memory encryption support is enabled. Various checks are performed to ensure that the mask is properly defined and rsvd_bits() is used to generate the new mask (as was done prior to the change that necessitated this patch). Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
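A rough sketch of the idea (simplified; not the exact svm.c logic): the C-bit position is read from CPUID 0x8000001F, and the reserved bit used for MMIO SPTEs is chosen so it cannot collide with it.

    static __init void svm_adjust_mmio_mask(void)
    {
            unsigned int enc_bit, mask_bit;
            u64 mask;

            /* no memory encryption leaf -> keep the default MMIO mask */
            if (cpuid_eax(0x80000000) < 0x8000001f)
                    return;

            enc_bit  = cpuid_ebx(0x8000001f) & 0x3f;  /* C-bit position, CPUID Fn8000_001F[EBX][5:0] */
            mask_bit = boot_cpu_data.x86_phys_bits;   /* bit normally used to mark MMIO SPTEs */

            if (enc_bit == mask_bit)
                    mask_bit++;                       /* avoid the collision (checks simplified) */

            /* build the mask with rsvd_bits() only if a truly reserved bit remains */
            mask = (mask_bit < 52) ? rsvd_bits(mask_bit, 51) | PT_PRESENT_MASK : 0;

            /* ... then install the new mask via kvm_mmu_set_mmio_spte_mask() ... */
    }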
2020-01-21  KVM: vmx: delete meaningless nested_vmx_prepare_msr_bitmap() declaration  (Miaohe Lin, 1 file, -3/+0)
The function nested_vmx_prepare_msr_bitmap() declaration is below its implementation. So this is meaningless and should be removed. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86: Refactor and rename bit() to feature_bit() macro  (Sean Christopherson, 5 files, -28/+30)
Rename bit() to __feature_bit() to give it a more descriptive name, and add a macro, feature_bit(), to stuff the X86_FEATURE_ prefix to keep line lengths manageable for code that hardcodes the bit to be retrieved. No functional change intended. Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
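The resulting pair looks roughly like this (definitions reproduced from memory, so the exact form in cpuid.h may differ slightly):

    /* raw bit within a 32-bit CPUID output word */
    #define __feature_bit(x)   ((u32)1 << ((x) & 31))

    /* stuffs the X86_FEATURE_ prefix so callers can write feature_bit(XSAVE) */
    #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)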
2020-01-21  KVM: x86: Expand build-time assertion on reverse CPUID usage  (Sean Christopherson, 2 files, -6/+30)
Add build-time checks to ensure KVM isn't trying to do a reverse CPUID lookup on Linux-defined feature bits, along with comments to explain the gory details of X86_FEATUREs and bit(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86: Add CPUID_7_1_EAX to the reverse CPUID table  (Sean Christopherson, 1 file, -0/+1)
Add an entry for CPUID_7_1_EAX in the reverse_cpuid array in preparation for incorporating the array in bit() build-time assertions, specifically to avoid an assertion on F(AVX512_BF16) in do_cpuid_7_mask(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86: Move bit() helper to cpuid.h  (Sean Christopherson, 2 files, -5/+5)
Move bit() to cpuid.h in preparation for incorporating the reverse_cpuid array in bit() build-time assertions. Opportunistically use the BIT() macro instead of open-coding the shift. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86: Add dedicated emulator helpers for querying CPUID features  (Sean Christopherson, 3 files, -18/+25)
Add feature-specific helpers for querying guest CPUID support from the emulator instead of having the emulator do a full CPUID and perform its own bit tests. The primary motivation is to eliminate the emulator's usage of bit() so that future patches can add more extensive build-time assertions on the usage of bit() without having to expose yet more code to the emulator. Note, providing a generic guest_cpuid_has() to the emulator doesn't work due to the existing built-time assertions in guest_cpuid_has(), which require the feature being checked to be a compile-time constant. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
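Illustrative shape of such a helper and its emulator hookup (names approximated from the description, not guaranteed to match the final patch): the emulator asks a narrow yes/no question instead of issuing a full CPUID and testing bits itself.

    /* in x86.c: answers one specific feature query on behalf of the emulator */
    static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
    {
            return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
    }

    /* wired into struct x86_emulate_ops, e.g. .guest_has_movbe = emulator_guest_has_movbe,
     * and consumed in emulate.c as ctxt->ops->guest_has_movbe(ctxt) */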
2020-01-21  KVM: x86: Add macro to ensure reserved cr4 bits checks stay in sync  (Sean Christopherson, 1 file, -40/+25)
Add a helper macro to generate the set of reserved cr4 bits for both host and guest to ensure that adding a check on guest capabilities is also added for host capabilities, and vice versa. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
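The macro pattern being described looks roughly like the sketch below (feature list heavily truncated, details from memory): the same body is instantiated once with a host-capability predicate and once with a guest-CPUID predicate, so the two reserved-bit masks cannot silently drift apart.

    #define __cr4_reserved_bits(__cpu_has, __c)             \
    ({                                                      \
            u64 __reserved_bits = CR4_RESERVED_BITS;        \
                                                            \
            if (!__cpu_has(__c, X86_FEATURE_XSAVE))         \
                    __reserved_bits |= X86_CR4_OSXSAVE;     \
            if (!__cpu_has(__c, X86_FEATURE_SMEP))          \
                    __reserved_bits |= X86_CR4_SMEP;        \
            if (!__cpu_has(__c, X86_FEATURE_LA57))          \
                    __reserved_bits |= X86_CR4_LA57;        \
            /* ... remaining features elided ... */         \
            __reserved_bits;                                \
    })

Per the surrounding entries, the host mask would then come from something like __cr4_reserved_bits(cpu_has, &boot_cpu_data) and the guest mask from __cr4_reserved_bits(guest_cpuid_has, vcpu).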
2020-01-21  KVM: x86: Drop special XSAVE handling from guest_cpuid_has()  (Sean Christopherson, 3 files, -4/+2)
Now that KVM prevents setting host-reserved CR4 bits, drop the dedicated XSAVE check in guest_cpuid_has() in favor of open coding similar checks in the SVM/VMX XSAVES enabling flows. Note, checking boot_cpu_has(X86_FEATURE_XSAVE) in the XSAVES flows is technically redundant with respect to the CR4 reserved bit checks, e.g. XSAVES #UDs if CR4.OSXSAVE=0 and arch.xsaves_enabled is consumed if and only if CR4.OSXSAVE=1 in guest. Keep (add?) the explicit boot_cpu_has() checks to help document KVM's usage of arch.xsaves_enabled. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86: Ensure all logical CPUs have consistent reserved cr4 bits  (Sean Christopherson, 1 file, -0/+7)
Check the current CPU's reserved cr4 bits against the mask calculated for the boot CPU to ensure consistent behavior across all CPUs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: x86: Don't let userspace set host-reserved cr4 bits  (Sean Christopherson, 1 file, -1/+34)
Calculate the host-reserved cr4 bits at runtime based on the system's capabilities (using logic similar to __do_cpuid_func()), and use the dynamically generated mask for the reserved bit check in kvm_set_cr4() instead of using the static CR4_RESERVED_BITS define. This prevents userspace from "enabling" features in cr4 that are not supported by the system, e.g. by ignoring KVM_GET_SUPPORTED_CPUID and specifying a bogus CPUID for the vCPU. Allowing userspace to set unsupported bits in cr4 can lead to a variety of undesirable behavior, e.g. failed VM-Enter, and in general increases KVM's attack surface. A crafty userspace can even abuse CR4.LA57 to induce an unchecked #GP on a WRMSR.

On a platform without LA57 support:

    KVM_SET_CPUID2  // CPUID_7_0_ECX.LA57 = 1
    KVM_SET_SREGS   // CR4.LA57 = 1
    KVM_SET_MSRS    // KERNEL_GS_BASE = 0x0004000000000000
    KVM_RUN

leads to a #GP when writing KERNEL_GS_BASE into hardware:

    unchecked MSR access error: WRMSR to 0xc0000102 (tried to write 0x0004000000000000)
    at rIP: 0xffffffffa00f239a (vmx_prepare_switch_to_guest+0x10a/0x1d0 [kvm_intel])
    Call Trace:
     kvm_arch_vcpu_ioctl_run+0x671/0x1c70 [kvm]
     kvm_vcpu_ioctl+0x36b/0x5d0 [kvm]
     do_vfs_ioctl+0xa1/0x620
     ksys_ioctl+0x66/0x70
     __x64_sys_ioctl+0x16/0x20
     do_syscall_64+0x4c/0x170
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fc08133bf47

Note, the above sequence fails VM-Enter due to invalid guest state. Userspace can allow VM-Enter to succeed (after the WRMSR #GP) by adding a KVM_SET_SREGS w/ CR4.LA57=0 after KVM_SET_MSRS, in which case KVM will technically leak the host's KERNEL_GS_BASE into the guest. But, as KERNEL_GS_BASE is a userspace-defined value/address, the leak is largely benign as a malicious userspace would simply be exposing its own data to the guest, and attacking a benevolent userspace would require multiple bugs in the userspace VMM. Cc: stable@vger.kernel.org Cc: Jun Nakajima <jun.nakajima@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21  KVM: VMX: Add helper to consolidate up PT/RTIT WRMSR fault logic  (Sean Christopherson, 1 file, -24/+31)
Add a helper to consolidate the common checks for writing PT MSRs, and opportunistically clean up the formatting of the affected code. No functional change intended. Cc: Chao Peng <chao.p.peng@linux.intel.com> Cc: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
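A plausible shape for the consolidated check (helper name and fields recalled from memory, so treat it as a sketch): PT MSRs may only be written when PT runs in host+guest mode and the guest currently has tracing disabled in RTIT_CTL.

    static inline bool pt_can_write_msr(struct vcpu_vmx *vmx)
    {
            return pt_mode == PT_MODE_HOST_GUEST &&
                   !(vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN);
    }

    /* each PT MSR case in vmx_set_msr() would then begin with:
     *     if (!pt_can_write_msr(vmx))
     *             return 1;
     */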